summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-01-09blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mqTejun Heo
After the recent updates to use generation number and state based synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except to avoid firing the same timeout multiple times. Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple times. This removes atomic bitops from hot paths too. v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out(). v3: Added RQF_MQ_TIMEOUT_EXPIRED flag. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-09blk-mq: replace timeout synchronization with a RCU and generation based schemeTejun Heo
Currently, blk-mq timeout path synchronizes against the usual issue/completion path using a complex scheme involving atomic bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence rules. Unfortunately, it contains quite a few holes. There's a complex dancing around REQ_ATOM_STARTED and REQ_ATOM_COMPLETE between issue/completion and timeout paths; however, they don't have a synchronization point across request recycle instances and it isn't clear what the barriers add. blk_mq_check_expired() can easily read STARTED from N-2'th iteration, deadline from N-1'th, blk_mark_rq_complete() against Nth instance. In fact, it's pretty easy to make blk_mq_check_expired() terminate a later instance of a request. If we induce 5 sec delay before time_after_eq() test in blk_mq_check_expired(), shorten the timeout to 2s, and issue back-to-back large IOs, blk-mq starts timing out requests spuriously pretty quickly. Nothing actually timed out. It just made the call on a recycle instance of a request and then terminated a later instance long after the original instance finished. The scenario isn't theoretical either. This patch replaces the broken synchronization mechanism with a RCU and generation number based one. 1. Each request has a u64 generation + state value, which can be updated only by the request owner. Whenever a request becomes in-flight, the generation number gets bumped up too. This provides the basis for the timeout path to distinguish different recycle instances of the request. Also, marking a request in-flight and setting its deadline are protected with a seqcount so that the timeout path can fetch both values coherently. 2. The timeout path fetches the generation, state and deadline. If the verdict is timeout, it records the generation into a dedicated request abortion field and does RCU wait. 3. The completion path is also protected by RCU (from the previous patch) and checks whether the current generation number and state match the abortion field. If so, it skips completion. 4. The timeout path, after RCU wait, scans requests again and terminates the ones whose generation and state still match the ones requested for abortion. By now, the timeout path knows that either the generation number and state changed if it lost the race or the completion will yield to it and can safely timeout the request. While it's more lines of code, it's conceptually simpler, doesn't depend on direct use of subtle memory ordering or coherence, and hopefully doesn't terminate the wrong instance. While this change makes REQ_ATOM_COMPLETE synchronization unnecessary between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't removed yet as it's still used in other places. Future patches will move all state tracking to the new mechanism and remove all bitops in the hot paths. Note that this patch adds a comment explaining a race condition in BLK_EH_RESET_TIMER path. The race has always been there and this patch doesn't change it. It's just documenting the existing race. v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao. - s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter. - READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter. v3: - Fixed possible extended seqcount / u64_stats_sync read looping spotted by Peter. - MQ_RQ_IDLE was incorrectly being set in complete_request instead of free_request. Fixed. v4: - Rebased on top of hctx_lock() refactoring patch. - Added comment explaining the use of hctx_lock() in completion path. v5: - Added comments requested by Bart. - Note the addition of BLK_EH_RESET_TIMER race condition in the commit message. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <Bart.VanAssche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06lib/scatterlist: Introduce sgl_alloc() and sgl_free()Bart Van Assche
Many kernel drivers contain code that allocates and frees both a scatterlist and the pages that populate that scatterlist. Introduce functions in lib/scatterlist.c that perform these tasks instead of duplicating this functionality in multiple drivers. Only include these functions in the build if CONFIG_SGL_ALLOC=y to avoid that the kernel size increases if this functionality is not used. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06block: move bio_alloc_pages() to bcacheMing Lei
bcache is the only user of bio_alloc_pages(), so move this function into bcache, and avoid it being misused in the future. Also rename it to bch_bio_allo_pages() since it is bcache only. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irqMing Lei
Firstly this patch introduces BVEC_ITER_ALL_INIT for iterating one bio from start to end. As we need to support multipage bvecs, don't access bio->bi_io_vec in copy_to_high_bio_irq(), and just use the standard iterator for that. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06block: introduce bio helpers for converting to multipage bvecMing Lei
The following helpers are introduced for converting current users of direct access to bvec table, and prepares for supporting multipage bvec: bio_pages_all() bio_first_bvec_all() bio_first_page_all() bio_last_bvec_all() All are named as bio_*_all() to following bio_for_each_segment_all(), they can only be used on bio of !bio_flagged(bio, BIO_CLONED), that means the whole bvec table is covered. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05block: introduce zoned block devices zone write lockingChristoph Hellwig
Components relying only on the request_queue structure for accessing block devices (e.g. I/O schedulers) have a limited knowledged of the device characteristics. In particular, the device capacity cannot be easily discovered, which for a zoned block device also result in the inability to easily know the number of zones of the device (the zone size is indicated by the chunk_sectors field of the queue limits). Introduce the nr_zones field to the request_queue structure to simplify access to this information. Also, add the bitmap seq_zone_bitmap which indicates which zones of the device are sequential zones (write preferred or write required) and the bitmap seq_zones_wlock which indicates if a zone is write locked, that is, if a write request targeting a zone was dispatched to the device. These fields are initialized by the low level block device driver (sd.c for ZBC/ZAC disks). They are not initialized by stacking drivers (device mappers) handling zoned block devices (e.g. dm-linear). Using this, I/O schedulers can introduce zone write locking to control request dispatching to a zoned block device and avoid write request reordering by limiting to at most a single write request per zone outside of the scheduler at any time. Based on previous patches from Damien Le Moal. Signed-off-by: Christoph Hellwig <hch@lst.de> [Damien] * Fixed comments and identation in blkdev.h * Changed helper functions * Fixed this commit message Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05lightnvm: set target over-provision on create ioctlJavier González
Allow to set the over-provision percentage on target creation. In case that the value is not provided, fall back to the default value set by the target. In pblk, set the default OP to 11% of the total size of the device Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05lightnvm: make geometry structures 2.0 readyMatias Bjørling
Prepare for the 2.0 revision by adapting the geometry structures to coexist with the 1.2 revision. Signed-off-by: Matias Bjørling <m@bjorling.me> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05lightnvm: remove lower page tablesMatias Bjørling
The lower page table is unused. All page tables reported by 1.2 devices are all reporting a sequential 1:1 page mapping. This is also not used going forward with the 2.0 revision. Signed-off-by: Matias Bjørling <m@bjorling.me> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05lightnvm: remove unnecessary field from nvm_rqJavier González
Remove the wait filed in nvm_rq. It is not used anymore, as targets rely on the functionality provided by the LightNVM subsystem when sending sync I/O. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05lightnvm: remove hybrid ocssd 1.2 supportMatias Bjørling
Now that rrpc have been removed. Also remove the hybrid 1.2 support from the core. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05lightnvm: use internal pblk methodsMatias Bjørling
Now that rrpc has been removed, the only users of the ppa helpers is pblk. However, pblk already defines similar functions. Switch pblk to use the internal ones, and remove the generic ppa helpers. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-17Merge branch 'WIP.x86-pti.base.prep-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Page Table Isolation (PTI) preparatory tree from Ingo Molnar: "This does a rename to free up linux/pti.h to be used by the upcoming page table isolation feature" * 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: drivers/misc/intel/pti: Rename the header file to free up the namespace
2017-12-17drivers/misc/intel/pti: Rename the header file to free up the namespaceIngo Molnar
We'd like to use the 'PTI' acronym for 'Page Table Isolation' - free up the namespace by renaming the <linux/pti.h> driver header to <linux/intel-pti.h>. (Also standardize the header guard name while at it.) Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: J Freyensee <james_p_freyensee@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot. 2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik Brueckner. 3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes Berg. 4) Add missing barriers to ptr_ring, from Michael S. Tsirkin. 5) Don't advertise gigabit in sh_eth when not available, from Thomas Petazzoni. 6) Check network namespace when delivering to netlink taps, from Kevin Cernekee. 7) Kill a race in raw_sendmsg(), from Mohamed Ghannam. 8) Use correct address in TCP md5 lookups when replying to an incoming segment, from Christoph Paasch. 9) Add schedule points to BPF map alloc/free, from Eric Dumazet. 10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also from Eric Dumazet. 11) Fix SKB leak in tipc, from Jon Maloy. 12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz. 13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn. 14) Add some new qmi_wwan device IDs, from Daniele Palmas. 15) Fix static key imbalance in ingress qdisc, from Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) net: qcom/emac: Reduce timeout for mdio read/write net: sched: fix static key imbalance in case of ingress/clsact_init error net: sched: fix clsact init error path ip_gre: fix wrong return value of erspan_rcv net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support pkt_sched: Remove TC_RED_OFFLOADED from uapi net: sched: Move to new offload indication in RED net: sched: Add TCA_HW_OFFLOAD net: aquantia: Increment driver version net: aquantia: Fix typo in ethtool statistics names net: aquantia: Update hw counters on hw init net: aquantia: Improve link state and statistics check interval callback net: aquantia: Fill in multicast counter in ndev stats from hardware net: aquantia: Fill ndev stat couters from hardware net: aquantia: Extend stat counters to 64bit values net: aquantia: Fix hardware DMA stream overload on large MRRS net: aquantia: Fix actual speed capabilities reporting sock: free skb in skb_complete_tx_timestamp on error s390/qeth: update takeover IPs after configuration change s390/qeth: lock IP table while applying takeover changes ...
2017-12-15Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Misc fixes: - Fix a S390 boot hang that was caused by the lock-break logic. Remove lock-break to begin with, as review suggested it was unreasonably fragile and our confidence in its continued good health is lower than our confidence in its removal. - Remove the lockdep cross-release checking code for now, because of unresolved false positive warnings. This should make lockdep work well everywhere again. - Get rid of the final (and single) ACCESS_ONCE() straggler and remove the API from v4.15. - Fix a liblockdep build warning" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/lib/lockdep: Add missing declaration of 'pr_cont()' checkpatch: Remove ACCESS_ONCE() warning compiler.h: Remove ACCESS_ONCE() tools/include: Remove ACCESS_ONCE() tools/perf: Convert ACCESS_ONCE() to READ_ONCE() locking/lockdep: Remove the cross-release locking checks locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK
2017-12-15pkt_sched: Remove TC_RED_OFFLOADED from uapiYuval Mintz
Following the previous patch, RED is now using the new uniform uapi for indicating it's offloaded. As a result, TC_RED_OFFLOADED is no longer utilized by kernel and can be removed [as it's still not part of any stable release]. Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15net: sched: Add TCA_HW_OFFLOADYuval Mintz
Qdiscs can be offloaded to HW, but current implementation isn't uniform. Instead, qdiscs either pass information about offload status via their TCA_OPTIONS or omit it altogether. Introduce a new attribute - TCA_HW_OFFLOAD that would form a uniform uAPI for the offloading status of qdiscs. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-14Merge tag 'pm-4.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "This fixes an issue in two recent commits that may cause pm_runtime_enable() to be called for too many times for some devices during the "thaw" transition belonging to hibernation" * tag 'pm-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / sleep: Avoid excess pm_runtime_enable() calls in device_resume()
2017-12-14Merge tag 'trace-v4.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various fix-ups: - comment fixes - build fix - better memory alloction (don't use NR_CPUS) - configuration fix - build warning fix - enhanced callback parameter (to simplify users of trace hooks) - give up on stack tracing when RCU isn't watching (it's a lost cause)" * tag 'trace-v4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have stack trace not record if RCU is not watching tracing: Pass export pointer as argument to ->write() ring-buffer: Remove unused function __rb_data_page_index() tracing: make PREEMPTIRQ_EVENTS depend on TRACING tracing: Allocate mask_str buffer dynamically tracing: always define trace_{irq,preempt}_{enable_disable} tracing: Fix code comments in trace.c
2017-12-14Merge tag 'pci-v4.15-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - add a pci_get_domain_bus_and_slot() stub for the CONFIG_PCI=n case to avoid build breakage in the v4.16 merge window if a pci_get_bus_and_slot() -> pci_get_domain_bus_and_slot() patch gets merged before the PCI tree (Randy Dunlap) - fix an AMD boot regression in the 64bit BAR support added in v4.15 (Christian König) - fix an R-Car use-after-free that causes a crash if no PCIe card is present (Geert Uytterhoeven) * tag 'pci-v4.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: rcar: Fix use-after-free in probe error path x86/PCI: Only enable a 64bit BAR on single-socket AMD Family 15h x86/PCI: Fix infinite loop in search for 64bit BAR placement PCI: Add pci_get_domain_bus_and_slot() stub
2017-12-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: arch: define weak abort() mm, oom_reaper: fix memory corruption kernel: make groups_sort calling a responsibility group_info allocators mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' tools/slabinfo-gnuplot: force to use bash shell kcov: fix comparison callback signature mm/slab.c: do not hash pointers when debugging slab mm/page_alloc.c: avoid excessive IRQ disabled times in free_unref_page_list() mm/memory.c: mark wp_huge_pmd() inline to prevent build failure scripts/faddr2line: fix CROSS_COMPILE unset error Documentation/vm/zswap.txt: update with same-value filled page feature exec: avoid gcc-8 warning for get_task_comm autofs: fix careless error in recent commit string.h: workaround for increased stack usage mm/kmemleak.c: make cond_resched() rate-limiting more efficient lib/rbtree,drm/mm: add rbtree_replace_node_cached() include/linux/idr.h: add #include <linux/bug.h>
2017-12-14mm, oom_reaper: fix memory corruptionMichal Hocko
David Rientjes has reported the following memory corruption while the oom reaper tries to unmap the victims address space BUG: Bad page map in process oom_reaper pte:6353826300000000 pmd:00000000 addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping: (null) index:7f50cab1d file: (null) fault: (null) mmap: (null) readpage: (null) CPU: 2 PID: 1001 Comm: oom_reaper Call Trace: unmap_page_range+0x1068/0x1130 __oom_reap_task_mm+0xd5/0x16b oom_reaper+0xff/0x14c kthread+0xc1/0xe0 Tetsuo Handa has noticed that the synchronization inside exit_mmap is insufficient. We only synchronize with the oom reaper if tsk_is_oom_victim which is not true if the final __mmput is called from a different context than the oom victim exit path. This can trivially happen from context of any task which has grabbed mm reference (e.g. to read /proc/<pid>/ file which requires mm etc.). The race would look like this oom_reaper oom_victim task mmget_not_zero do_exit mmput __oom_reap_task_mm mmput __mmput exit_mmap remove_vma unmap_page_range Fix this issue by providing a new mm_is_oom_victim() helper which operates on the mm struct rather than a task. Any context which operates on a remote mm struct should use this helper in place of tsk_is_oom_victim. The flag is set in mark_oom_victim and never cleared so it is stable in the exit_mmap path. Debugged by Tetsuo Handa. Link: http://lkml.kernel.org/r/20171210095130.17110-1-mhocko@kernel.org Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Argangeli <andrea@kernel.org> Cc: <stable@vger.kernel.org> [4.14] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14kernel: make groups_sort calling a responsibility group_info allocatorsThiago Rafael Becker
In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NeilBrown <neilb@suse.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14exec: avoid gcc-8 warning for get_task_commArnd Bergmann
gcc-8 warns about using strncpy() with the source size as the limit: fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess] This is indeed slightly suspicious, as it protects us from source arguments without NUL-termination, but does not guarantee that the destination is terminated. This keeps the strncpy() to ensure we have properly padded target buffer, but ensures that we use the correct length, by passing the actual length of the destination buffer as well as adding a build-time check to ensure it is exactly TASK_COMM_LEN. There are only 23 callsites which I all reviewed to ensure this is currently the case. We could get away with doing only the check or passing the right length, but it doesn't hurt to do both. Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Aleksa Sarai <asarai@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14string.h: workaround for increased stack usageArnd Bergmann
The hardened strlen() function causes rather large stack usage in at least one file in the kernel, in particular when CONFIG_KASAN is enabled: drivers/media/usb/em28xx/em28xx-dvb.c: In function 'em28xx_dvb_init': drivers/media/usb/em28xx/em28xx-dvb.c:2062:1: error: the frame size of 3256 bytes is larger than 204 bytes [-Werror=frame-larger-than=] Analyzing this problem led to the discovery that gcc fails to merge the stack slots for the i2c_board_info[] structures after we strlcpy() into them, due to the 'noreturn' attribute on the source string length check. I reported this as a gcc bug, but it is unlikely to get fixed for gcc-8, since it is relatively easy to work around, and it gets triggered rarely. An earlier workaround I did added an empty inline assembly statement before the call to fortify_panic(), which works surprisingly well, but is really ugly and unintuitive. This is a new approach to the same problem, this time addressing it by not calling the 'extern __real_strnlen()' function for string constants where __builtin_strlen() is a compile-time constant and therefore known to be safe. We do this by checking if the last character in the string is a compile-time constant '\0'. If it is, we can assume that strlen() of the string is also constant. As a side-effect, this should also improve the object code output for any other call of strlen() on a string constant. [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/20171205215143.3085755-1-arnd@arndb.de Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: https://patchwork.kernel.org/patch/9980413/ Link: https://patchwork.kernel.org/patch/9974047/ Fixes: 6974f0c4555 ("include/linux/string.h: add the option of fortified string.h functions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Martin Wilck <mwilck@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14lib/rbtree,drm/mm: add rbtree_replace_node_cached()Chris Wilson
Add a variant of rbtree_replace_node() that maintains the leftmost cache of struct rbtree_root_cached when replacing nodes within the rbtree. As drm_mm is the only rb_replace_node() being used on an interval tree, the mistake looks fairly self-contained. Furthermore the only user of drm_mm_replace_node() is its testsuite... Testcase: igt/drm_mm/replace Link: http://lkml.kernel.org/r/20171122100729.3742-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20171109212435.9265-1-chris@chris-wilson.co.uk Fixes: f808c13fd373 ("lib/interval_tree: fast overlap detection") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14include/linux/idr.h: add #include <linux/bug.h>Wei Wang
The <linux/bug.h> was removed from radix-tree.h by commit f5bba9d11a25 ("include/linux/radix-tree.h: remove unneeded #include <linux/bug.h>"). Since that commit, tools/testing/radix-tree/ couldn't pass compilation due to tools/testing/radix-tree/idr.c:17: undefined reference to WARN_ON_ONCE. This patch adds the bug.h header to idr.h to solve the issue. Link: http://lkml.kernel.org/r/1511963726-34070-2-git-send-email-wei.w.wang@intel.com Fixes: f5bba9d11a2 ("include/linux/radix-tree.h: remove unneeded #include <linux/bug.h>") Signed-off-by: Wei Wang <wei.w.wang@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biggers <ebiggers@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14Merge tag 'drm-misc-fixes-2017-12-14' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm/drm-misc Pull drm fixes from Daniel Vetter: - two fixes for new core features - a corner case fix for the connnector_iter fix from last week (this one is cc: stable) - one vc4 fix * tag 'drm-misc-fixes-2017-12-14' of git://anongit.freedesktop.org/drm/drm-misc: drm/drm_lease: Prevent deadlock in case drm_lease_create() fails drm: rework delayed connector cleanup in connector_iter drm: Update edid-derived drm_display_info fields at edid property set [v2] drm/vc4: Release fence after signalling
2017-12-13drm: rework delayed connector cleanup in connector_iterDaniel Vetter
PROBE_DEFER also uses system_wq to reprobe drivers, which means when that again fails, and we try to flush the overall system_wq (to get all the delayed connectore cleanup work_struct completed), we deadlock. Fix this by using just a single cleanup work, so that we can only flush that one and don't block on anything else. That means a free list plus locking, a standard pattern. v2: - Correctly free connectors only on last ref. Oops (Chris). - use llist_head/node (Chris). v3 - Add init_llist_head (Chris). Fixes: a703c55004e1 ("drm: safely free connectors from connector_iter") Fixes: 613051dac40d ("drm: locking&new iterators for connector_list") Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Dave Airlie <airlied@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Sean Paul <seanpaul@chromium.org> Cc: <stable@vger.kernel.org> # v4.11+: 613051dac40d ("drm: locking&new iterators for connector_list" Cc: <stable@vger.kernel.org> # v4.11+ Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: David Airlie <airlied@linux.ie> Cc: Javier Martinez Canillas <javier@dowhile0.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Guillaume Tucker <guillaume.tucker@collabora.com> Cc: Mark Brown <broonie@kernel.org> Cc: Kevin Hilman <khilman@baylibre.com> Cc: Matt Hart <matthew.hart@linaro.org> Cc: Thierry Escande <thierry.escande@collabora.co.uk> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171213124936.17914-1-daniel.vetter@ffwll.ch
2017-12-13ipv4: igmp: guard against silly MTU valuesEric Dumazet
IPv4 stack reacts to changes to small MTU, by disabling itself under RTNL. But there is a window where threads not using RTNL can see a wrong device mtu. This can lead to surprises, in igmp code where it is assumed the mtu is suitable. Fix this by reading device mtu once and checking IPv4 minimal MTU. This patch adds missing IPV4_MIN_MTU define, to not abuse ETH_MIN_MTU anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13drm: Update edid-derived drm_display_info fields at edid property set [v2]Keith Packard
There are a set of values in the drm_display_info structure for each connector which hold information derived from EDID. These are computed in drm_add_display_info. Before this patch, that was only called in drm_add_edid_modes. This meant that they were only set when EDID was present and never reset when EDID was not, as happened when the display was disconnected. One of these fields, non_desktop, is used from drm_mode_connector_update_edid_property, the function responsible for assigning the new edid value to the application-visible property. Various drivers call these two functions (drm_add_edid_modes and drm_mode_connector_update_edid_property) in different orders. This means that even when EDID is present, the drm_display_info fields may not have been computed at the time that drm_mode_connector_update_edid_property used the non_desktop value to set the non_desktop property. I've added a public function (drm_reset_display_info) that resets the drm_display_info field values to default values and then made the drm_add_display_info function public. These two functions are now called directly from drm_mode_connector_update_edid_property so that the drm_display_info fields are always computed from the current EDID information before being used in that function. This means that the drm_display_info values are often computed twice, once when the EDID property it set and a second time when EDID is used to compute modes for the device. The alternative would be to uniformly ensure that the values were computed once before being used, which would require that all drivers reliably invoke the two paths in the same order. The computation is inexpensive enough that it seems more maintainable in the long term to simply compute them in both paths. The API to drm_add_display_info has been changed so that it no longer takes the set of edid-based quirks as a parameter. Rather, it now computes those quirks itself and returns them for further use by drm_add_edid_modes. This patch also includes a number of 'const' additions caused by drm_mode_connector_update_edid_property taking a 'const struct edid *' parameter and wanting to pass that along to drm_add_display_info. v2: after review by Daniel Vetter <daniel.vetter@ffwll.ch> Removed EXPORT_SYMBOL_GPL for drm_reset_display_info and drm_add_display_info. Added FIXME in drm_mode_connector_update_edid_property about potentially merging that with drm_add_edid_modes to avoid the need for two driver calls. Signed-off-by: Keith Packard <keithp@keithp.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20171213084427.31199-1-keithp@keithp.com (danvet: cherry picked from commit 12a889bf4bca ("drm: rework delayed connector cleanup in connector_iter") from drm-misc-next since functional conflict with changes in -next and we need to make sure both have the right version and nothing gets lost.) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-12-12compiler.h: Remove ACCESS_ONCE()Mark Rutland
There are no longer any kernelspace uses of ACCESS_ONCE(), so we can remove the definition from <linux/compiler.h>. This patch removes the ACCESS_ONCE() definition, and updates comments which referred to it. At the same time, some inconsistent and redundant whitespace is removed from comments. Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: apw@canonical.com Link: http://lkml.kernel.org/r/20171127103824.36526-4-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-12locking/lockdep: Remove the cross-release locking checksIngo Molnar
This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y), while it found a number of old bugs initially, was also causing too many false positives that caused people to disable lockdep - which is arguably a worse overall outcome. If we disable cross-release by default but keep the code upstream then in practice the most likely outcome is that we'll allow the situation to degrade gradually, by allowing entropy to introduce more and more false positives, until it overwhelms maintenance capacity. Another bad side effect was that people were trying to work around the false positives by uglifying/complicating unrelated code. There's a marked difference between annotating locking operations and uglifying good code just due to bad lock debugging code ... This gradual decrease in quality happened to a number of debugging facilities in the kernel, and lockdep is pretty complex already, so we cannot risk this outcome. Either cross-release checking can be done right with no false positives, or it should not be included in the upstream kernel. ( Note that it might make sense to maintain it out of tree and go through the false positives every now and then and see whether new bugs were introduced. ) Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-12locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=yWill Deacon
When CONFIG_GENERIC_LOCKBEAK=y, locking structures grow an extra int ->break_lock field which is used to implement raw_spin_is_contended() by setting the field to 1 when waiting on a lock and clearing it to zero when holding a lock. However, there are a few problems with this approach: - There is a write-write race between a CPU successfully taking the lock (and subsequently writing break_lock = 0) and a waiter waiting on the lock (and subsequently writing break_lock = 1). This could result in a contended lock being reported as uncontended and vice-versa. - On machines with store buffers, nothing guarantees that the writes to break_lock are visible to other CPUs at any particular time. - READ_ONCE/WRITE_ONCE are not used, so the field is potentially susceptible to harmful compiler optimisations, Consequently, the usefulness of this field is unclear and we'd be better off removing it and allowing architectures to implement raw_spin_is_contended() by providing a definition of arch_spin_is_contended(), as they can when CONFIG_GENERIC_LOCKBREAK=n. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1511894539-7988-3-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-11Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This push fixes the following issues: - buffer overread in RSA - potential use after free in algif_aead. - error path null pointer dereference in af_alg - forbid combinations such as hmac(hmac(sha3)) which may crash - crash in salsa20 due to incorrect API usage" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: salsa20 - fix blkcipher_walk API usage crypto: hmac - require that the underlying hash algorithm is unkeyed crypto: af_alg - fix NULL pointer dereference in crypto: algif_aead - fix reference counting of null skcipher crypto: rsa - fix buffer overread when stripping leading zeroes
2017-12-11fou: fix some member types in guehdrXin Long
guehdr struct is used to build or parse gue packets, which are always in big endian. It's better to define all guehdr members as __beXX types. Also, in validate_gue_flags it's not good to use a __be32 variable for both Standard flags(__be16) and Private flags (__be32), and pass it to other funcions. This patch could fix a bunch of sparse warnings from fou. Fixes: 5024c33ac354 ("gue: Add infrastructure for flags and options") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11ptr_ring: add barriersMichael S. Tsirkin
Users of ptr_ring expect that it's safe to give the data structure a pointer and have it be available to consumers, but that actually requires an smb_wmb or a stronger barrier. In absence of such barriers and on architectures that reorder writes, consumer might read an un=initialized value from an skb pointer stored in the skb array. This was observed causing crashes. To fix, add memory barriers. The barrier we use is a wmb, the assumption being that producers do not need to read the value so we do not need to order these reads. Reported-by: George Cherian <george.cherian@cavium.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11PM / sleep: Avoid excess pm_runtime_enable() calls in device_resume()Rafael J. Wysocki
Middle-layer code doing suspend-time optimizations for devices with the DPM_FLAG_SMART_SUSPEND flag set (currently, the PCI bus type and the ACPI PM domain) needs to make the core skip ->thaw_early and ->thaw callbacks for those devices in some cases and it sets the power.direct_complete flag for them for this purpose. However, it turns out that setting power.direct_complete outside of the PM core is a bad idea as it triggers an excess invocation of pm_runtime_enable() in device_resume(). For this reason, provide a helper to clear power.is_late_suspended and power.is_suspended to be invoked by the middle-layer code in question instead of setting power.direct_complete and make that code call the new helper. Fixes: c4b65157aeef (PCI / PM: Take SMART_SUSPEND driver flag into account) Fixes: 05087360fd7a (ACPI / PM: Take SMART_SUSPEND driver flag into account) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2017-12-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - A number of issues in the vgic discovered using SMATCH - A bit one-off calculation in out stage base address mask (32-bit and 64-bit) - Fixes to single-step debugging instructions that trap for other reasons such as MMMIO aborts - Printing unavailable hyp mode as error - Potential spinlock deadlock in the vgic - Avoid calling vgic vcpu free more than once - Broken bit calculation for big endian systems s390: - SPDX tags - Fence storage key accesses from problem state - Make sure that irq_state.flags is not used in the future x86: - Intercept port 0x80 accesses to prevent host instability (CVE) - Use userspace FPU context for guest FPU (mainly an optimization that fixes a double use of kernel FPU) - Do not leak one page per module load - Flush APIC page address cache from MMU invalidation notifiers" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) KVM: x86: fix APIC page invalidation KVM: s390: Fix skey emulation permission check KVM: s390: mark irq_state.flags as non-usable KVM: s390: Remove redundant license text KVM: s390: add SPDX identifiers to the remaining files KVM: VMX: fix page leak in hardware_setup() KVM: VMX: remove I/O port 0x80 bypass on Intel hosts x86,kvm: remove KVM emulator get_fpu / put_fpu x86,kvm: move qemu/guest FPU switching out to vcpu_run KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion KVM: arm/arm64: kvm_arch_destroy_vm cleanups KVM: arm/arm64: Fix spinlock acquisition in vgic_set_owner kvm: arm: don't treat unavailable HYP mode as an error KVM: arm/arm64: Avoid attempting to load timer vgic state without a vgic kvm: arm64: handle single-step of hyp emulated mmio instructions kvm: arm64: handle single-step during SError exceptions kvm: arm64: handle single-step of userspace mmio instructions kvm: arm64: handle single-stepping trapped instructions KVM: arm/arm64: debug: Introduce helper for single-step arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one ...
2017-12-08kmemcheck: rip it out for realMichal Hocko
Commit 4675ff05de2d ("kmemcheck: rip it out") has removed the code but for some reason SPDX header stayed in place. This looks like a rebase mistake in the mmotm tree or the merge mistake. Let's drop those leftovers as well. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb drivers). 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates to userspace and broke some apps. From Johannes Berg. 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong Wang. 4) Gianfar MAC can't do EEE so don't advertise it by default, from Claudiu Manoil. 5) Relax strict netlink attribute validation, but emit a warning. From David Ahern. 6) Fix regression in checksum offload of thunderx driver, from Florian Westphal. 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner. 8) New card support in iwlwifi, from Ihab Zhaika. 9) BBR congestion control bug fixes from Neal Cardwell. 10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren. 11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan. 12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni. 13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) net: mvpp2: fix the RSS table entry offset tcp: evaluate packet losses upon RTT change tcp: fix off-by-one bug in RACK tcp: always evaluate losses in RACK upon undo tcp: correctly test congestion state in RACK bnxt_en: Fix sources of spurious netpoll warnings tcp_bbr: reset long-term bandwidth sampling on loss recovery undo tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: record "full bw reached" decision in new full_bw_reached bit sfc: pass valid pointers from efx_enqueue_unwind gianfar: Disable EEE autoneg by default tcp: invalidate rate samples during SACK reneging can: peak/pcie_fd: fix potential bug in restarting tx queue can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: mcba_usb: cancel urb on -EPROTO usbnet: fix alignment for frames with no ethernet header tcp: use current time in tcp_rcv_space_adjust() ...
2017-12-08Merge tag 'drm-fixes-for-v4.15-rc3' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "This pull is a bit larger than I'd like but a large bunch of it is license fixes, AMD wanted to fix the licenses for a bunch of files that were missing them, Otherwise a bunch of TTM regression fix since the hugepage support, some i915 and gvt fixes, a core connector free in a safe context fix, and one bridge fix" * tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux: (26 commits) drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk" drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage drm/i915: Call i915_gem_init_userptr() before taking struct_mutex drm/exynos: remove unnecessary function declaration drm/exynos: remove unnecessary descrptions drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU drm/exynos: Fix dma-buf import drm/ttm: swap consecutive allocated pooled pages v4 drm: safely free connectors from connector_iter drm/i915/gvt: set max priority for gvt context drm/i915/gvt: Don't mark vgpu context as inactive when preempted drm/i915/gvt: Limit read hw reg to active vgpu drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id() drm/i915/gvt: Emulate PCI expansion ROM base address register drm/ttm: swap consecutive allocated cached pages v3 drm/ttm: roundup the shrink request to prevent skip huge pool drm/ttm: add page order support in ttm_pages_put drm/ttm: add set_pages_wb for handling page order more than zero drm/ttm: add page order in page pool ...
2017-12-08tcp: invalidate rate samples during SACK renegingYousuk Seung
Mark tcp_sock during a SACK reneging event and invalidate rate samples while marked. Such rate samples may overestimate bw by including packets that were SACKed before reneging. < ack 6001 win 10000 sack 7001:38001 < ack 7001 win 0 sack 8001:38001 // Reneg detected > seq 7001:8001 // RTO, SACK cleared. < ack 38001 win 10000 In above example the rate sample taken after the last ack will count 7001-38001 as delivered while the actual delivery rate likely could be much lower i.e. 7001-8001. This patch adds a new field tcp_sock.sack_reneg and marks it when we declare SACK reneging and entering TCP_CA_Loss, and unmarks it after the last rate sample was taken before moving back to TCP_CA_Open. This patch also invalidates rate samples taken while tcp_sock.is_sack_reneg is set. Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection") Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2017-12-06 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fixing broken uapi for BPF tracing programs for s390 and arm64 architectures due to pt_regs being in-kernel only, and not part of uapi right now. A wrapper is added that exports pt_regs in an asm-generic way. For arm64 this maps to existing user_pt_regs structure and for s390 a user_pt_regs structure exporting the beginning of pt_regs is added and uapi-exported, thus fixing the BPF issues seen in perf (and BPF selftests), all from Hendrik. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07usbnet: fix alignment for frames with no ethernet headerBjørn Mork
The qmi_wwan minidriver support a 'raw-ip' mode where frames are received without any ethernet header. This causes alignment issues because the skbs allocated by usbnet are "IP aligned". Fix by allowing minidrivers to disable the additional alignment offset. This is implemented using a per-device flag, since the same minidriver also supports 'ethernet' mode. Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode") Reported-and-tested-by: Jay Foster <jay@systech.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-06Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "Two fixes: use bool type consistently, plus a irq_matrix_available() bugfix" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqdesc: Use bool return type instead of int genirq/matrix: Fix the precedence fix for real
2017-12-06PCI: Add pci_get_domain_bus_and_slot() stubRandy Dunlap
The coretemp driver build fails when CONFIG_PCI is not enabled because it uses a function that does not have a stub for that config case, so add the function stub. ../drivers/hwmon/coretemp.c: In function 'adjust_tjmax': ../drivers/hwmon/coretemp.c:250:9: error: implicit declaration of function 'pci_get_domain_bus_and_slot' [-Werror=implicit-function-declaration] struct pci_dev *host_bridge = pci_get_domain_bus_and_slot(0, 0, devfn); ../drivers/hwmon/coretemp.c:250:32: warning: initialization makes pointer from integer without a cast [enabled by default] struct pci_dev *host_bridge = pci_get_domain_bus_and_slot(0, 0, devfn); Signed-off-by: Randy Dunlap <rdunlap@infradead.org> [bhelgaas: identical patch also by Arnd Bergmann <arnd@arndb.de>] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Guenter Roeck <linux@roeck-us.net>
2017-12-06efi: Move some sysfs files to be read-only by rootGreg Kroah-Hartman
Thanks to the scripts/leaking_addresses.pl script, it was found that some EFI values should not be readable by non-root users. So make them root-only, and to do that, add a __ATTR_RO_MODE() macro to make this easier, and use it in other places at the same time. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: stable <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20171206095010.24170-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>