summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-08-18xen-blkfront: Avoid that gcc 7 warns about fall-through when building with W=1Bart Van Assche
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monn303251 <roger.pau@citrix.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18xen-blkback: Avoid that gcc 7 warns about fall-through when building with W=1Bart Van Assche
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monn303251 <roger.pau@citrix.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18xen-blkback: Fix indentationBart Van Assche
Avoid that smatch reports the following warning when building with C=2 CHECK="smatch -p=kernel": drivers/block/xen-blkback/blkback.c:710 xen_blkbk_unmap_prepare() warn: inconsistent indenting Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monn303251 <roger.pau@citrix.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18virtio_blk: Use blk_rq_is_scsi()Bart Van Assche
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18ide-floppy: Use blk_rq_is_scsi()Bart Van Assche
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: linux-ide@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18genhd: Annotate all part and part_tbl pointer dereferencesBart Van Assche
Annotate gendisk.part_tbl and disk_part_tbl.part dereferences with rcu_dereference_protected(). This patch does not change the behavior of the modified code but ensures that sparse does not complain about disk->part_tbl manipulations nor about part_tbl->part accesses. Additionally, improve documentation of the locking requirements of the modified functions. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18blk-mq-debugfs: Declare a local symbol staticBart Van Assche
This was detected by sparse. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18blk-mq: Make blk_mq_reinit_tagset() calls easier to readBart Van Assche
Since blk_mq_ops.reinit_request is only called from inside blk_mq_reinit_tagset(), make this function pointer an argument of blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops. This patch does not change any functionality but makes blk_mq_reinit_tagset() calls easier to read and to analyze. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: James Smart <james.smart@broadcom.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18block: Unexport blk_queue_end_tag()Bart Van Assche
This function is only used inside the block layer core. Hence unexport it. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-18block: Fix two comments that refer to .queue_rq() return valuesBart Van Assche
Since patch "blk-mq: switch .queue_rq return value to blk_status_t" .queue_rq() returns a BLK_STS_* value instead of a BLK_MQ_RQ_* value. Hence refer to the former in comments about .queue_rq() return values. Fixes: commit 39a70c76b89b ("blk-mq: clarify dispatch may not be drained/blocked by stopping queue") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-17nbd: change the default nbd partitionsJosef Bacik
There's no reason to have partitions disabled for nbd by default, it costs us nothing to have it enabled and is just confusing/obnoxious to users who try to use partitions with nbd. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-17nbd: allow device creation at a specific indexJosef Bacik
If users really want to use a particular index for their nbd device and it doesn't already exist there's no reason we can't just create it for them. Do this instead of erroring out. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-15loop: fix to a race condition due to the early registration of deviceAnton Volkov
The early device registration made possible a race leading to allocations of disks with wrong minors. This patch moves the device registration further down the loop_init function to make the race infeasible. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Volkov <avolkov@ispras.ru> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-11cfq: Give a chance for arming slice idle timer in case of group_idleRitesh Harjani
In below scenario blkio cgroup does not work as per their assigned weights :- 1. When the underlying device is nonrotational with a single HW queue with depth of >= CFQ_HW_QUEUE_MIN 2. When the use case is forming two blkio cgroups cg1(weight 1000) & cg2(wight 100) and two processes(file1 and file2) doing sync IO in their respective blkio cgroups. For above usecase result of fio (without this patch):- file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan 1 19:41:49 1970 write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec) <...> file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan 1 19:41:49 1970 write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec) <...> // both the process BW is equal even though they belong to diff. cgroups with weight of 1000(cg1) and 100(cg2) In above case (for non rotational NCQ devices), as soon as the request from cg1 is completed and even though it is provided with higher set_slice=10, because of CFQ algorithm when the driver tries to fetch the request, CFQ expires this group without providing any idle time nor weight priority and schedules another cfq group (in this case cg2). And thus both cfq groups(cg1 & cg2) keep alternating to get the disk time and hence loses the cgroup weight based scheduling. Below patch gives a chance to cfq algorithm (cfq_arm_slice_timer) to arm the slice timer in case group_idle is enabled. In case if group_idle is also not required (including for nonrotational NCQ drives), we need to explicitly set group_idle = 0 from sysfs for such cases. With this patch result of fio(for above usecase) :- file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan 1 00:06:08 1970 write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec) <..> file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan 1 00:06:08 1970 write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec) <..> // In this processes BW is as per their respective cgroups weight. Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-11block, bfq: boost throughput with flash-based non-queueing devicesPaolo Valente
When a queue associated with a process remains empty, there are cases where throughput gets boosted if the device is idled to await the arrival of a new I/O request for that queue. Currently, BFQ assumes that one of these cases is when the device has no internal queueing (regardless of the properties of the I/O being served). Unfortunately, this condition has proved to be too general. So, this commit refines it as "the device has no internal queueing and is rotational". This refinement provides a significant throughput boost with random I/O, on flash-based storage without internal queueing. For example, on a HiKey board, throughput increases by up to 125%, growing, e.g., from 6.9MB/s to 15.6MB/s with two or three random readers in parallel. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Luca Miccio <lucmiccio@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-11block,bfq: refactor device-idling logicPaolo Valente
The logic that decides whether to idle the device is scattered across three functions. Almost all of the logic is in the function bfq_bfqq_may_idle, but (1) part of the decision is made in bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may switch off idling regardless of the output of bfq_bfqq_may_idle. In addition, both bfq_update_idle_window and bfq_bfqq_must_idle make their decisions as a function of parameters that are used, for similar purposes, also in bfq_bfqq_may_idle. This commit addresses these issues by moving all the logic into bfq_bfqq_may_idle. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-10block: remove unused syncfull/asyncfull queue flagsJens Axboe
We haven't used these in years, but somehow the definitions still remained. Kill them, and renumber the QUEUE_FLAG_ space. We had a hole in the beginning of the space, too. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09blk-mq: enable checking two part inflight counts at the same timeJens Axboe
Modify blk_mq_in_flight() to count both a partition and root at the same time. Then we only have to call it once, instead of potentially looping the tags twice. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09blk-mq: provide internal in-flight variantJens Axboe
We don't have to inc/dec some counter, since we can just iterate the tags. That makes inc/dec a noop, but means we have to iterate busy tags to get an in-flight count. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09block: make part_in_flight() take an array of two intsJens Axboe
Instead of returning the count that matches the partition, pass in an array of two ints. Index 0 will be filled with the inflight count for the partition in question, and index 1 will filled with the root inflight count, if the partition passed in is not the root. This is in preparation for being able to calculate both in one go. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09block: pass in queue to inflight accountingJens Axboe
No functional change in this patch, just in preparation for basing the inflight mechanism on the queue in question. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09blk-mq-tag: check for NULL rq when iterating tagsJens Axboe
Since we introduced blk-mq-sched, the tags->rqs[] array has been dynamically assigned. So we need to check for NULL when iterating, since there's a window of time where the bit is set, but we haven't dynamically assigned the tags->rqs[] array position yet. This is perfectly safe, since the memory backing of the request is never going away while the device is alive. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09dm-crypt: don't mess with BIP_BLOCK_INTEGRITYChristoph Hellwig
This flag is never set right after calling bio_integrity_alloc, so don't clear it and confuse the reader. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09bio-integrity: move the bio integrity profile check earlier in ↵Christoph Hellwig
bio_integrity_prep This makes the code more obvious, and moves the most likely branch first in the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-07null_blk: make sure submit_queues > 0weiping zhang
set submit_queues to 1 by default, and make sure it's value > 0. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-07null_blk: simplify logic for use_per_node_hctxweiping zhang
make sure submit_queues equal nr_online_nodes. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-02block: Add comment to submit_bio_wait()Jan Kara
submit_bio_wait() does not consume bio reference. Add comment about that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-01blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabledJens Axboe
We recently had a bug in the IPR SCSI driver, where it would end up making the SCSI mid layer run the mq hardware queue with interrupts disabled. This isn't legal, since the software queue locking relies on never being grabbed from interrupt context. Additionally, drivers that set BLK_MQ_F_BLOCKING may schedule from this context. Add a WARN_ON_ONCE() to catch bad users up front. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29blk-mq: blk_mq_requeue_work() doesn't need to save IRQ flagsJens Axboe
We know we're in process context, so don't bother using the IRQ safe versions of the spin lock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29block: DAC960: shut up format-overflow warningArnd Bergmann
gcc-7 points out that a large controller number would overflow the string length for the procfs name and the firmware version string: drivers/block/DAC960.c: In function 'DAC960_Probe': drivers/block/DAC960.c:6591:38: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] drivers/block/DAC960.c: In function 'DAC960_V1_ReadControllerConfiguration': drivers/block/DAC960.c:1681:40: error: '%02d' directive writing between 2 and 3 bytes into a region of size between 2 and 5 [-Werror=format-overflow=] drivers/block/DAC960.c:1681:40: note: directive argument in the range [0, 255] drivers/block/DAC960.c:1681:3: note: 'sprintf' output between 10 and 14 bytes into a destination of size 12 Both of these seem appropriately sized, and using snprintf() instead of sprintf() improves this by ensuring that even incorrect data won't cause undefined behavior here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29block: use standard blktrace API to output cgroup info for debug notesShaohua Li
Currently cfq/bfq/blk-throttle output cgroup info in trace in their own way. Now we have standard blktrace API for this, so convert them to use it. Note, this changes the behavior a little bit. cgroup info isn't output by default, we only do this with 'blk_cgroup' option enabled. cgroup info isn't output as a string by default too, we only do this with 'blk_cgname' option enabled. Also cgroup info is output in different position of the note string. I think these behavior changes aren't a big issue (actually we make trace data shorter which is good), since the blktrace note is solely for debugging. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29blktrace: add an option to allow displaying cgroup pathShaohua Li
By default we output cgroup id in blktrace. This adds an option to display cgroup path. Since get cgroup path is a relativly heavy operation, we don't enable it by default. with the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.015252: 8,0 /test/level D R 24 + 8 [dd] Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29block: always attach cgroup info into bioShaohua Li
blkcg_bio_issue_check() already gets blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no point we don't attach the cgroup info into bio at blkcg_bio_issue_check. This also makes blktrace outputs correct cgroup info. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29blktrace: export cgroup info in traceShaohua Li
Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always comes from flusher thread but the BIOs are for different blk cgroups. Request could be requeued and dispatched from completely different tasks. MD/DM are another examples. This patch tries to fix the gap. We print out cgroup fhandle info in blktrace. Userspace can use open_by_handle_at() syscall to find the cgroup by fhandle. Or userspace can use name_to_handle_at() syscall to find fhandle for a cgroup and use a BPF program to filter out blktrace for a specific cgroup. We add a new 'blk_cgroup' trace option for blk tracer. It's default off. Application which doesn't know the new option isn't affected. When it's on, we output fhandle info right after blk_io_trace with an extra bit set in event action. So from application point of view, blktrace with the option will output new actions. I didn't change blk trace event yet, since I'm not sure if changing the trace event output is an ABI issue. If not, I'll do it later. Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29cgroup: export fhandle info for a cgroupShaohua Li
Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle', there are unrequired info. Sepcifically, cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle, we only need export the inode number and generation number just like what generic_fh_to_dentry does. And we can avoid the overhead of getting an inode too, since kernfs_node_id (ino and generation) has all the info required. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29kernfs: add exportfs operationsShaohua Li
Now we have the facilities to implement exportfs operations. The idea is cgroup can export the fhandle info to userspace, then userspace uses fhandle to find the cgroup name. Another example is userspace can get fhandle for a cgroup and BPF uses the fhandle to filter info for the cgroup. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29kernfs: introduce kernfs_node_idShaohua Li
inode number and generation can identify a kernfs node. We are going to export the identification by exportfs operations, so put ino and generation into a separate structure. It's convenient when later patches use the identification. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29kernfs: don't set dentry->d_fsdataShaohua Li
When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. Looks there is no way to do it without race condition. Look at the kernfs code closely, there is no point to set dentry->d_fsdata. inode->i_private already points to kernfs_node, and we can get inode from a dentry. So this patch just delete the d_fsdata usage. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29kernfs: add an API to get kernfs node from inode numberShaohua Li
Add an API to get kernfs node from inode number. We will need this to implement exportfs operations. This API will be used in blktrace too later, so it should be as fast as possible. To make the API lock free, kernfs node is freed in RCU context. And we depend on kernfs_node count/ino number to filter out stale kernfs nodes. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29kernfs: implement i_generationShaohua Li
Set i_generation for kernfs inode. This is required to implement exportfs operations. The generation is 32-bit, so it's possible the generation wraps up and we find stale files. To reduce the posssibility, we don't reuse inode numer immediately. When the inode number allocation wraps, we increase generation number. In this way generation/inode number consist of a 64-bit number which is unlikely duplicated. This does make the idr tree more sparse and waste some memory. Since idr manages 32-bit keys, idr uses a 6-level radix tree, each level covers 6 bits of the key. In a 100k inode kernfs, the worst case will have around 300k radix tree node. Each node is 576bytes, so the tree will use about ~150M memory. Sounds not too bad, if this really is a problem, we should find better data structure. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29kernfs: use idr instead of ida to manage inode numberShaohua Li
kernfs uses ida to manage inode number. The problem is we can't get kernfs_node from inode number with ida. Switching to use idr, next patch will add an API to get kernfs_node from inode number. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-28Merge tag 'devicetree-fixes-for-4.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree fixes from Rob Herring: "Two small DT fixes: - Fix error handling in of_irq_to_resource_table() due to of_irq_to_resource() error return changes. - Fix dtx_diff script due to dts include path changes" * tag 'devicetree-fixes-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: irq: fix of_irq_to_resource() error check scripts/dtc: dtx_diff - update include dts paths to match build
2017-07-28Merge tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "More NFS client bugfixes for 4.13. Most of these fix locking bugs that Ben and Neil noticed, but I also have a patch to fix one more access bug that was reported after last week. Stable fixes: - Fix a race where CB_NOTIFY_LOCK fails to wake a waiter - Invalidate file size when taking a lock to prevent corruption Other fixes: - Don't excessively generate tiny writes with fallocate - Use the raw NFS access mask in nfs4_opendata_access()" * tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter NFS: Optimize fallocate by refreshing mapping when needed. NFS: invalidate file size when taking a lock. NFS: Use raw NFS access mask in nfs4_opendata_access()
2017-07-28Merge tag 'xfs-4.13-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: - fix firstfsb variables that we left uninitialized, which could lead to locking problems. - check for NULL metadata buffer pointers before using them. - don't allow btree cursor manipulation if the btree block is corrupt. Better to just shut down. - fix infinite loop problems in quotacheck. - fix buffer overrun when validating directory blocks. - fix deadlock problem in bunmapi. * tag 'xfs-4.13-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix multi-AG deadlock in xfs_bunmapi xfs: check that dir block entries don't off the end of the buffer xfs: fix quotacheck dquot id overflow infinite loop xfs: check _alloc_read_agf buffer pointer before using xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write xfs: check _btree_check_block value
2017-07-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "s390: - SRCU fix PPC: - host crash fixes x86: - bugfixes, including making nested posted interrupts really work Generic: - tweaks to kvm_stat and to uevents" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: LAPIC: Fix reentrancy issues with preempt notifiers tools/kvm_stat: add '-f help' to get the available event list tools/kvm_stat: use variables instead of hard paths in help output KVM: nVMX: Fix loss of L2's NMI blocking state KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode x86: irq: Define a global vector for nested posted interrupts KVM: x86: do mask out upper bits of PAE CR3 KVM: make pid available for uevents without debugfs KVM: s390: take srcu lock when getting/setting storage keys KVM: VMX: remove unused field KVM: PPC: Book3S HV: Fix host crash on changing HPT size KVM: PPC: Book3S HV: Enable TM before accessing TM registers
2017-07-28Merge tag 'for-linus-4.13b-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Three minor cleanups for xen related drivers" * tag 'for-linus-4.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: dont fiddle with event channel masking in suspend/resume xen: selfballoon: remove unnecessary static in frontswap_selfshrink() xen: Drop un-informative message during boot
2017-07-28Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "I'd been collecting these whilst we debugged a CPU hotplug failure, but we ended up diagnosing that one to tglx, who has taken a fix via the -tip tree separately. We're seeing some NFS issues that we haven't gotten to the bottom of yet, and we've uncovered some issues with our backtracing too so there might be another fixes pull before we're done. Summary: - Ensure we have a guard page after the kernel image in vmalloc - Fix incorrect prefetch stride in copy_page - Ensure irqs are disabled in die() - Fix for event group validation in QCOM L2 PMU driver - Fix requesting of PMU IRQs on AMD Seattle - Minor cleanups and fixes" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mmu: Place guard page after mapping of kernel image drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU arm64: sysreg: Fix unprotected macro argmuent in write_sysreg perf: qcom_l2: fix column exclusion check arm64/lib: copy_page: use consistent prefetch stride arm64/numa: Drop duplicate message perf: Convert to using %pOF instead of full_name arm64: Convert to using %pOF instead of full_name arm64: traps: disable irq in die() arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics arm64: uaccess: Remove redundant __force from addr cast in __range_ok
2017-07-28Merge tag 'powerpc-4.13-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "The highlight is Ben's patch to work around a host killing bug when running KVM guests with the Radix MMU on Power9. See the long change log of that commit for more detail. And then three fairly minor fixes: - fix of_node_put() underflow during reconfig remove, using old DLPAR tools. - fix recently introduced ld version check with 64-bit LE-only toolchain. - free the subpage_prot_table correctly, avoiding a memory leak. Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Laurent Vivier" * tag 'powerpc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm/hash: Free the subpage_prot_table correctly powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain powerpc/pseries: Fix of_node_put() underflow during reconfig remove powerpc/mm/radix: Workaround prefetch issue with KVM
2017-07-28NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiterBenjamin Coddington
nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the same region protected by the wait_queue's lock after checking for a notification from CB_NOTIFY_LOCK callback. However, after releasing that lock, a wakeup for that task may race in before the call to freezable_schedule_timeout_interruptible() and set TASK_WAKING, then freezable_schedule_timeout_interruptible() will set the state back to TASK_INTERRUPTIBLE before the task will sleep. The result is that the task will sleep for the entire duration of the timeout. Since we've already set TASK_INTERRUPTIBLE in the locked section, just use freezable_schedule_timout() instead. Fixes: a1d617d8f134 ("nfs: allow blocking locks to be awoken by lock callbacks") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-28Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - remove broken dt bindings in inside-secure - fix authencesn crash when used with digest_null - fix cavium/nitrox firmware path - fix SHA3 failure in brcm - fix Kconfig dependency for brcm * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: authencesn - Fix digest_null crash crypto: brcm - remove BCM_PDC_MBOX dependency in Kconfig Documentation/bindings: crypto: remove the dma-mask property crypto: inside-secure - do not parse the dma mask from dt crypto: cavium/nitrox - Change in firmware path. crypto: brcm - Fix SHA3-512 algorithm failure