summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2024-09-03iomap: add a private argument for iomap_file_buffered_writeJosef Bacik
In order to switch fuse over to using iomap for buffered writes we need to be able to have the struct file for the original write, in case we have to read in the page to make it uptodate. Handle this by using the existing private field in the iomap_iter, and add the argument to iomap_file_buffered_write. This will allow us to pass the file in through the iomap buffered write path, and is flexible for any other file systems needs. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/7f55c7c32275004ba00cddf862d970e6e633f750.1724755651.git.josef@toxicpanda.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-29block: don't use bio_split_rw on misc operationsChristoph Hellwig
bio_split_rw is designed to split read and write bios with a payload. Currently it is called by __bio_split_to_limits for all operations not explicitly list, which works because bio_may_need_split explicitly checks for bi_vcnt == 1 and thus skips the bypass if there is no payload and bio_for_each_bvec loop will never execute it's body if bi_size is 0. But all this is hard to understand, fragile and wasted pointless cycles. Switch __bio_split_to_limits to only call bio_split_rw for READ and WRITE command and don't attempt any kind split for operation that do not require splitting. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Link: https://lore.kernel.org/r/20240826173820.1690925-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-29block: properly handle REQ_OP_ZONE_APPEND in __bio_split_to_limitsChristoph Hellwig
Currently REQ_OP_ZONE_APPEND is handled by the bio_split_rw case in __bio_split_to_limits. This is harmful because REQ_OP_ZONE_APPEND bios do not adhere to the soft max_limits value but instead use their own capped version of max_hw_sectors, leading to incorrect splits that later blow up in bio_split. We still need the bio_split_rw logic to count nr_segs for blk-mq code, so add a new wrapper that passes in the right limit, and turns any bio that would need a split into an error as an additional debugging aid. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Link: https://lore.kernel.org/r/20240826173820.1690925-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-29block: rework bio splittingChristoph Hellwig
The current setup with bio_may_exceed_limit and __bio_split_to_limits is a bit of a mess. Change it so that __bio_split_to_limits does all the work and is just a variant of bio_split_to_limits that returns nr_segs. This is done by inlining it and instead have the various bio_split_* helpers directly submit the potentially split bios. To support btrfs, the rw version has a lower level helper split out that just returns the offset to split. This turns out to nicely clean up the btrfs flow as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Link: https://lore.kernel.org/r/20240826173820.1690925-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-28block: remove checks for FALLOC_FL_NO_HIDE_STALEChristoph Hellwig
While the FALLOC_FL_NO_HIDE_STALE value has been registered, it has always been rejected by vfs_fallocate before making it into blkdev_fallocate because it isn't in the supported mask. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240827065123.1762168-2-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-28block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroesDarrick J. Wong
On error, blkdev_issue_write_zeroes used to recheck the block device's WRITE SAME queue limits after submitting WRITE SAME bios. As stated in the comment, the purpose of this was to collapse all IO errors to EOPNOTSUPP if the effect of issuing bios was that WRITE SAME got turned off in the queue limits. Therefore, it does not make sense to reuse the zeroes limit that was read earlier in the function because we only care about the queue limit *now*, not what it was at the start of the function. Found by running generic/351 from fstests. Fixes: 64b582ca88ca1 ("block: Read max write zeroes once for __blkdev_issue_write_zeroes()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240827175340.GB1977952@frogsfrogsfrogs Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-26blk_iocost: fix more out of bound shiftsKonstantin Ovsepian
Recently running UBSAN caught few out of bound shifts in the ioc_forgive_debts() function: UBSAN: shift-out-of-bounds in block/blk-iocost.c:2142:38 shift exponent 80 is too large for 64-bit type 'u64' (aka 'unsigned long long') ... UBSAN: shift-out-of-bounds in block/blk-iocost.c:2144:30 shift exponent 80 is too large for 64-bit type 'u64' (aka 'unsigned long long') ... Call Trace: <IRQ> dump_stack_lvl+0xca/0x130 __ubsan_handle_shift_out_of_bounds+0x22c/0x280 ? __lock_acquire+0x6441/0x7c10 ioc_timer_fn+0x6cec/0x7750 ? blk_iocost_init+0x720/0x720 ? call_timer_fn+0x5d/0x470 call_timer_fn+0xfa/0x470 ? blk_iocost_init+0x720/0x720 __run_timer_base+0x519/0x700 ... Actual impact of this issue was not identified but I propose to fix the undefined behaviour. The proposed fix to prevent those out of bound shifts consist of precalculating exponent before using it the shift operations by taking min value from the actual exponent and maximum possible number of bits. Reported-by: Breno Leitao <leitao@debian.org> Signed-off-by: Konstantin Ovsepian <ovs@ovs.to> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240822154137.2627818-1-ovs@ovs.to Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-20block,lsm: add LSM blob and new LSM hooks for block devicesDeven Bowers
This patch introduces a new LSM blob to the block_device structure, enabling the security subsystem to store security-sensitive data related to block devices. Currently, for a device mapper's mapped device containing a dm-verity target, critical security information such as the roothash and its signing state are not readily accessible. Specifically, while the dm-verity volume creation process passes the dm-verity roothash and its signature from userspace to the kernel, the roothash is stored privately within the dm-verity target, and its signature is discarded post-verification. This makes it extremely hard for the security subsystem to utilize these data. With the addition of the LSM blob to the block_device structure, the security subsystem can now retain and manage important security metadata such as the roothash and the signing state of a dm-verity by storing them inside the blob. Access decisions can then be based on these stored data. The implementation follows the same approach used for security blobs in other structures like struct file, struct inode, and struct superblock. The initialization of the security blob occurs after the creation of the struct block_device, performed by the security subsystem. Similarly, the security blob is freed by the security subsystem before the struct block_device is deallocated or freed. This patch also introduces a new hook security_bdev_setintegrity() to save block device's integrity data to the new LSM blob. For example, for dm-verity, it can use this hook to expose its roothash and signing state to LSMs, then LSMs can save these data into the LSM blob. Please note that the new hook should be invoked every time the security information is updated to keep these data current. For example, in dm-verity, if the mapping table is reloaded and configured to use a different dm-verity target with a new roothash and signing information, the previously stored data in the LSM blob will become obsolete. It is crucial to re-invoke the hook to refresh these data and ensure they are up to date. This necessity arises from the design of device-mapper, where a device-mapper device is first created, and then targets are subsequently loaded into it. These targets can be modified multiple times during the device's lifetime. Therefore, while the LSM blob is allocated during the creation of the block device, its actual contents are not initialized at this stage and can change substantially over time. This includes alterations from data that the LSM 'trusts' to those it does not, making it essential to handle these changes correctly. Failure to address this dynamic aspect could potentially allow for bypassing LSM checks. Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> [PM: merge fuzz, subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20softirq: Remove unused 'action' parameter from action callbackCaleb Sander Mateos
When soft interrupt actions are called, they are passed a pointer to the struct softirq action which contains the action's function pointer. This pointer isn't useful, as the action callback already knows what function it is. And since each callback handles a specific soft interrupt, the callback also knows which soft interrupt number is running. No soft interrupt action callback actually uses this parameter, so remove it from the function pointer signature. This clarifies that soft interrupt actions are global routines and makes it slightly cheaper to call them. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/all/20240815171549.3260003-1-csander@purestorage.com
2024-08-19block: Read max write zeroes once for __blkdev_issue_write_zeroes()John Garry
As reported in [0], we may get a hang when formatting a XFS FS on a RAID0 drive. Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper") changed __blkdev_issue_write_zeroes() to read the max write zeroes value in the loop. This is not safe as max write zeroes may change in value. Specifically for the case of [0], the value goes to 0, and we get an infinite loop. Lift the limit reading out of the loop. [0] https://lore.kernel.org/linux-xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/#t Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240815163228.216051-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-16blk-cgroup: Remove unused declaration blkg_path()Yue Haibing
Commit bb7e5a193d8b ("block, bfq: remove blkg_path()") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20240816095821.877842-1-yuehaibing@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-15block: Fix lockdep warning in blk_mq_mark_tag_waitLi Lingfeng
Lockdep reported a warning in Linux version 6.6: [ 414.344659] ================================ [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.356863] scsi_io_completion+0x177/0x1610 [ 414.357379] scsi_complete+0x12f/0x260 [ 414.357856] blk_complete_reqs+0xba/0xf0 [ 414.358338] __do_softirq+0x1b0/0x7a2 [ 414.358796] irq_exit_rcu+0x14b/0x1a0 [ 414.359262] sysvec_call_function_single+0xaf/0xc0 [ 414.359828] asm_sysvec_call_function_single+0x1a/0x20 [ 414.360426] default_idle+0x1e/0x30 [ 414.360873] default_idle_call+0x9b/0x1f0 [ 414.361390] do_idle+0x2d2/0x3e0 [ 414.361819] cpu_startup_entry+0x55/0x60 [ 414.362314] start_secondary+0x235/0x2b0 [ 414.362809] secondary_startup_64_no_verify+0x18f/0x19b [ 414.363413] irq event stamp: 428794 [ 414.363825] hardirqs last enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200 [ 414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50 [ 414.365629] softirqs last enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2 [ 414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0 [ 414.367425] other info that might help us debug this: [ 414.368194] Possible unsafe locking scenario: [ 414.368900] CPU0 [ 414.369225] ---- [ 414.369548] lock(&sbq->ws[i].wait); [ 414.370000] <Interrupt> [ 414.370342] lock(&sbq->ws[i].wait); [ 414.370802] *** DEADLOCK *** [ 414.371569] 5 locks held by kworker/u10:3/1152: [ 414.372088] #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0 [ 414.373180] #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0 [ 414.374384] #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00 [ 414.375342] #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.376377] #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0 [ 414.378607] stack backtrace: [ 414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6 [ 414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 414.381177] Workqueue: writeback wb_workfn (flush-253:0) [ 414.381805] Call Trace: [ 414.382136] <TASK> [ 414.382429] dump_stack_lvl+0x91/0xf0 [ 414.382884] mark_lock_irq+0xb3b/0x1260 [ 414.383367] ? __pfx_mark_lock_irq+0x10/0x10 [ 414.383889] ? stack_trace_save+0x8e/0xc0 [ 414.384373] ? __pfx_stack_trace_save+0x10/0x10 [ 414.384903] ? graph_lock+0xcf/0x410 [ 414.385350] ? save_trace+0x3d/0xc70 [ 414.385808] mark_lock.part.20+0x56d/0xa90 [ 414.386317] mark_held_locks+0xb0/0x110 [ 414.386791] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.387320] lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.387901] ? _raw_spin_unlock_irq+0x28/0x50 [ 414.388422] trace_hardirqs_on+0x58/0x100 [ 414.388917] _raw_spin_unlock_irq+0x28/0x50 [ 414.389422] __blk_mq_tag_busy+0x1d6/0x2a0 [ 414.389920] __blk_mq_get_driver_tag+0x761/0x9f0 [ 414.390899] blk_mq_dispatch_rq_list+0x1780/0x1ee0 [ 414.391473] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10 [ 414.392070] ? sbitmap_get+0x2b8/0x450 [ 414.392533] ? __blk_mq_get_driver_tag+0x210/0x9f0 [ 414.393095] __blk_mq_sched_dispatch_requests+0xd99/0x1690 [ 414.393730] ? elv_attempt_insert_merge+0x1b1/0x420 [ 414.394302] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10 [ 414.394970] ? lock_acquire+0x18d/0x460 [ 414.395456] ? blk_mq_run_hw_queue+0x637/0xa00 [ 414.395986] ? __pfx_lock_acquire+0x10/0x10 [ 414.396499] blk_mq_sched_dispatch_requests+0x109/0x190 [ 414.397100] blk_mq_run_hw_queue+0x66e/0xa00 [ 414.397616] blk_mq_flush_plug_list.part.17+0x614/0x2030 [ 414.398244] ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10 [ 414.398897] ? writeback_sb_inodes+0x241/0xcc0 [ 414.399429] blk_mq_flush_plug_list+0x65/0x80 [ 414.399957] __blk_flush_plug+0x2f1/0x530 [ 414.400458] ? __pfx___blk_flush_plug+0x10/0x10 [ 414.400999] blk_finish_plug+0x59/0xa0 [ 414.401467] wb_writeback+0x7cc/0x920 [ 414.401935] ? __pfx_wb_writeback+0x10/0x10 [ 414.402442] ? mark_held_locks+0xb0/0x110 [ 414.402931] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.403462] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.404062] wb_workfn+0x2b3/0xcf0 [ 414.404500] ? __pfx_wb_workfn+0x10/0x10 [ 414.404989] process_scheduled_works+0x432/0x13f0 [ 414.405546] ? __pfx_process_scheduled_works+0x10/0x10 [ 414.406139] ? do_raw_spin_lock+0x101/0x2a0 [ 414.406641] ? assign_work+0x19b/0x240 [ 414.407106] ? lock_is_held_type+0x9d/0x110 [ 414.407604] worker_thread+0x6f2/0x1160 [ 414.408075] ? __kthread_parkme+0x62/0x210 [ 414.408572] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.409168] ? __kthread_parkme+0x13c/0x210 [ 414.409678] ? __pfx_worker_thread+0x10/0x10 [ 414.410191] kthread+0x33c/0x440 [ 414.410602] ? __pfx_kthread+0x10/0x10 [ 414.411068] ret_from_fork+0x4d/0x80 [ 414.411526] ? __pfx_kthread+0x10/0x10 [ 414.411993] ret_from_fork_asm+0x1b/0x30 [ 414.412489] </TASK> When interrupt is turned on while a lock holding by spin_lock_irq it throws a warning because of potential deadlock. blk_mq_prep_dispatch_rq blk_mq_get_driver_tag __blk_mq_get_driver_tag __blk_mq_alloc_driver_tag blk_mq_tag_busy -> tag is already busy // failed to get driver tag blk_mq_mark_tag_wait spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait) __add_wait_queue(wq, wait) -> wait queue active blk_mq_get_driver_tag __blk_mq_tag_busy -> 1) tag must be idle, which means there can't be inflight IO spin_lock_irq(&tags->lock) -> lock B (hctx->tags) spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally -> 2) context must be preempt by IO interrupt to trigger deadlock. As shown above, the deadlock is not possible in theory, but the warning still need to be fixed. Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq. Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-13block: constify ext_pi_ref_escape()Alexey Dobriyan
This function doesn't mutate data. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/d24611b3-dddf-473a-903d-39290db03b11@p183 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-13block: delete module stuff from t10-piAlexey Dobriyan
It is not possible to build t10-pi.ko anymore. This file doesn't even export functions. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/216ccc79-5b80-47b2-b507-990951aa810c@p183 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-12scsi: block: Don't check REQ_ATOMIC for readsJohn Garry
We check in submit_bio_noacct() if flag REQ_ATOMIC is set for both read and write operations, and then validate the atomic operation if set. Flag REQ_ATOMIC can only be set for writes, so don't bother checking for reads. Fixes: 9da3d1e912f3 ("block: Add core atomic write support") Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240805113315.1048591-3-john.g.garry@oracle.com Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-07fs: Convert aops->write_begin to take a folioMatthew Wilcox (Oracle)
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07fs: Convert aops->write_end to take a folioMatthew Wilcox (Oracle)
Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07buffer: Convert block_write_end() to take a folioMatthew Wilcox (Oracle)
All callers now have a folio, so pass it in instead of converting from a folio to a page and back to a folio again. Saves a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07block: Use a folio in blkdev_write_end()Matthew Wilcox (Oracle)
Replaces two hidden calls to compound_head() with one explicit one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-28blk-ioprio: remove per-disk structureYu Kuai
ioprio works on the blk-cgroup level, all disks in the same cgroup are the same, and the struct ioprio_blkg doesn't have anything in it. Hence register the policy is enough, because cpd_alloc/free_fn will be handled for each blk-cgroup, and there is no need to activate the policy for disk. Hence remove blk_ioprio_init/exit and ioprio_alloc/free_pd. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240719071506.158075-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-28blk-ioprio: remove ioprio_blkcg_from_bio()Yu Kuai
Currently, if config is enabled, then ioprio is always enabled by default from blkcg_init_disk(), hence there is no point to check if the policy is enabled from blkg in ioprio_blkcg_from_bio(). Hence remove it and get blkcg directly from bio. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240719071506.158075-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-28blk-cgroup: check for pd_(alloc|free)_fn in blkcg_activate_policy()Yu Kuai
Currently all policies implement pd_(alloc|free)_fn, however, this is not necessary for ioprio that only works for blkcg, not blkg. There are no functional changes, prepare to cleanup activating ioprio policy. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240719071506.158075-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-27blk-throttle: remove more latency dead-codeDr. David Alan Gilbert
The struct 'latency_bucket' and the #define 'request_bucket_index' are unused since commit bf20ab538c81 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW") and the 'LATENCY_BUCKET_SIZE' #define was only used by the 'request_bucket_index' define. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20240727155824.1000042-1-linux@treblig.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-24block: fix deadlock between sd_remove & sd_releaseYang Yang
Our test report the following hung task: [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds. [ 2538.459427] Call trace: [ 2538.459430] __switch_to+0x174/0x338 [ 2538.459436] __schedule+0x628/0x9c4 [ 2538.459442] schedule+0x7c/0xe8 [ 2538.459447] schedule_preempt_disabled+0x24/0x40 [ 2538.459453] __mutex_lock+0x3ec/0xf04 [ 2538.459456] __mutex_lock_slowpath+0x14/0x24 [ 2538.459459] mutex_lock+0x30/0xd8 [ 2538.459462] del_gendisk+0xdc/0x350 [ 2538.459466] sd_remove+0x30/0x60 [ 2538.459470] device_release_driver_internal+0x1c4/0x2c4 [ 2538.459474] device_release_driver+0x18/0x28 [ 2538.459478] bus_remove_device+0x15c/0x174 [ 2538.459483] device_del+0x1d0/0x358 [ 2538.459488] __scsi_remove_device+0xa8/0x198 [ 2538.459493] scsi_forget_host+0x50/0x70 [ 2538.459497] scsi_remove_host+0x80/0x180 [ 2538.459502] usb_stor_disconnect+0x68/0xf4 [ 2538.459506] usb_unbind_interface+0xd4/0x280 [ 2538.459510] device_release_driver_internal+0x1c4/0x2c4 [ 2538.459514] device_release_driver+0x18/0x28 [ 2538.459518] bus_remove_device+0x15c/0x174 [ 2538.459523] device_del+0x1d0/0x358 [ 2538.459528] usb_disable_device+0x84/0x194 [ 2538.459532] usb_disconnect+0xec/0x300 [ 2538.459537] hub_event+0xb80/0x1870 [ 2538.459541] process_scheduled_works+0x248/0x4dc [ 2538.459545] worker_thread+0x244/0x334 [ 2538.459549] kthread+0x114/0x1bc [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds. [ 2538.461014] Call trace: [ 2538.461016] __switch_to+0x174/0x338 [ 2538.461021] __schedule+0x628/0x9c4 [ 2538.461025] schedule+0x7c/0xe8 [ 2538.461030] blk_queue_enter+0xc4/0x160 [ 2538.461034] blk_mq_alloc_request+0x120/0x1d4 [ 2538.461037] scsi_execute_cmd+0x7c/0x23c [ 2538.461040] ioctl_internal_command+0x5c/0x164 [ 2538.461046] scsi_set_medium_removal+0x5c/0xb0 [ 2538.461051] sd_release+0x50/0x94 [ 2538.461054] blkdev_put+0x190/0x28c [ 2538.461058] blkdev_release+0x28/0x40 [ 2538.461063] __fput+0xf8/0x2a8 [ 2538.461066] __fput_sync+0x28/0x5c [ 2538.461070] __arm64_sys_close+0x84/0xe8 [ 2538.461073] invoke_syscall+0x58/0x114 [ 2538.461078] el0_svc_common+0xac/0xe0 [ 2538.461082] do_el0_svc+0x1c/0x28 [ 2538.461087] el0_svc+0x38/0x68 [ 2538.461090] el0t_64_sync_handler+0x68/0xbc [ 2538.461093] el0t_64_sync+0x1a8/0x1ac T1: T2: sd_remove del_gendisk __blk_mark_disk_dead blk_freeze_queue_start ++q->mq_freeze_depth bdev_release mutex_lock(&disk->open_mutex) sd_release scsi_execute_cmd blk_queue_enter wait_event(!q->mq_freeze_depth) mutex_lock(&disk->open_mutex) SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in this scenario. This is a classic ABBA deadlock. To fix the deadlock, make sure we don't try to acquire disk->open_mutex after freezing the queue. Cc: stable@vger.kernel.org Fixes: eec1be4c30df ("block: delete partitions later in del_gendisk") Signed-off-by: Yang Yang <yang.yang@vivo.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: and Cc: stable tags are missing. Otherwise this patch looks fine Link: https://lore.kernel.org/r/20240724070412.22521-1-yang.yang@vivo.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-22Merge tag 'for-6.11/block-20240722' of git://git.kernel.dk/linuxLinus Torvalds
Pull more block updates from Jens Axboe: - MD fixes via Song: - md-cluster fixes (Heming Zhao) - raid1 fix (Mateusz Jończyk) - s390/dasd module description (Jeff) - Series cleaning up and hardening the blk-mq debugfs flag handling (John, Christoph) - blk-cgroup cleanup (Xiu) - Error polled IO attempts if backend doesn't support it (hexue) - Fix for an sbitmap hang (Yang) * tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux: (23 commits) blk-cgroup: move congestion_count to struct blkcg sbitmap: fix io hung due to race on sbitmap_word::cleared block: avoid polling configuration errors block: Catch possible entries missing from rqf_name[] block: Simplify definition of RQF_NAME() block: Use enum to define RQF_x bit indexes block: Catch possible entries missing from cmd_flag_name[] block: Catch possible entries missing from alloc_policy_name[] block: Catch possible entries missing from hctx_flag_name[] block: Catch possible entries missing from hctx_state_name[] block: Catch possible entries missing from blk_queue_flag_name[] block: Make QUEUE_FLAG_x as an enum block: Relocate BLK_MQ_MAX_DEPTH block: Relocate BLK_MQ_CPU_WORK_BATCH block: remove QUEUE_FLAG_STOPPED block: Add missing entry to hctx_flag_name[] block: Add zone write plugging entry to rqf_name[] block: Add missing entries from cmd_flag_name[] s390/dasd: fix error checks in dasd_copy_pair_store() s390/dasd: add missing MODULE_DESCRIPTION() macros ...
2024-07-22Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linuxLinus Torvalds
Pull block integrity mapping updates from Jens Axboe: "A set of cleanups and fixes for the block integrity support. Sent separately from the main block changes from last week, as they depended on later fixes in the 6.10-rc cycle" * tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux: block: don't free the integrity payload in bio_integrity_unmap_free_user block: don't free submitter owned integrity payload on I/O completion block: call bio_integrity_unmap_free_user from blk_rq_unmap_user block: don't call bio_uninit from bio_endio block: also return bio_integrity_payload * from stubs block: split integrity support out of bio.h
2024-07-19Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr) plus some misc small fixes. The only core changes are to both bsg and scsi to pass in the device instead of setting it afterwards as q->queuedata, so no functional change" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (69 commits) scsi: aha152x: Use DECLARE_COMPLETION_ONSTACK for non-constant completion scsi: qla2xxx: Convert comma to semicolon scsi: qla2xxx: Update version to 10.02.09.300-k scsi: qla2xxx: Use QP lock to search for bsg scsi: qla2xxx: Reduce fabric scan duplicate code scsi: qla2xxx: Fix optrom version displayed in FDMI scsi: qla2xxx: During vport delete send async logout explicitly scsi: qla2xxx: Complete command early within lock scsi: qla2xxx: Fix flash read failure scsi: qla2xxx: Return ENOBUFS if sg_cnt is more than one for ELS cmds scsi: qla2xxx: Fix for possible memory corruption scsi: qla2xxx: validate nvme_local_port correctly scsi: qla2xxx: Unable to act on RSCN for port online scsi: ufs: exynos: Add support for Flash Memory Protector (FMP) scsi: ufs: core: Add UFSHCD_QUIRK_KEYS_IN_PRDT scsi: ufs: core: Add fill_crypto_prdt variant op scsi: ufs: core: Add UFSHCD_QUIRK_BROKEN_CRYPTO_ENABLE scsi: ufs: core: fold ufshcd_clear_keyslot() into its caller scsi: ufs: core: Add UFSHCD_QUIRK_CUSTOM_CRYPTO_PROFILE scsi: ufs: mcq: Make .get_hba_mac() optional ...
2024-07-19blk-cgroup: move congestion_count to struct blkcgXiu Jianfeng
The congestion_count was introduced into the struct cgroup by commit d09d8df3a294 ("blkcg: add generic throttling mechanism"), but since it is closely related to the blkio subsys, it is not appropriate to put it in the struct cgroup, so let's move it to struct blkcg. There should be no functional changes because blkcg is per cgroup. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240716133058.3491350-1-xiujianfeng@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: avoid polling configuration errorshexue
This patch adds a poll queue check, aiming to help users use polled IO accurately. If users do polled IO but the device doesn't have poll queues, they will get suboptimal performance data and waste CPU resources. Add a poll queue check batching this. If users don't have the device properly configured, or if it simply doesn't support polled IO, it will error the IO with -EOPNOTSUPP. This is similar to what we used to do for sync polled IO, which is no longer supported. Signed-off-by: hexue <xue01.he@samsung.com> Link: https://lore.kernel.org/r/20240718070817.1031494-1-xue01.he@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Catch possible entries missing from rqf_name[]John Garry
Add a BUILD_BUG_ON() call to ensure that we are not missing entries in rqf_name[]. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-16-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Simplify definition of RQF_NAME()John Garry
Now that we have a bit index for RQF_x in __RQF_x, use __RQF_x to simplify the definition of RQF_NAME() by not using ilog2((__force u32()). Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-15-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Catch possible entries missing from cmd_flag_name[]John Garry
Add a BUILD_BUG_ON() call to ensure that we are not missing entries in cmd_flag_name[]. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-13-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Catch possible entries missing from alloc_policy_name[]John Garry
Make BLK_TAG_ALLOC_x an enum and add a "max" entry. Add a BUILD_BUG_ON() call to ensure that we are not missing entries in hctx_flag_name[]. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-12-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Catch possible entries missing from hctx_flag_name[]John Garry
Refresh values in BLK_MQ_F_x enum, and then re-arrange members in hctx_flag_name[] to match that enum. Renumber BLK_MQ_F_ALLOC_POLICY_START_BIT to match the value refresh. Add a BUILD_BUG_ON() call to ensure that we are not missing entries in hctx_flag_name[]. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-11-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Catch possible entries missing from hctx_state_name[]John Garry
Add a build-time assert that we are not missing entries from hctx_state_name[]. For this, create a separate enum for state flags and add a "max" entry for BLK_MQ_S_x flags. The numbering for those enum values is as default, so don't explicitly number. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-10-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Catch possible entries missing from blk_queue_flag_name[]John Garry
Assert that we are not missing flag entries in blk_queue_flag_name[]. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240719112912.3830443-9-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Relocate BLK_MQ_CPU_WORK_BATCHJohn Garry
BLK_MQ_CPU_WORK_BATCH is defined in include/linux/blk-mq.h, but only used in blk-mq.c, so relocate to block/blk-mq.h Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-6-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: remove QUEUE_FLAG_STOPPEDChristoph Hellwig
QUEUE_FLAG_STOPPED is entirely unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-5-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Add missing entry to hctx_flag_name[]John Garry
Add missing entry for NO_SCHED_BY_DEFAULT and reorder to match the enum. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-4-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Add zone write plugging entry to rqf_name[]John Garry
Add missing entry. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-19block: Add missing entries from cmd_flag_name[]John Garry
Add missing entries for req_flag_bits. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240719112912.3830443-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-15Merge tag 'for-6.11/block-20240710' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - NVMe updates via Keith: - Device initialization memory leak fixes (Keith) - More constants defined (Weiwen) - Target debugfs support (Hannes) - PCIe subsystem reset enhancements (Keith) - Queue-depth multipath policy (Redhat and PureStorage) - Implement get_unique_id (Christoph) - Authentication error fixes (Gaosheng) - MD updates via Song - sync_action fix and refactoring (Yu Kuai) - Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li) - Fix loop detach/open race (Gulam) - Fix lower control limit for blk-throttle (Yu) - Add module descriptions to various drivers (Jeff) - Add support for atomic writes for block devices, and statx reporting for same. Includes SCSI and NVMe (John, Prasad, Alan) - Add IO priority information to block trace points (Dongliang) - Various zone improvements and tweaks (Damien) - mq-deadline tag reservation improvements (Bart) - Ignore direct reclaim swap writes in writeback throttling (Baokun) - Block integrity improvements and fixes (Anuj) - Add basic support for rust based block drivers. Has a dummy null_blk variant for now (Andreas) - Series converting driver settings to queue limits, and cleanups and fixes related to that (Christoph) - Cleanup for poking too deeply into the bvec internals, in preparation for DMA mapping API changes (Christoph) - Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas, Ming, Zhu, Damien, Christophe, Chaitanya) * tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits) floppy: add missing MODULE_DESCRIPTION() macro loop: add missing MODULE_DESCRIPTION() macro ublk_drv: add missing MODULE_DESCRIPTION() macro xen/blkback: add missing MODULE_DESCRIPTION() macro block/rnbd: Constify struct kobj_type block: take offset into account in blk_bvec_map_sg again block: fix get_max_segment_size() warning loop: Don't bother validating blocksize virtio_blk: Don't bother validating blocksize null_blk: Don't bother validating blocksize block: Validate logical block size in blk_validate_limits() virtio_blk: Fix default logical block size fallback nvmet-auth: fix nvmet_auth hash error handling nvme: implement ->get_unique_id block: pass a phys_addr_t to get_max_segment_size block: add a bvec_phys helper blk-lib: check for kill signal in ioctl BLKZEROOUT block: limit the Write Zeroes to manually writing zeroes fallback block: refacto blkdev_issue_zeroout block: move read-only and supported checks into (__)blkdev_issue_zeroout ...
2024-07-09block: take offset into account in blk_bvec_map_sg againChristoph Hellwig
The rebase of commit 09595e0c9d65 ("block: pass a phys_addr_t to get_max_segment_size") lost adding the total to to the offset in blk_bvec_map_sg. Add it back. Fixes: 09595e0c9d65 ("block: pass a phys_addr_t to get_max_segment_size") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240709070126.3019940-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09block: fix get_max_segment_size() warningChaitanya Kulkarni
Correct the parameter name in the comment of get_max_segment_size() to fix following warning:- block/blk-merge.c:220: warning: Function parameter or struct member 'len' not described in 'get_max_segment_size' block/blk-merge.c:220: warning: Excess function parameter 'max_len' description in 'get_max_segment_size' Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240709045432.8688-1-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09block: Validate logical block size in blk_validate_limits()John Garry
Some drivers validate that their own logical block size. It is no harm to always do this, so validate in blk_validate_limits(). This allows us to remove the validation in most of those drivers. Add a comment to blk_validate_block_size() to inform users that self- validation of LBS is usually unnecessary. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240708091651.177447-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-08block: pass a phys_addr_t to get_max_segment_sizeChristoph Hellwig
Work on a single address to simplify the logic, and prepare the callers from using better helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240706075228.2350978-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-08block: add a bvec_phys helperChristoph Hellwig
Get callers out of poking into bvec internals a bit more. Not a huge win right now, but with the proposed new DMA mapping API we might end up with a lot more of this otherwise. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240706075228.2350978-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-05blk-lib: check for kill signal in ioctl BLKZEROOUTChristoph Hellwig
Zeroout can access a significant capacity and take longer than the user expected. A user may change their mind about wanting to run that command and attempt to kill the process and do something else with their device. But since the task is uninterruptable, they have to wait for it to finish, which could be many hours. Add a new BLKDEV_ZERO_KILLABLE flag for blkdev_issue_zeroout that checks for a fatal signal at each iteration so the user doesn't have to wait for their regretted operation to complete naturally. Heavily based on an earlier patch from Keith Busch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-05block: limit the Write Zeroes to manually writing zeroes fallbackChristoph Hellwig
Only fall back from hardware Write Zeroes failures when blkdev_issue_write_zeroes returns -EOPNOTSUPP; Note that blkdev_issue_write_zeroes turns any failure into -EOPNOTSUPP when the write zeroes queue limit has been cleared to 0, so this still catches all I/O errors where the driver detected missing support for the hardware acceleration. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-05block: refacto blkdev_issue_zerooutChristoph Hellwig
Split out two well-defined helpers for hardware supported Write Zeroes and manually writing zeroes using the Write command. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>