summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-18block: turn bdev->bd_openers into an atomic_tChristoph Hellwig
All manipulation of bd_openers is under disk->open_mutex and will remain so for the foreseeable future. But at least one place reads it without the lock (blkdev_get) and there are more to be added. So make sure the compiler does not do turn the increments and decrements into non-atomic sequences by using an atomic_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220330052917.2566582-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18block: add a disk_openers helperChristoph Hellwig
Add a helper that returns the openers for a given gendisk to avoid having drivers poke into disk->part0 to get at this information in a somewhat cumbersome way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220330052917.2566582-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18zram: cleanup zram_removeChristoph Hellwig
Remove the bdev variable and just use the gendisk pointed to by the zram_device directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220330052917.2566582-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18zram: cleanup reset_storeChristoph Hellwig
Use a local variable for the gendisk instead of the part0 block_device, as the gendisk is what this function actually operates on. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220330052917.2566582-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18nbd: use the correct block_device in nbd_bdev_resetChristoph Hellwig
The bdev parameter to ->ioctl contains the block device that the ioctl is called on, which can be the partition. But the openers check in nbd_bdev_reset really needs to check use the whole device, so switch to using that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220330052917.2566582-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: Return true/false (not 1/0) from bool functionsHaowen Bai
Return boolean values ("true" or "false") instead of 1 or 0 from bool functions. This fixes the following warnings from coccicheck: ./drivers/block/drbd/drbd_req.c:912:9-10: WARNING: return of 0/1 in function 'remote_due_to_read_balancing' with return type bool Signed-off-by: Haowen Bai <baihaowen@meizu.com> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-8-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drdb: Switch to kvfree_rcu() APIUladzislau Rezki (Sony)
Instead of invoking a synchronize_rcu() to free a pointer after a grace period we can directly make use of new API that does the same but in more efficient way. TO: Jens Axboe <axboe@kernel.dk> TO: Philipp Reisner <philipp.reisner@linbit.com> TO: Jason Gunthorpe <jgg@nvidia.com> TO: drbd-dev@lists.linbit.com TO: linux-block@vger.kernel.org Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-7-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: Replace "unsigned" with "unsigned int"Cai Huoqing
when run checkpath.pl for the first patch, found that WARNING: Prefer 'unsigned int' to bare use of 'unsigned'. so fix it. BTW Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-6-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: Make use of PFN_UP helper macroCai Huoqing
it's a refactor to make use of PFN_UP helper macro Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-5-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: drbd: drbd_receiver: Remove redundant assignment to errJiapeng Chong
Variable err is set to '-EIO' but this value is never read as it is overwritten or not used later on, hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: drivers/block/drbd/drbd_receiver.c:3955:5: warning: Value stored to 'err' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-4-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: address enum mismatch warningsArnd Bergmann
gcc -Wextra warns about mixing drbd_state_rv with drbd_ret_code in a couple of places: drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_set_role': drivers/block/drbd/drbd_nl.c:777:14: warning: comparison between 'enum drbd_state_rv' and 'enum drbd_ret_code' [-Wenum-compare] 777 | if (retcode != NO_ERROR) | ^~ drivers/block/drbd/drbd_nl.c:784:12: warning: implicit conversion from 'enum drbd_ret_code' to 'enum drbd_state_rv' [-Wenum-conversion] 784 | retcode = ERR_MANDATORY_TAG; | ^ drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_attach': drivers/block/drbd/drbd_nl.c:1965:10: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion] 1965 | retcode = rv; /* FIXME: Type mismatch. */ | ^ drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_connect': drivers/block/drbd/drbd_nl.c:2690:10: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion] 2690 | retcode = conn_request_state(connection, NS(conn, C_UNCONNECTED), CS_VERBOSE); | ^ drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_disconnect': drivers/block/drbd/drbd_nl.c:2803:11: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion] 2803 | retcode = rv; /* FIXME: Type mismatch. */ | ^ In each case, both are passed into drbd_adm_finish(), which just takes a 32-bit integer and is happy with either, presumably intentionally. Restructure the code to pass either type directly in there in most cases, avoiding the warnings. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-3-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: fix duplicate array initializerArnd Bergmann
There are two initializers for P_RETRY_WRITE: drivers/block/drbd/drbd_main.c:3676:22: warning: initialized field overwritten [-Woverride-init] Remove the first one since it was already ignored by the compiler and reorder the list to match the enum definition. As P_ZEROES had no entry, add that one instead. Fixes: 036b17eaab93 ("drbd: Receiving part for the PROTOCOL_UPDATE packet") Fixes: f31e583aa2c2 ("drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17direct-io: remove random prefetchesChristoph Hellwig
Randomly poking into block device internals for manual prefetches isn't exactly a very maintainable thing to do. And none of the performance critical direct I/O implementations still use this library function anyway, so just drop it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220415045258.199825-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARDChristoph Hellwig
Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_discard_granularity helperChristoph Hellwig
Abstract away implementation details from file systems by providing a block_device based helper to retrieve the discard granularity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: remove QUEUE_FLAG_DISCARDChristoph Hellwig
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_max_discard_sectors helperChristoph Hellwig
Add a helper to query the number of sectors support per each discard bio based on the block device and use this helper to stop various places from poking into the request_queue to see if discard is supported and if so how much. This mirrors what is done e.g. for write zeroes as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-24-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: refactor discard bio size limitingChristoph Hellwig
Move all the logic to limit the discard bio size into a common helper so that it is better documented. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-23-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: move {bdev,queue_limit}_discard_alignment out of lineChristoph Hellwig
No need to inline these fairly larger helpers. Also fix the return value to be unsigned, just like the field in struct queue_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-22-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: use bdev_discard_alignment in part_discard_alignment_showChristoph Hellwig
Use the bdev based alignment helper instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-21-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: remove queue_discard_alignmentChristoph Hellwig
Just use bdev_alignment_offset in disk_discard_alignment_show instead. That helpers is the same except for an always false branch that doesn't matter in this slow path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: move bdev_alignment_offset and queue_limit_alignment_offset out of lineChristoph Hellwig
No need to inline these fairly larger helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: use bdev_alignment_offset in disk_alignment_offset_showChristoph Hellwig
This does the same as the open coded variant except for an extra branch, and allows to remove queue_alignment_offset entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220415045258.199825-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: use bdev_alignment_offset in part_alignment_offset_showChristoph Hellwig
Replace the open coded offset calculation with the proper helper. This is an ABI change in that the -1 for a misaligned partition is properly propagated, which can be considered a bug fix and matches what is done on the whole device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_max_zone_append_sectors helperChristoph Hellwig
Add a helper to check the max supported sectors for zone append based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_stable_writes helperChristoph Hellwig
Add a helper to check the stable writes flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_fua helperChristoph Hellwig
Add a helper to check the FUA flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_write_cache helperChristoph Hellwig
Add a helper to check the write cache flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_nonrot helperChristoph Hellwig
Add a helper to check the nonrot flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17mm: use bdev_is_zoned in claim_swapfileChristoph Hellwig
Use the bdev based helper instead of poking into the queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17ntfs3: use bdev_logical_block_size instead of open coding itChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220415045258.199825-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17btrfs: use bdev_max_active_zones instead of open coding itChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: cleanup decide_on_discard_supportChristoph Hellwig
Sanitize the calling conventions and use a goto label to cleanup the code flow. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220415045258.199825-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: use bdev_alignment_offset instead of queue_alignment_offsetChristoph Hellwig
The bdev version does the right thing for partitions, so use that. Fixes: 9104d31a759f ("drbd: introduce WRITE_SAME support") Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220415045258.199825-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: use bdev based limit helpers in drbd_send_sizesChristoph Hellwig
Use the bdev based limits helpers where they exist. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220415045258.199825-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: remove assign_p_sizes_qlimChristoph Hellwig
Fold each branch into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220415045258.199825-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17target: fix discard alignment on partitionsChristoph Hellwig
Use the proper bdev_discard_alignment helper that accounts for partition offsets. Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17target: pass a block_device to target_configure_unmap_from_queueChristoph Hellwig
The SCSI target drivers is a consumer of the block layer and shoul d generally work on struct block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17target: remove an incorrect unmap zeroes data deductionChristoph Hellwig
For block devices, the SCSI target drivers implements UNMAP as calls to blkdev_issue_discard, which does not guarantee zeroing just because Write Zeroes is supported. Note that this does not affect the file backed path which uses fallocate to punch holes. Fixes: 2237498f0b5c ("target/iblock: Convert WRITE_SAME to blkdev_issue_zeroout") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Make sure bfqg for which we are queueing requests is onlineJan Kara
Bios queued into BFQ IO scheduler can be associated with a cgroup that was already offlined. This may then cause insertion of this bfq_group into a service tree. But this bfq_group will get freed as soon as last bio associated with it is completed leading to use after free issues for service tree users. Fix the problem by making sure we always operate on online bfq_group. If the bfq_group associated with the bio is not online, we pick the first online parent. CC: stable@vger.kernel.org Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Get rid of __bio_blkcg() usageJan Kara
BFQ usage of __bio_blkcg() is a relict from the past. Furthermore if bio would not be associated with any blkcg, the usage of __bio_blkcg() in BFQ is prone to races with the task being migrated between cgroups as __bio_blkcg() calls at different places could return different blkcgs. Convert BFQ to the new situation where bio->bi_blkg is initialized in bio_set_dev() and thus practically always valid. This allows us to save blkcg_gq lookup and noticeably simplify the code. CC: stable@vger.kernel.org Fixes: 0fe061b9f03c ("blkcg: fix ref count issue with bio_blkcg() using task_css") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-8-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Track whether bfq_group is still onlineJan Kara
Track whether bfq_group is still online. We cannot rely on blkcg_gq->online because that gets cleared only after all policies are offlined and we need something that gets updated already under bfqd->lock when we are cleaning up our bfq_group to be able to guarantee that when we see online bfq_group, it will stay online while we are holding bfqd->lock lock. CC: stable@vger.kernel.org Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-7-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Remove pointless bfq_init_rq() callsJan Kara
We call bfq_init_rq() from request merging functions where requests we get should have already gone through bfq_init_rq() during insert and anyway we want to do anything only if the request is already tracked by BFQ. So replace calls to bfq_init_rq() with RQ_BFQQ() instead to simply skip requests untracked by BFQ. We move bfq_init_rq() call in bfq_insert_request() a bit earlier to cover request merging and thus can transfer FIFO position in case of a merge. CC: stable@vger.kernel.org Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-6-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Drop pointless unlock-lock pairJan Kara
In bfq_insert_request() we unlock bfqd->lock only to call trace_block_rq_insert() and then lock bfqd->lock again. This is really pointless since tracing is disabled if we really care about performance and even if the tracepoint is enabled, it is a quick call. CC: stable@vger.kernel.org Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-5-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Update cgroup information before merging bioJan Kara
When the process is migrated to a different cgroup (or in case of writeback just starts submitting bios associated with a different cgroup) bfq_merge_bio() can operate with stale cgroup information in bic. Thus the bio can be merged to a request from a different cgroup or it can result in merging of bfqqs for different cgroups or bfqqs of already dead cgroups and causing possible use-after-free issues. Fix the problem by updating cgroup information in bfq_merge_bio(). CC: stable@vger.kernel.org Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-4-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Split shared queues on move between cgroupsJan Kara
When bfqq is shared by multiple processes it can happen that one of the processes gets moved to a different cgroup (or just starts submitting IO for different cgroup). In case that happens we need to split the merged bfqq as otherwise we will have IO for multiple cgroups in one bfqq and we will just account IO time to wrong entities etc. Similarly if the bfqq is scheduled to merge with another bfqq but the merge didn't happen yet, cancel the merge as it need not be valid anymore. CC: stable@vger.kernel.org Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-3-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Avoid merging queues with different parentsJan Kara
It can happen that the parent of a bfqq changes between the moment we decide two queues are worth to merge (and set bic->stable_merge_bfqq) and the moment bfq_setup_merge() is called. This can happen e.g. because the process submitted IO for a different cgroup and thus bfqq got reparented. It can even happen that the bfqq we are merging with has parent cgroup that is already offline and going to be destroyed in which case the merge can lead to use-after-free issues such as: BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50 Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544 CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G E 5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x46/0x5a print_address_description.constprop.0+0x1f/0x140 ? __bfq_deactivate_entity+0x9cb/0xa50 kasan_report.cold+0x7f/0x11b ? __bfq_deactivate_entity+0x9cb/0xa50 __bfq_deactivate_entity+0x9cb/0xa50 ? update_curr+0x32f/0x5d0 bfq_deactivate_entity+0xa0/0x1d0 bfq_del_bfqq_busy+0x28a/0x420 ? resched_curr+0x116/0x1d0 ? bfq_requeue_bfqq+0x70/0x70 ? check_preempt_wakeup+0x52b/0xbc0 __bfq_bfqq_expire+0x1a2/0x270 bfq_bfqq_expire+0xd16/0x2160 ? try_to_wake_up+0x4ee/0x1260 ? bfq_end_wr_async_queues+0xe0/0xe0 ? _raw_write_unlock_bh+0x60/0x60 ? _raw_spin_lock_irq+0x81/0xe0 bfq_idle_slice_timer+0x109/0x280 ? bfq_dispatch_request+0x4870/0x4870 __hrtimer_run_queues+0x37d/0x700 ? enqueue_hrtimer+0x1b0/0x1b0 ? kvm_clock_get_cycles+0xd/0x10 ? ktime_get_update_offsets_now+0x6f/0x280 hrtimer_interrupt+0x2c8/0x740 Fix the problem by checking that the parent of the two bfqqs we are merging in bfq_setup_merge() is the same. Link: https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/ CC: stable@vger.kernel.org Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-2-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17bfq: Avoid false marking of bic as stably mergedJan Kara
bfq_setup_cooperator() can mark bic as stably merged even though it decides to not merge its bfqqs (when bfq_setup_merge() returns NULL). Make sure to mark bic as stably merged only if we are really going to merge bfqqs. CC: stable@vger.kernel.org Tested-by: "yukuai (C)" <yukuai3@huawei.com> Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17pktcdvd: stop using bio_resetChristoph Hellwig
Just initialize the bios on-demand. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220406061228.410163-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: turn bio_kmalloc into a simple kmalloc wrapperChristoph Hellwig
Remove the magic autofree semantics and require the callers to explicitly call bio_init to initialize the bio. This allows bio_free to catch accidental bio_put calls on bio_init()ed bios as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> Acked-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20220406061228.410163-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>