summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-18nvme: define the remaining used sgls constantsKeith Busch
This provides a little more context when reading the code than hardcoded magic numbers. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-18nvme-pci: add support for sgl metadataKeith Busch
Supporting this mode allows creating and merging multi-segment metadata requests that wouldn't be possible otherwise. It also allows directly using user space requests that straddle physically discontiguous pages. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-18nvme/multipath: Fix RCU list traversal to use SRCU primitiveBreno Leitao
The code currently uses list_for_each_entry_rcu() while holding an SRCU lock, triggering false positive warnings with CONFIG_PROVE_RCU=y enabled: drivers/nvme/host/multipath.c:168 RCU-list traversed in non-reader section!! drivers/nvme/host/multipath.c:227 RCU-list traversed in non-reader section!! drivers/nvme/host/multipath.c:260 RCU-list traversed in non-reader section!! While the list is properly protected by SRCU lock, the code uses the wrong list traversal primitive. Replace list_for_each_entry_rcu() with list_for_each_entry_srcu() to correctly indicate SRCU-based protection and eliminate the false warning. Signed-off-by: Breno Leitao <leitao@debian.org> Fixes: be647e2c76b2 ("nvme: use srcu for iterating namespace list") Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-18rust: block: simplify Result<()> in validate_block_size returnManas
`Result` is used in place of `Result<()>` because the default type parameters are unit `()` and `Error` types, which are automatically inferred. Thus keep the usage consistent throughout codebase. Suggested-by: Miguel Ojeda <ojeda@kernel.org> Link: https://github.com/Rust-for-Linux/linux/issues/1128 Signed-off-by: Manas <manas18244@iiitd.ac.in> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20241118-simplify-result-v3-1-6b1566a77eab@iiitd.ac.in Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-15Merge tag 'md-6.13-20241115' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into for-6.13/block Pull MD fixes from Song: "This set contains a fix for a W=1 warning, by John Garry, and a MAINTAINERS update." * tag 'md-6.13-20241115' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: MAINTAINERS: Update git tree for mdraid subsystem md/raid5: Increase r5conf.cache_name size
2024-11-15MAINTAINERS: Update git tree for mdraid subsystemSong Liu
Moving the official git tree to the MDRAID Group account. Signed-off-by: Song Liu <song@kernel.org>
2024-11-15block: make struct rq_list available for !CONFIG_BLOCKJens Axboe
A previous commit changed how requests are linked in the plug structure, but unlike the previous method, it uses a new type for it rather than struct request. The latter is available even for !CONFIG_BLOCK, while struct rq_list is now. Move it outside CONFIG_BLOCK. Reported-by: Nathan Chancellor <nathan@kernel.org> Fixes: a3396b99990d ("block: add a rq_list type") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13block/genhd: use seq_put_decimal_ull for diskstats decimal valuesDavid Wang
seq_printf is costly. For each block device, 19 decimal values are yielded in /proc/diskstats via seq_printf. On a system with 16 logical block devices, profiling for open/read/close sequences shows seq_printf took ~75% samples of diskstats_show: diskstats_show(92.626% 2269372/2450040) seq_printf(76.026% 1725313/2269372) vsnprintf(99.163% 1710866/1725313) format_decode(26.597% 455040/1710866) number(19.554% 334542/1710866) memcpy_orig(4.183% 71570/1710866) ... srso_return_thunk(0.009% 148/1725313) part_stat_read_all(8.030% 182236/2269372) One million rounds of open/read/close /proc/diskstats takes: real 0m37.687s user 0m0.264s sys 0m32.911s On average, each sequence tooks ~0.032ms With this patch, most decimal values are yield via seq_put_decimal_ull, performance is significantly improved: real 0m20.792s user 0m0.316s sys 0m20.463s On average, each sequence tooks ~0.020ms, a ~37.5% improvement. Signed-off-by: David Wang <00107082@163.com> Link: https://lore.kernel.org/r/20241108054500.4251-1-00107082@163.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13block: don't reorder requests in blk_mq_add_to_batchChristoph Hellwig
LIFO ordering for batched completions is a bit unexpected and also defeats some merging optimizations in e.g. the XFS buffered write code. Now that we can easily add the request to the tail of the list do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13block: don't reorder requests in blk_add_rq_to_plugChristoph Hellwig
Add requests to the tail of the list instead of the front so that they are queued up in submission order. Remove the re-reordering in blk_mq_dispatch_plug_list, virtio_queue_rqs and nvme_queue_rqs now that the list is ordered as expected. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13block: add a rq_list typeChristoph Hellwig
Replace the semi-open coded request list helpers with a proper rq_list type that mirrors the bio_list and has head and tail pointers. Besides better type safety this actually allows to insert at the tail of the list, which will be useful soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13block: remove rq_list_moveChristoph Hellwig
Unused now. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13virtio_blk: reverse request order in virtio_queue_rqsChristoph Hellwig
blk_mq_flush_plug_list submits requests in the reverse order that they were submitted, which leads to a rather suboptimal I/O pattern especially in rotational devices. Fix this by rewriting virtio_queue_rqs so that it always pops the requests from the passed in request list, and then adds them to the head of a local submit list. This actually simplifies the code a bit as it removes the complicated list splicing, at the cost of extra updates of the rq_next pointer. As that should be cache hot anyway it should be an easy price to pay. Fixes: 0e9911fa768f ("virtio-blk: support mq_ops->queue_rqs()") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13nvme-pci: reverse request order in nvme_queue_rqsChristoph Hellwig
blk_mq_flush_plug_list submits requests in the reverse order that they were submitted, which leads to a rather suboptimal I/O pattern especially in rotational devices. Fix this by rewriting nvme_queue_rqs so that it always pops the requests from the passed in request list, and then adds them to the head of a local submit list. This actually simplifies the code a bit as it removes the complicated list splicing, at the cost of extra updates of the rq_next pointer. As that should be cache hot anyway it should be an easy price to pay. Fixes: d62cbcf62f2f ("nvme: add support for mq_ops->queue_rqs()") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13btrfs: validate queue limitsChristoph Hellwig
Call blk_validate_limits on the queue limits used for zone append splitting so that calculated values get filled in and any stacking conflicts get cought. Without this there isn't a max_zone_append_sectors limits as of commit 559218d43ec9 ("block: pre-calculate max_zone_append_sectors"). Fixes: 559218d43ec9 ("block: pre-calculate max_zone_append_sectors") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241113084541.34315-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13block: export blk_validate_limitsChristoph Hellwig
While block drivers do the validation as part of committing them to the queue, users that use the limit outside of a block device context have to validate the limits and fill in the calculated values as well. So far btrfs is the only user of queue limits without a block device, and it has gotten away with that more or less by accident. But with commit 559218d43ec9 ("block: pre-calculate max_zone_append_sectors") this became fatal for setups that have small max zone append size, as it won't be limited now. Export blk_validate_limits so that it can be called directly from btrfs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241113084541.34315-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13Merge tag 'nvme-6.13-2024-11-13' of git://git.infradead.org/nvme into ↵Jens Axboe
for-6.13/block Pull NVMe updates from Keith: "nvme updates for Linux 6.13 - Use uring_cmd helper (Pavel) - Host Memory Buffer allocation enhancements (Christoph) - Target persistent reservation support (Guixin) - Persistent reservation tracing (Guixen) - NVMe 2.1 specification support (Keith) - Rotational Meta Support (Matias, Wang, Keith) - Volatile cache detection enhancment (Guixen)" * tag 'nvme-6.13-2024-11-13' of git://git.infradead.org/nvme: (22 commits) nvmet: add tracing of reservation commands nvme: parse reservation commands's action and rtype to string nvmet: report ns's vwc not present nvme: check ns's volatile write cache not present nvme: add rotational support nvme: use command set independent id ns if available nvmet: support for csi identify ns nvmet: implement rotational media information log nvmet: implement endurance groups nvmet: declare 2.1 version compliance nvmet: implement crto property nvmet: implement supported features log nvmet: implement supported log pages nvmet: implement active command set ns list nvmet: implement id ns for nvm command set nvmet: support reservation feature nvme: add reservation command's defines nvme-core: remove repeated wq flags nvmet: make nvmet_wq visible in sysfs nvme-pci: use dma_alloc_noncontigous if possible ...
2024-11-13nvmet: add tracing of reservation commandsGuixin Liu
Add tracing of reservation commands, including register, acquire, release and report, and also parse the action and rtype to string to make the trace log more human-readable. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-13nvme: parse reservation commands's action and rtype to stringGuixin Liu
Parse reservation commands's action(including rrega, racqa and rrela) and rtype to string to make the trace log more human-readable. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-13nvmet: report ns's vwc not presentGuixin Liu
Currently, we report that controller has vwc even though the ns may not have vwc. Report ns's vwc not present when not buffered_io or backdev doesn't have vwc. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-12md/raid5: Increase r5conf.cache_name sizeJohn Garry
For compiling with W=1, the following warning can be seen: drivers/md/raid5.c: In function ‘setup_conf’: drivers/md/raid5.c:2423:12: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size between 16 and 26 [-Werror=format-truncation=] "raid%d-%s", conf->level, mdname(conf->mddev)); ^~ drivers/md/raid5.c:2422:3: note: ‘snprintf’ output between 7 and 48 bytes into a destination of size 32 snprintf(conf->cache_name[0], namelen, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "raid%d-%s", conf->level, mdname(conf->mddev)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Increase the array size to avoid this warning. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241112161019.4154616-2-john.g.garry@oracle.com Signed-off-by: Song Liu <song@kernel.org>
2024-11-12block: remove the ioprio field from struct requestChristoph Hellwig
The request ioprio is only initialized from the first attached bio, so requests without a bio already never set it. Directly use the bio field instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241112170050.1612998-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-12block: remove the write_hint field from struct requestChristoph Hellwig
The write_hint is only used for read/write requests, which must have a bio attached to them. Just use the bio field instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241112170050.1612998-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11nvme: check ns's volatile write cache not presentGuixin Liu
When the VWC of a namespace does not exist, the BLK_FEAT_WRITE_CACHE flag should not be set when registering the block device, regardless of whether the controller supports VWC. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvme: add rotational supportWang Yugui
Rotational devices, such as hard-drives, can be detected using the rotational bit in the namespace independent identify namespace data structure. Make the bit visible to the block layer through the rotational queue setting. Signed-off-by: Wang Yugui <wangyugui@e16-tech.com> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvme: use command set independent id ns if availableMatias Bjørling
The NVMe 2.0 specification adds an independent identify namespace data structure that contains generic attributes that apply to all namespace types. Some attributes carry over from the NVM command set identify namespace data structure, and others are new. Currently, the data structure only considered when CRIMS is enabled or when the namespace type is key-value. However, the independent namespace data structure is mandatory for devices that implement features from the 2.0+ specification. Therefore, we can check this data structure first. If unavailable, retrieve the generic attributes from the NVM command set identify namespace data structure. Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: support for csi identify nsKeith Busch
Implements reporting the I/O Command Set Independent Identify Namespace command. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: implement rotational media information logKeith Busch
Most of the information is stubbed. Supporting these commands is a requirement for supporting rotational media. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: implement endurance groupsKeith Busch
Most of the returned information is just stubbed data. The target must support these in order to report rotational media. Since this driver doesn't know any better, each namespace is its own endurance group with the engid value matching the nsid. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: declare 2.1 version complianceKeith Busch
The target driver implements all the mandatory logs, identifications, features, and properties up to nvme sepcification 2.1. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: implement crto propertyKeith Busch
This property is required for nvme 2.1. The target only supports ready with media, so this is just the same value as CAP.TO. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: implement supported features logKeith Busch
This log is required for nvme 2.1. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: implement supported log pagesKeith Busch
This log is required for nvme 2.1. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: implement active command set ns listKeith Busch
This is required for nvme 2.1 for targets that support multiple command sets. We support NVM and ZNS, so are required to support this identification. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: implement id ns for nvm command setKeith Busch
We don't report anything here, but it's a mandatory identification for nvme 2.1. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvmet: support reservation featureGuixin Liu
This patch implements the reservation feature, including: 1. reservation register(register, unregister and replace). 2. reservation acquire(acquire, preempt, preempt and abort). 3. reservation release(release and clear). 4. reservation report. 5. set feature and get feature of reservation notify mask. 6. get log page of reservation event. Not supported: 1. persistent reservation through power loss. Test cases: Use nvme-cli and fio to test all implemented sub features: 1. use nvme resv-register to register host a registrant or unregister or replace a new key. 2. use nvme resv-acquire to set host to the holder, and use fio to send read and write io in all reservation type. And also test preempt and "preempt and abort". 3. use nvme resv-report to show all registrants and reservation status. 4. use nvme resv-release to release all registrants. 5. use nvme get-log to get events generated by the preceding operations. In addition, make reservation configurable, one can set ns to support reservation before enable ns. The default of resv_enable is false. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11nvme-multipath: don't bother clearing max_hw_zone_append_sectorsChristoph Hellwig
The limits stacking now properly zeroes it if at least one of the underlying limits clears it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241108154657.845768-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11block: pre-calculate max_zone_append_sectorsChristoph Hellwig
max_zone_append_sectors differs from all other queue limits in that the final value used is not stored in the queue_limits but needs to be obtained using queue_limits_max_zone_append_sectors helper. This not only adds (tiny) extra overhead to the I/O path, but also can be easily forgotten in file system code. Add a new max_hw_zone_append_sectors value to queue_limits which is set by the driver, and calculate max_zone_append_sectors from that and the other inputs in blk_validate_zoned_limits, similar to how max_sectors is calculated to fix this. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241108154657.845768-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11block: lift bio_is_zone_append to bio.hChristoph Hellwig
Make bio_is_zone_append globally available, because file systems need to use to check for a zone append bio in their end_io handlers to deal with the block layer emulation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241104062647.91160-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11block: fix bio_split_rw_at to take zone_write_granularity into accountChristoph Hellwig
Otherwise it can create unaligned writes on zoned devices. Fixes: a805a4fa4fa3 ("block: introduce zone_write_granularity limit") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241104062647.91160-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11block: take chunk_sectors into account in bio_split_write_zeroesChristoph Hellwig
For zoned devices, write zeroes must be split at the zone boundary which is represented as chunk_sectors. For other uses like the internally RAIDed NVMe devices it is probably at least useful. Enhance get_max_io_size to know about write zeroes and use it in bio_split_write_zeroes. Also add a comment about the seemingly nonsensical zero max_write_zeroes limit. Fixes: 885fa13f6559 ("block: implement splitting of REQ_OP_WRITE_ZEROES bios") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241104062647.91160-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11md/raid10: Handle bio_split() errorsJohn Garry
Add proper bio_split() error handling. For any error, call raid_end_bio_io() and return. Except for discard, where we end the bio directly. Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241111112150.3756529-7-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11md/raid1: Handle bio_split() errorsJohn Garry
Add proper bio_split() error handling. For any error, call raid_end_bio_io() and return. For the case of an in the write path, we need to undo the increment in the rdev pending count and NULLify the r1_bio->bios[] pointers. For read path failure, we need to undo rdev pending count increment from the earlier read_balance() call. Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241111112150.3756529-6-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11md/raid0: Handle bio_split() errorsJohn Garry
Add proper bio_split() error handling. For any error, set bi_status, end the bio, and return. Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241111112150.3756529-5-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11block: Handle bio_split() errors in bio_submit_split()John Garry
bio_split() may error, so check this. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241111112150.3756529-4-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11block: Error an attempt to split an atomic write in bio_split()John Garry
This is disallowed. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241111112150.3756529-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11block: Rework bio_split() return valueJohn Garry
Instead of returning an inconclusive value of NULL for an error in calling bio_split(), return a ERR_PTR() always. Also remove the BUG_ON() calls, and WARN_ON_ONCE() instead. Indeed, since almost all callers don't check the return code from bio_split(), we'll crash anyway (for those failures). Fix up the only user which checks bio_split() return code today (directly or indirectly), blk_crypto_fallback_split_bio_if_needed(). The md/bcache code does check the return code in cached_dev_cache_miss() -> bio_next_split() -> bio_split(), but only to see if there was a split, so there would be no change in behaviour here (when returning a ERR_PTR()). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241111112150.3756529-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11ublk: fix ublk_ch_mmap() for 64K page sizeMing Lei
In ublk_ch_mmap(), queue id is calculated in the following way: (vma->vm_pgoff << PAGE_SHIFT) / `max_cmd_buf_size` 'max_cmd_buf_size' is equal to `UBLK_MAX_QUEUE_DEPTH * sizeof(struct ublksrv_io_desc)` and UBLK_MAX_QUEUE_DEPTH is 4096 and part of UAPI, so 'max_cmd_buf_size' is always page aligned in 4K page size kernel. However, it isn't true in 64K page size kernel. Fixes the issue by always rounding up 'max_cmd_buf_size' with PAGE_SIZE. Cc: stable@vger.kernel.org Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241111110718.1394001-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-09s390/dasd: Fix typo in commentYu Jiaoliang
Fix typo in comment: requeust->request, Removve->Remove, notthing->nothing. Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20241108133913.3068782-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-09s390/dasd: fix redundant /proc/dasd* entries removalMiroslav Franc
In case of an early failure in dasd_init, dasd_proc_init is never called and /proc/dasd* files are never created. That can happen, for example, if an incompatible or incorrect argument is provided to the dasd_mod.dasd= kernel parameter. However, the attempted removal of /proc/dasd* files causes 8 warnings and backtraces in this case. 4 on the error path within dasd_init and 4 when the dasd module is unloaded. Notice the "removing permanent /proc entry 'devices'" message that is caused by the dasd_proc_exit function trying to remove /proc/devices instead of /proc/dasd/devices since dasd_proc_root_entry is NULL and /proc/devices is indeed permanent. Example: ------------[ cut here ]------------ removing permanent /proc entry 'devices' WARNING: CPU: 6 PID: 557 at fs/proc/generic.c:701 remove_proc_entry+0x22e/0x240 CPU: 6 PID: 557 Comm: modprobe Not tainted 6.10.5-1-default #1 openSUSE Tumbleweed f6917bfd6e5a5c7a7e900e0e3b517786fb5c6301 Hardware name: QEMU 8561 QEMU (KVM/Linux) Krnl PSW : 0704c00180000000 000003fffed0e9f2 (remove_proc_entry+0x232/0x240) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 000003ff00000027 000003ff00000023 0000000000000028 000002f200000000 000002f3f05bec20 0000037ffecfb7d0 000003ffffdabab0 000003ff7ee4ec72 000003ff7ee4ec72 0000000000000007 000002f280e22600 000002f280e22688 000003ffa252cfa0 0000000000010000 000003fffed0e9ee 0000037ffecfba38 Krnl Code: 000003fffed0e9e2: c020004e7017 larl %r2,000003ffff6dca10 000003fffed0e9e8: c0e5ffdfad24 brasl %r14,000003fffe904430 #000003fffed0e9ee: af000000 mc 0,0 >000003fffed0e9f2: a7f4ff4c brc 15,000003fffed0e88a 000003fffed0e9f6: 0707 bcr 0,%r7 000003fffed0e9f8: 0707 bcr 0,%r7 000003fffed0e9fa: 0707 bcr 0,%r7 000003fffed0e9fc: 0707 bcr 0,%r7 Call Trace: [<000003fffed0e9f2>] remove_proc_entry+0x232/0x240 ([<000003fffed0e9ee>] remove_proc_entry+0x22e/0x240) [<000003ff7ef5a084>] dasd_proc_exit+0x34/0x60 [dasd_mod] [<000003ff7ef560c2>] dasd_exit+0x22/0xc0 [dasd_mod] [<000003ff7ee5a26e>] dasd_init+0x26e/0x280 [dasd_mod] [<000003fffe8ac9d0>] do_one_initcall+0x40/0x220 [<000003fffe9bc758>] do_init_module+0x78/0x260 [<000003fffe9bf3a6>] __do_sys_init_module+0x216/0x250 [<000003ffff37ac9e>] __do_syscall+0x24e/0x2d0 [<000003ffff38cca8>] system_call+0x70/0x98 Last Breaking-Event-Address: [<000003fffef7ea20>] __s390_indirect_jump_r14+0x0/0x10 ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ While the cause is a user failure, the dasd module should handle the situation more gracefully. One of the simplest solutions is to make removal of the /proc/dasd* entries idempotent. Signed-off-by: Miroslav Franc <mfranc@suse.cz> [ sth: shortened if clause ] Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20241108133913.3068782-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>