summaryrefslogtreecommitdiff
path: root/arch/um/drivers/ubd_kern.c
AgeCommit message (Collapse)Author
2024-12-23block: remove BLK_MQ_F_SHOULD_MERGEChristoph Hellwig
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely process passthrough commands (bsg-lib, ufs tmf, various nvme admin queues) and thus don't even check the flag. Remove it to simplify the driver interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-07um: ubd: Do not use drvdata in releaseTiwei Bie
The drvdata is not available in release. Let's just use container_of() to get the ubd instance. Otherwise, removing a ubd device will result in a crash: RIP: 0033:blk_mq_free_tag_set+0x1f/0xba RSP: 00000000e2083bf0 EFLAGS: 00010246 RAX: 000000006021463a RBX: 0000000000000348 RCX: 0000000062604d00 RDX: 0000000004208060 RSI: 00000000605241a0 RDI: 0000000000000348 RBP: 00000000e2083c10 R08: 0000000062414010 R09: 00000000601603f7 R10: 000000000000133a R11: 000000006038c4bd R12: 0000000000000000 R13: 0000000060213a5c R14: 0000000062405d20 R15: 00000000604f7aa0 Kernel panic - not syncing: Segfault with no mm CPU: 0 PID: 17 Comm: kworker/0:1 Not tainted 6.8.0-rc3-00107-gba3f67c11638 #1 Workqueue: events mc_work_proc Stack: 00000000 604f7ef0 62c5d000 62405d20 e2083c30 6002c776 6002c755 600e47ff e2083c60 6025ffe3 04208060 603d36e0 Call Trace: [<6002c776>] ubd_device_release+0x21/0x55 [<6002c755>] ? ubd_device_release+0x0/0x55 [<600e47ff>] ? kfree+0x0/0x100 [<6025ffe3>] device_release+0x70/0xba [<60381d6a>] kobject_put+0xb5/0xe2 [<6026027b>] put_device+0x19/0x1c [<6026a036>] platform_device_put+0x26/0x29 [<6026ac5a>] platform_device_unregister+0x2c/0x2e [<6002c52e>] ubd_remove+0xb8/0xd6 [<6002bb74>] ? mconsole_reply+0x0/0x50 [<6002b926>] mconsole_remove+0x160/0x1cc [<6002bbbc>] ? mconsole_reply+0x48/0x50 [<6003379c>] ? um_set_signals+0x3b/0x43 [<60061c55>] ? update_min_vruntime+0x14/0x70 [<6006251f>] ? dequeue_task_fair+0x164/0x235 [<600620aa>] ? update_cfs_group+0x0/0x40 [<603a0e77>] ? __schedule+0x0/0x3ed [<60033761>] ? um_set_signals+0x0/0x43 [<6002af6a>] mc_work_proc+0x77/0x91 [<600520b4>] process_scheduled_works+0x1af/0x2c3 [<6004ede3>] ? assign_work+0x0/0x58 [<600527a1>] worker_thread+0x2f7/0x37a [<6004ee3b>] ? set_pf_worker+0x0/0x64 [<6005765d>] ? arch_local_irq_save+0x0/0x2d [<60058e07>] ? kthread_exit+0x0/0x3a [<600524aa>] ? worker_thread+0x0/0x37a [<60058f9f>] kthread+0x130/0x135 [<6002068e>] new_thread_handler+0x85/0xb6 Cc: stable@vger.kernel.org Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20241104163203.435515-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07um: ubd: Initialize ubd's disk pointer in ubd_addTiwei Bie
Currently, the initialization of the disk pointer in the ubd structure is missing. It should be initialized with the allocated gendisk pointer in ubd_add(). Fixes: 32621ad7a7ea ("ubd: remove the ubd_gendisk array") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20241104163203.435515-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-25um: Set parent-death signal for ubd io thread/processTiwei Bie
The ubd io thread is not really a traditional thread. Set the parent-death signal for it to ensure that it will be killed if the UML kernel dies unexpectedly without proper cleanup. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241024142828.2612828-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-25Merge tag 'uml-for-linus-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Support for preemption - i386 Rust support - Huge cleanup by Benjamin Berg - UBSAN support - Removal of dead code * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits) um: vector: always reset vp->opened um: vector: remove vp->lock um: register power-off handler um: line: always fill *error_out in setup_one_line() um: remove pcap driver from documentation um: Enable preemption in UML um: refactor TLB update handling um: simplify and consolidate TLB updates um: remove force_flush_all from fork_handler um: Do not flush MM in flush_thread um: Delay flushing syscalls until the thread is restarted um: remove copy_context_skas0 um: remove LDT support um: compress memory related stub syscalls while adding them um: Rework syscall handling um: Add generic stub_syscall6 function um: Create signal stack memory assignment in stub_data um: Remove stub-data.h include from common-offsets.h um: time-travel: fix signal blocking race/hang um: time-travel: remove time_exit() ...
2024-07-03um: refactor TLB update handlingBenjamin Berg
Conceptually, we want the memory mappings to always be up to date and represent whatever is in the TLB. To ensure that, we need to sync them over in the userspace case and for the kernel we need to process the mappings. The kernel will call flush_tlb_* if page table entries that were valid before become invalid. Unfortunately, this is not the case if entries are added. As such, change both flush_tlb_* and set_ptes to track the memory range that has to be synchronized. For the kernel, we need to execute a flush_tlb_kern_* immediately but we can wait for the first page fault in case of set_ptes. For userspace in contrast we only store that a range of memory needs to be synced and do so whenever we switch to that process. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-13-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03ubd: Remove unused mutex 'ubd_mutex'Dr. David Alan Gilbert
Commit fb5d1d389c9e ("ubd: open the backing files in ubd_add") removed the last use of ubd_mutex. Remove it. Build and kernel startup test only. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20240505001508.255096-1-linux@treblig.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-19block: move the nonrot flag to queue_limitsChristoph Hellwig
Move the nonrot flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Use the chance to switch to defaulting to non-rotational and require the driver to opt into rotational, which matches the polarity of the sysfs interface. For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new rotational flag is not set as they clearly are not rotational despite this being a behavior change. There are some other drivers that unconditionally set the rotational flag to keep the existing behavior as they arguably can be used on rotational devices even if that is probably not their main use today (e.g. virtio_blk and drbd). The flag is automatically inherited in blk_stack_limits matching the existing behavior in dm and md. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19block: move cache control settings out of queue->flagsChristoph Hellwig
Move the cache control settings into the queue_limits so that the flags can be set atomically with the device queue frozen. Add new features and flags field for the driver set flags, and internal (usually sysfs-controlled) flags in the block layer. Note that we'll eventually remove enough field from queue_limits to bring it back to the previous size. The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits. The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change in that dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path will handle this gracefully, but as far as I can tell the lack of num_flush_bios and thus flush support is a pre-existing data integrity bug in those targets that really needs fixing, after which a non-zero num_flush_bios should be required in dm for targets that map to underlying devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14block: add special APIs for run-time disabling of discard and friendsChristoph Hellwig
A few drivers optimistically try to support discard, write zeroes and secure erase and disable the features from the I/O completion handler if the hardware can't support them. This disable can't be done using the atomic queue limits API because the I/O completion handlers can't take sleeping locks or freeze the queue. Keep the existing clearing of the relevant field to zero, but replace the old blk_queue_max_* APIs with new disable APIs that force the value to 0. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240531074837.1648501-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14ubd: untagle discard vs write zeroes not support handlingChristoph Hellwig
Discard and Write Zeroes are different operation and implemented by different fallocate opcodes for ubd. If one fails the other one can work and vice versa. Split the code to disable the operations in ubd_handler to only disable the operation that actually failed. Fixes: 50109b5a03b4 ("um: Add support for DISCARD in the UBD Driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20240531074837.1648501-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14ubd: refactor the interrupt handlerChristoph Hellwig
Instead of a separate handler function that leaves no work in the interrupt hanler itself, split out a per-request end I/O helper and clean up the coding style and variable naming while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20240531074837.1648501-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-22um: Fix return value in ubd_init()Duoming Zhou
When kmalloc_array() fails to allocate memory, the ubd_init() should return -ENOMEM instead of -1. So, fix it. Fixes: f88f0bdfc32f ("um: UBD Improvements") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-02-27ubd: open the backing files in ubd_addChristoph Hellwig
Opening the backing device only when the block device is opened is a bit weird as no one configures block devices to not use them. Opend them at add time, close them at remove time and remove the now superflous opened counter as remove can simply check for disk_openers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20240222072417.3773131-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27ubd: remove the queue pointer in struct ubdChristoph Hellwig
No need for it now, everything goes through the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20240222072417.3773131-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27ubd: move set_disk_ro to ubd_addChristoph Hellwig
No need to delay this until open time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20240222072417.3773131-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27ubd: move setting the variable queue limits to ubd_addChristoph Hellwig
No reason to delay this until open time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20240222072417.3773131-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27ubd: move setting the nonrot flag to ubd_addChristoph Hellwig
No reason to delay this until open time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20240222072417.3773131-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27ubd: remove ubd_disk_registerChristoph Hellwig
Fold it into the only caller to remove lots of references to the global ubd_devs array. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20240222072417.3773131-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27ubd: remove the ubd_gendisk arrayChristoph Hellwig
And add a disk pointer to the ubd structure instead to keep all the per-device information together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20240222072417.3773131-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-19ubd: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig
Pass the few limits ubd imposes directly to blk_mq_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240215070300.2200308-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13block: pass a queue_limits argument to blk_mq_alloc_diskChristoph Hellwig
Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-29ubd: use the default discard granularityChristoph Hellwig
The discard granularity now defaults to a single sector, so don't set that value explicitly. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20231228075545.362768-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12ubd: remove commented out code in ubd_openChristoph Hellwig
This code has been dead forever, make sure it doesn't show up in code searches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20230608110258.189493-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: remove the unused mode argument to ->releaseChristoph Hellwig
The mode argument to the ->release block_device_operation is never used, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: pass a gendisk to ->openChristoph Hellwig
->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-19um: Do not initialise statics to 0.Xin Gao
do not initialise statics to 0. Signed-off-by: Xin Gao <gaoxin@cdjrlc.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2022-07-14um: Use enum req_op where appropriateBart Van Assche
Improve static type checking by using type enum req_op instead of int where appropriate. Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-21-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: remove blk_cleanup_diskChristoph Hellwig
blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03ubd: don't set the discard_alignment queue limitChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by ubd is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: remove QUEUE_FLAG_DISCARDChristoph Hellwig
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-11um: Fix WRITE_ZEROES in the UBD DriverFrédéric Danis
Call to fallocate with FALLOC_FL_PUNCH_HOLE on a device backed by a sparse file can end up by missing data, zeroes data range, if the underlying file is used with a tool like bmaptool which will referenced only used spaces. Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-10-21um/drivers/ubd_kern: add error handling support for add_disk()Luis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. ubd_disk_register() never returned an error, so just fix that now and let the caller handle the error condition. Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211015233028.2167651-8-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: drop unused includes in <linux/genhd.h>Christoph Hellwig
Drop various include not actually used in genhd.h itself, and move the remaning includes closer together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16ubd: use bvec_virtChristoph Hellwig
Use bvec_virt instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20210804095634.460779-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-09Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block updates from Jens Axboe: "A combination of changes that ended up depending on both the driver and core branch (and/or the IDE removal), and a few late arriving fixes. In detail: - Fix io ticks wrap-around issue (Chunguang) - nvme-tcp sock locking fix (Maurizio) - s390-dasd fixes (Kees, Christoph) - blk_execute_rq polling support (Keith) - blk-cgroup RCU iteration fix (Yu) - nbd backend ID addition (Prasanna) - Partition deletion fix (Yufen) - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph) - Removal of now dead block request types due to IDE removal (Christoph) - Loop probing and control device cleanups (Christoph) - Device uevent fix (Christoph) - Misc cleanups/fixes (Tetsuo, Christoph)" * tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits) blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs block: fix the problem of io_ticks becoming smaller nvme-tcp: can't set sk_user_data without write_lock loop: remove unused variable in loop_set_status() block: remove the bdgrab in blk_drop_partitions block: grab a device refcount in disk_uevent s390/dasd: Avoid field over-reading memcpy() dasd: unexport dasd_set_target_state block: check disk exist before trying to add partition ubd: remove dead code in ubd_setup_common nvme: use return value from blk_execute_rq() block: return errors from blk_execute_rq() nvme: use blk_execute_rq() for passthrough commands block: support polling through blk_execute_rq block: remove REQ_OP_SCSI_{IN,OUT} block: mark blk_mq_init_queue_data static loop: rewrite loop_exit using idr_for_each_entry loop: split loop_lookup loop: don't allow deleting an unspecified loop device loop: move loop_ctl_mutex locking into loop_add ...
2021-06-30ubd: remove dead code in ubd_setup_commonChristoph Hellwig
Remove some leftovers of the fake major number parsing that cause complains from some compilers. Fixes: 2933a1b2c6f3 ("ubd: remove the code to register as the legacy IDE driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210628093937.1325608-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30ubd: use blk_mq_alloc_disk and blk_cleanup_diskChristoph Hellwig
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060759.3965724-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30ubd: remove the code to register as the legacy IDE driverChristoph Hellwig
With the legacy IDE driver long deprecated, and modern userspace being much more flexible about dev_t assignments there is no reason to fake a registration as the legacy IDE driver in ubd. This registeration is a little problematic as it registers the same request_queue for multiple gendisks, so just remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20210614060759.3965724-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-17um: Fix stack pointer alignmentYiFei Zhu
GCC assumes that stack is aligned to 16-byte on call sites [1]. Since GCC 8, GCC began using 16-byte aligned SSE instructions to implement assignments to structs on stack. When CC_OPTIMIZE_FOR_PERFORMANCE is enabled, this affects os-Linux/sigio.c, write_sigio_thread: struct pollfds *fds, tmp; tmp = current_poll; Note that struct pollfds is exactly 16 bytes in size. GCC 8+ generates assembly similar to: movdqa (%rdi),%xmm0 movaps %xmm0,-0x50(%rbp) This is an issue, because movaps will #GP if -0x50(%rbp) is not aligned to 16 bytes [2], and how rbp gets assigned to is via glibc clone thread_start, then function prologue, going though execution trace similar to (showing only relevant instructions): sub $0x10,%rsi mov %rcx,0x8(%rsi) mov %rdi,(%rsi) syscall pop %rax pop %rdi callq *%rax push %rbp mov %rsp,%rbp The stack pointer always points to the topmost element on stack, rather then the space right above the topmost. On push, the pointer decrements first before writing to the memory pointed to by it. Therefore, there is no need to have the stack pointer pointer always point to valid memory unless the stack is poped; so the `- sizeof(void *)` in the code is unnecessary. On the other hand, glibc reserves the 16 bytes it needs on stack and pops itself, so by the call instruction the stack pointer is exactly the caller-supplied sp. It then push the 16 bytes of the return address and the saved stack pointer, so the base pointer will be 16-byte aligned if and only if the caller supplied sp is 16-byte aligned. Therefore, the caller must supply a 16-byte aligned pointer, which `stack + UM_KERN_PAGE_SIZE` already satisfies. On a side note, musl is unaffected by this issue because it forces 16 byte alignment via `and $-16,%rsi` in its clone wrapper. Similarly, glibc i386 is also unaffected because it has `andl $0xfffffff0, %ecx`. To reproduce this bug, enable CONFIG_UML_RTC and CC_OPTIMIZE_FOR_PERFORMANCE. uml_rtc will call add_sigio_fd which will then cause write_sigio_thread to either go into segfault loop or panic with "Segfault with no mm". Similarly, signal stacks will be aligned by the host kernel upon signal delivery. `- sizeof(void *)` to sigaltstack is unconventional and extraneous. On a related note, initialization of longjmp buffers do require `- sizeof(void *)`. This is to account for the return address that would have been pushed to the stack at the call site. The reason for uml to respect 16-byte alignment, rather than telling GCC to assume 8-byte alignment like the host kernel since commit d9b0cde91c60 ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported"), is because uml links against libc. There is no reason to assume libc is also compiled with that flag and assumes 8-byte alignment rather than 16-byte. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838 [2] https://c9x.me/x86/html/file_module_x86_id_180.html Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-01-26Revert "um: allocate a guard page to helper threads"Johannes Berg
This reverts commit ef4459a6da09 ("um: allocate a guard page to helper threads"), it's broken in multiple ways: 1) the free no longer matches the alloc; and 2) more importantly, the set_memory_ro() causes allocation of page tables for the normal memory that doesn't have any, and that later causes corruption and crashes (usually but not always in vfree()). We could fix the first bug and use vmalloc() to work around the second, but set_memory_ro() actually doesn't do anything either so I'll just revert that as well. Reported-by: Benjamin Berg <benjamin@sipsolutions.net> Fixes: ef4459a6da09 ("um: allocate a guard page to helper threads") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-01-26um: ubd: fix command line handling of ubdHajime Tazaki
This commit fixes a regression to handle command line parameters of ubd. With a simple line "./linux ubd0="./disk-ext4.img", it fails at ubd_setup_common(). The commit adds additional checks to the variables in order to properly parse the paremeters which previously worked. Fixes: ef3ba87cb7c9 ("um: ubd: Set device serial attribute from cmdline") Cc: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Hajime Tazaki <thehajime@gmail.com> Acked-by: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: allocate a guard page to helper threadsJohannes Berg
We've been running into stack overflows in helper threads corrupting memory (e.g. because somebody put printf() or os_info() there), so to avoid those causing hard-to-debug issues later on, allocate a guard page for helper thread stacks and mark it read-only. Unfortunately, the crash dump at that point is useless as the stack tracer will try to backtrace the *kernel* thread, not the helper thread, but at least we don't survive to a random issue caused by corruption. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Support dynamic IRQ allocationJohannes Berg
It's cumbersome and error-prone to keep adding fixed IRQ numbers, and for proper device wakeup support for the virtio/vhost-user support we need to have different IRQs for each device. Even if in theory two IRQs (with and without wake) might be sufficient, it's much easier to reason about it when we have dynamic number assignment. It also makes it easier to add new devices that may dynamically exist or depending on the configuration, etc. Add support for this, up to 64 IRQs (the same limit as epoll FDs we have right now). Since it's not easy to port all the existing places to dynamic allocation (some data is statically initialized) keep the low numbers are reserved for the existing hard-coded IRQ numbers. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: ubd: Set device serial attribute from cmdlineChristopher Obbard
Adds the ability to set the UBD device serial number from the commandline, disabling the serial number functionality by default. In some cases it may be useful to set a serial to the UBD device, such that downstream users (i.e. udev) can use this information to better describe the hardware to the user from the UML cmdline. In our case we use this parameter to create some entries under /dev/disk/by-ubd-id/ for each of the UBD devices passed through the UML cmdline. Signed-off-by: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: ubd: Submit all data segments atomicallyGabriel Krisman Bertazi
Internally, UBD treats each physical IO segment as a separate command to be submitted in the execution pipe. If the pipe returns a transient error after a few segments have already been written, UBD will tell the block layer to requeue the request, but there is no way to reclaim the segments already submitted. When a new attempt to dispatch the request is done, those segments already submitted will get duplicated, causing the WARN_ON below in the best case, and potentially data corruption. In my system, running a UML instance with 2GB of RAM and a 50M UBD disk, I can reproduce the WARN_ON by simply running mkfs.fvat against the disk on a freshly booted system. There are a few ways to around this, like reducing the pressure on the pipe by reducing the queue depth, which almost eliminates the occurrence of the problem, increasing the pipe buffer size on the host system, or by limiting the request to one physical segment, which causes the block layer to submit way more requests to resolve a single operation. Instead, this patch modifies the format of a UBD command, such that all segments are sent through a single element in the communication pipe, turning the command submission atomic from the point of view of the block layer. The new format has a variable size, depending on the number of elements, and looks like this: +------------+-----------+-----------+------------ | cmd_header | segment 0 | segment 1 | segment ... +------------+-----------+-----------+------------ With this format, we push a pointer to cmd_header in the submission pipe. This has the advantage of reducing the memory footprint of executing a single request, since it allow us to merge some fields in the header. It is possible to reduce even further each segment memory footprint, by merging bitmap_words and cow_offset, for instance, but this is not the focus of this patch and is left as future work. One issue with the patch is that for a big number of segments, we now perform one big memory allocation instead of multiple small ones, but I wasn't able to trigger any real issues or -ENOMEM because of this change, that wouldn't be reproduced otherwise. This was tested using fio with the verify-crc32 option, and by running an ext4 filesystem over this UBD device. The original WARN_ON was: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141 refcount_t: underflow; use-after-free. Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346 Stack: 6084eed0 6063dc77 00000009 6084ef60 00000000 604b8d9f 6084eee0 6063dcbc 6084ef40 6006ab8d e013d780 1c00000000 Call Trace: [<600a0c1c>] ? printk+0x0/0x94 [<6004a888>] show_stack+0x13b/0x155 [<6063dc77>] ? dump_stack_print_info+0xdf/0xe8 [<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141 [<6063dcbc>] dump_stack+0x2a/0x2c [<6006ab8d>] __warn+0x107/0x134 [<6008da6c>] ? wake_up_process+0x17/0x19 [<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd [<6006b05f>] warn_slowpath_fmt+0xd1/0xdf [<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf [<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15 [<600619ae>] ? os_nsecs+0x1d/0x2b [<604b8d9f>] refcount_warn_saturate+0x13f/0x141 [<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37 [<6048c8de>] blk_mq_free_request+0xf1/0x10d [<6048ca06>] __blk_mq_end_request+0x10c/0x114 [<6005ac0f>] ubd_intr+0xb5/0x169 [<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e [<600a1b70>] handle_irq_event_percpu+0x26/0x69 [<600a1bd9>] handle_irq_event+0x26/0x34 [<600a1bb3>] ? handle_irq_event+0x0/0x34 [<600a5186>] ? unmask_irq+0x0/0x37 [<600a57e6>] handle_edge_irq+0xbc/0xd6 [<600a131a>] generic_handle_irq+0x21/0x29 [<60048f6e>] do_IRQ+0x39/0x54 [...] ---[ end trace c6e7444e55386c0f ]--- Cc: Christopher Obbard <chris.obbard@collabora.com> Reported-by: Martyn Welch <martyn@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Tested-by: Christopher Obbard <chris.obbard@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-03-29um: ubd: Retry buffer read on any kind of errorGabriel Krisman Bertazi
Should bulk_req_safe_read return an error, we want to retry the read, otherwise, even though no IO will be done, os_write_file might still end up writing garbage to the pipe. Cc: Martyn Welch <martyn.welch@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-03-29um: ubd: Prevent buffer overrun on command completionGabriel Krisman Bertazi
On the hypervisor side, when completing commands and the pipe is full, we retry writing only the entries that failed, by offsetting io_req_buffer, but we don't reduce the number of bytes written, which can cause a buffer overrun of io_req_buffer, and write garbage to the pipe. Cc: Martyn Welch <martyn.welch@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-01-29Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This series is slightly unusual because it includes Arnd's compat ioctl tree here: 1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. The rest is minor changes and updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits) scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity scsi: hisi_sas: Modify the file permissions of trigger_dump to write only scsi: hisi_sas: Replace magic number when handle channel interrupt scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock scsi: hisi_sas: use threaded irq to process CQ interrupts scsi: ufs: Use UFS device indicated maximum LU number scsi: ufs: Add max_lu_supported in struct ufs_dev_info scsi: ufs: Delete is_init_prefetch from struct ufs_hba scsi: ufs: Inline two functions into their callers scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init() scsi: ufs: Split ufshcd_probe_hba() based on its called flow scsi: ufs: Delete struct ufs_dev_desc scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails scsi: ufs-mediatek: enable low-power mode for hibern8 state scsi: ufs: export some functions for vendor usage scsi: ufs-mediatek: add dbg_register_dump implementation scsi: qla2xxx: Fix a NULL pointer dereference in an error path scsi: qla1280: Make checking for 64bit support consistent scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1 ...