| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-28 16:43:54 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-28 16:43:54 -0700 |
| commit | 6e11664f148454a127dd89e8698c3e3e80e5f62f (patch) | |
| tree | 1dda14e522a1fd0abfe320cc49c16bcf0110ff24 /drivers | |
| parent | c3018a2c6adae9b32f7b9259f5b38257ba9a758e (diff) | |
| parent | 5989bfe6ac6bf230c2c84e118c786be0ed4be3f4 (diff) | |
Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

- MD pull request via Yu:
    - call del_gendisk synchronously (Xiao)
    - cleanup unused variable (John)
    - cleanup workqueue flags (Ryo)
    - fix faulty rdev can't be removed during resync (Qixing)
- NVMe pull request via Christoph:
    - try PCIe function level reset on init failure (Keith Busch)
    - log TLS handshake failures at error level (Maurizio Lombardi)
    - pci-epf: do not complete commands twice if nvmet_req_init()
      fails (Rick Wertenbroek)
    - misc cleanups (Alok Tiwari)
- Removal of the pktcdvd driver
  This has been more than a decade coming at this point, and some
  recently revealed breakages that had it causing issues even for cases
  where it isn't required made me re-pull the trigger on this one. It's
  known broken and nobody has stepped up to maintain the code
- Series for ublk supporting batch commands, enabling the use of
  multishot where appropriate
- Speed up ublk exit handling
- Fix for the two-stage elevator fixing which could leak data
- Convert NVMe to use the new IOVA based API
- Increase default max transfer size to something more reasonable
- Series fixing write operations on zoned DM devices
- Add tracepoints for zoned block device operations
- Prep series working towards improving blk-mq queue management in the
  presence of isolated CPUs
- Don't allow updating of the block size of a loop device that is
  currently under exclusive ownership/open
- Set chunk sectors from stacked device stripe size and use it for the
  atomic write size limit
- Switch to folios in bcache read_super()
- Fix for CD-ROM MRW exit flush handling
- Various tweaks, fixes, and cleanups
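
The loop item in the list above (no block size update while the device is under someone else's exclusive ownership) can be pictured with a small userspace model. This is only a sketch: `struct toy_dev`, `toy_set_block_size()` and the `-EBUSY` result for a failed claim are hypothetical stand-ins chosen for illustration, not kernel interfaces. It shows the shape of the idea: a caller without an exclusive handle takes a temporary claim, and that claim is dropped again on every exit path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block device; every name here is hypothetical. */
struct toy_dev {
	const void *holder;		/* current exclusive owner, NULL if none */
	bool bound;			/* models "a backing file is attached" */
	unsigned int block_size;
};

/*
 * Change the block size. A caller that did not open the device exclusively
 * takes a temporary claim first, so it cannot change the device underneath
 * another exclusive owner. The claim is dropped on every exit path.
 */
static int toy_set_block_size(struct toy_dev *dev, bool caller_excl,
			      unsigned int size)
{
	static const char tmp_claim = 0;	/* its address identifies the claim */
	int err = 0;

	assert(dev != NULL);

	if (!caller_excl) {
		if (dev->holder)
			return -EBUSY;	/* assumption: someone else owns it */
		dev->holder = &tmp_claim;
	}

	if (!dev->bound) {
		err = -ENXIO;
		goto abort_claim;
	}

	dev->block_size = size;

abort_claim:
	if (!caller_excl)
		dev->holder = NULL;
	return err;
}
```

In this model a non-exclusive caller fails with `-EBUSY` while another owner holds the device, and a caller that already holds it exclusively skips the temporary claim entirely.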
* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
block: restore two stage elevator switch while running nr_hw_queue update
cdrom: Call cdrom_mrw_exit from cdrom_release function
sunvdc: Balance device refcount in vdc_port_mpgroup_check
nvme-pci: try function level reset on init failure
dm: split write BIOs on zone boundaries when zone append is not emulated
block: use chunk_sectors when evaluating stacked atomic write limits
dm-stripe: limit chunk_sectors to the stripe size
md/raid10: set chunk_sectors limit
md/raid0: set chunk_sectors limit
block: sanitize chunk_sectors for atomic write limits
ilog2: add max_pow_of_two_factor()
nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
nvme-tcp: log TLS handshake failures at error level
docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
nvme: fix typo in status code constant for self-test in progress
nvmet: remove redundant assignment of error code in nvmet_ns_enable()
nvme: fix incorrect variable in io cqes error message
nvme: fix multiple spelling and grammar issues in host drivers
block: fix blk_zone_append_update_request_bio() kernel-doc
md/raid10: fix set but not used variable in sync_request_write()
...
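
Two shortlog entries above, "ilog2: add max_pow_of_two_factor()" and "block: use chunk_sectors when evaluating stacked atomic write limits", fit together. The userspace sketch below assumes the helper returns the largest power of two that divides its argument, as its name suggests; the kernel implementation is not part of this excerpt. `toy_pow2_factor()` and `toy_cap_atomic_max()` are hypothetical names, and the capping rule is an illustrative guess at how a chunk size could bound an atomic write unit, not the block layer's actual code.

```c
#include <assert.h>

/* Largest power of two that divides n; n must be non-zero. */
static unsigned long toy_pow2_factor(unsigned long n)
{
	assert(n != 0);
	return n & -n;	/* isolates the lowest set bit */
}

/*
 * Hypothetical capping rule: an atomic write unit (in sectors) may not be
 * larger than the biggest power of two that divides the chunk size, so a
 * naturally aligned atomic write never straddles a chunk boundary.
 * chunk_sectors == 0 means "no chunking", so the limit is left unchanged.
 */
static unsigned long toy_cap_atomic_max(unsigned long unit_max,
					unsigned long chunk_sectors)
{
	unsigned long factor;

	if (!chunk_sectors)
		return unit_max;

	factor = toy_pow2_factor(chunk_sectors);
	return unit_max < factor ? unit_max : factor;
}
```

For a chunk of 96 sectors (32 * 3) the largest power-of-two factor is 32, so under this rule a requested limit of 64 sectors would be capped to 32, while a limit of 16 would pass through unchanged.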
Diffstat (limited to 'drivers')
41 files changed, 1055 insertions, 3613 deletions
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 0f70e2374e7f..df38fb364904 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -256,49 +256,6 @@ config BLK_DEV_RAM_SIZE
 	  The default value is 4096 kilobytes. Only change this if you know
 	  what you are doing.
 
-config CDROM_PKTCDVD
-	tristate "Packet writing on CD/DVD media (DEPRECATED)"
-	depends on !UML
-	depends on SCSI
-	select CDROM
-	help
-	  Note: This driver is deprecated and will be removed from the
-	  kernel in the near future!
-
-	  If you have a CDROM/DVD drive that supports packet writing, say
-	  Y to include support. It should work with any MMC/Mt Fuji
-	  compliant ATAPI or SCSI drive, which is just about any newer
-	  DVD/CD writer.
-
-	  Currently only writing to CD-RW, DVD-RW, DVD+RW and DVDRAM discs
-	  is possible.
-	  DVD-RW disks must be in restricted overwrite mode.
-
-	  See the file <file:Documentation/cdrom/packet-writing.rst>
-	  for further information on the use of this driver.
-
-	  To compile this driver as a module, choose M here: the
-	  module will be called pktcdvd.
-
-config CDROM_PKTCDVD_BUFFERS
-	int "Free buffers for data gathering"
-	depends on CDROM_PKTCDVD
-	default "8"
-	help
-	  This controls the maximum number of active concurrent packets. More
-	  concurrent packets can increase write performance, but also require
-	  more memory. Each concurrent packet will require approximately 64Kb
-	  of non-swappable kernel memory, memory which will be allocated when
-	  a disc is opened for writing.
-
-config CDROM_PKTCDVD_WCACHE
-	bool "Enable write caching"
-	depends on CDROM_PKTCDVD
-	help
-	  If enabled, write caching will be set for the CD-R/W device. For now
-	  this option is dangerous unless the CD-RW media is known good, as we
-	  don't do deferred write error handling yet.
-
 config ATA_OVER_ETH
 	tristate "ATA over Ethernet support"
 	depends on NET
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 097707aca725..a695ce74ef22 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -23,7 +23,6 @@ obj-$(CONFIG_AMIGA_Z2RAM) += z2ram.o
 obj-$(CONFIG_N64CART) += n64cart.o
 obj-$(CONFIG_BLK_DEV_RAM) += brd.o
 obj-$(CONFIG_BLK_DEV_LOOP) += loop.o
-obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
 obj-$(CONFIG_SUNVDC) += sunvdc.o
 
 obj-$(CONFIG_BLK_DEV_NBD) += nbd.o
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index e5a2e5f7887b..975024cf03c5 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -2500,7 +2500,11 @@ static int handle_write_conflicts(struct drbd_device *device,
 			peer_req->w.cb = superseded ? e_send_superseded :
 						   e_send_retry_write;
 			list_add_tail(&peer_req->w.list, &device->done_ee);
-			queue_work(connection->ack_sender, &peer_req->peer_device->send_acks_work);
+			/* put is in drbd_send_acks_wf() */
+			kref_get(&device->kref);
+			if (!queue_work(connection->ack_sender,
+					&peer_req->peer_device->send_acks_work))
+				kref_put(&device->kref, drbd_destroy_device);
 
 			err = -ENOENT;
 			goto out;
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index e97432032f01..24be0c2c4075 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3411,7 +3411,7 @@ static int fd_locked_ioctl(struct block_device *bdev, blk_mode_t mode,
 		struct floppy_max_errors max_errors;
 		struct floppy_drive_params dp;
 	} inparam;		/* parameters coming from user space */
-	const void *outparam;	/* parameters passed back to user space */
+	const void *outparam = NULL;	/* parameters passed back to user space */
 
 	/* convert compatibility eject ioctls into floppy eject ioctl.
 	 * We do this in order to provide a means to eject floppy disks before
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 8d994cae3b83..1b6ee91f8eb9 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1431,17 +1431,34 @@ static int loop_set_dio(struct loop_device *lo, unsigned long arg)
 	return 0;
 }
 
-static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
+static int loop_set_block_size(struct loop_device *lo, blk_mode_t mode,
+			       struct block_device *bdev, unsigned long arg)
 {
 	struct queue_limits lim;
 	unsigned int memflags;
 	int err = 0;
 
-	if (lo->lo_state != Lo_bound)
-		return -ENXIO;
+	/*
+	 * If we don't hold exclusive handle for the device, upgrade to it
+	 * here to avoid changing device under exclusive owner.
+	 */
+	if (!(mode & BLK_OPEN_EXCL)) {
+		err = bd_prepare_to_claim(bdev, loop_set_block_size, NULL);
+		if (err)
+			return err;
+	}
+
+	err = mutex_lock_killable(&lo->lo_mutex);
+	if (err)
+		goto abort_claim;
+
+	if (lo->lo_state != Lo_bound) {
+		err = -ENXIO;
+		goto unlock;
+	}
 
 	if (lo->lo_queue->limits.logical_block_size == arg)
-		return 0;
+		goto unlock;
 
 	sync_blockdev(lo->lo_device);
 	invalidate_bdev(lo->lo_device);
@@ -1454,6 +1471,11 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
 	loop_update_dio(lo);
 	blk_mq_unfreeze_queue(lo->lo_queue, memflags);
 
+unlock:
+	mutex_unlock(&lo->lo_mutex);
+abort_claim:
+	if (!(mode & BLK_OPEN_EXCL))
+		bd_abort_claiming(bdev, loop_set_block_size);
 	return err;
 }
 
@@ -1472,9 +1494,6 @@ static int lo_simple_ioctl(struct loop_device *lo, unsigned int cmd,
 	case LOOP_SET_DIRECT_IO:
 		err = loop_set_dio(lo, arg);
 		break;
-	case LOOP_SET_BLOCK_SIZE:
-		err = loop_set_block_size(lo, arg);
-		break;
 	default:
 		err = -EINVAL;
 	}
@@ -1529,9 +1548,12 @@ static int lo_ioctl(struct block_device *bdev, blk_mode_t mode,
 		break;
 	case LOOP_GET_STATUS64:
 		return loop_get_status64(lo, argp);
+	case LOOP_SET_BLOCK_SIZE:
+		if (!(mode & BLK_OPEN_WRITE) && !capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		return loop_set_block_size(lo, mode, bdev, arg);
 	case LOOP_SET_CAPACITY:
 	case LOOP_SET_DIRECT_IO:
-	case LOOP_SET_BLOCK_SIZE:
 		if (!(mode & BLK_OPEN_WRITE) && !capable(CAP_SYS_ADMIN))
 			return -EPERM;
 		fallthrough;
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 66ce6b81c7d9..8fc7761397bd 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -2040,11 +2040,12 @@ static int mtip_hw_ioctl(struct driver_data *dd, unsigned int cmd,
  * @dir Direction (read or write)
  *
  * return value
- *	None
+ *	0	The IO completed successfully.
+ *	-ENOMEM	The DMA mapping failed.
  */
-static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq,
-			      struct mtip_cmd *command,
-			      struct blk_mq_hw_ctx *hctx)
+static int mtip_hw_submit_io(struct driver_data *dd, struct request *rq,
+			     struct mtip_cmd *command,
+			     struct blk_mq_hw_ctx *hctx)
 {
 	struct mtip_cmd_hdr *hdr =
 		dd->port->command_list + sizeof(struct mtip_cmd_hdr) * rq->tag;
@@ -2056,12 +2057,14 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq,
 	unsigned int nents;
 
 	/* Map the scatter list for DMA access */
-	nents = blk_rq_map_sg(rq, command->sg);
-	nents = dma_map_sg(&dd->pdev->dev, command->sg, nents, dma_dir);
+	command->scatter_ents = blk_rq_map_sg(rq, command->sg);
+	nents = dma_map_sg(&dd->pdev->dev, command->sg,
+			   command->scatter_ents, dma_dir);
+	if (!nents)
+		return -ENOMEM;
 
-	prefetch(&port->flags);
 
-	command->scatter_ents = nents;
+	prefetch(&port->flags);
 
 	/*
 	 * The number of retries for this command before it is
@@ -2112,11 +2115,13 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq,
 	if (unlikely(port->flags & MTIP_PF_PAUSE_IO)) {
 		set_bit(rq->tag, port->cmds_to_issue);
 		set_bit(MTIP_PF_ISSUE_CMDS_BIT, &port->flags);
-		return;
+		return 0;
 	}
 
 	/* Issue the command to the hardware */
 	mtip_issue_ncq_command(port, rq->tag);
+
+	return 0;
 }
 
 /*
@@ -3315,7 +3320,9 @@ static blk_status_t mtip_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(rq);
 
-	mtip_hw_submit_io(dd, rq, cmd, hctx);
+	if (mtip_hw_submit_io(dd, rq, cmd, hctx))
+		return BLK_STS_IOERR;
+
 	return BLK_STS_OK;
 }
 
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 2592bd19ebc1..6463d0e8d0ce 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1473,7 +1473,17 @@ static int nbd_start_device(struct nbd_device *nbd)
 		return -EINVAL;
 	}
 
-	blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections);
+retry:
+	mutex_unlock(&nbd->config_lock);
+	blk_mq_update_nr_hw_queues(&nbd->tag_set, num_connections);
+	mutex_lock(&nbd->config_lock);
+
+	/* if another code path updated nr_hw_queues, retry until succeed */
+	if (num_connections != config->num_connections) {
+		num_connections = config->num_connections;
+		goto retry;
+	}
+
 	nbd->pid = task_pid_nr(current);
 
 	nbd_parse_flags(nbd);
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
deleted file mode 100644
index d5cc7bd2875c..000000000000
--- a/drivers/block/pktcdvd.c
+++ /dev/null
@@ -1,2916 +0,0 @@
-/*
- * Copyright (C) 2000 Jens Axboe <axboe@suse.de>
- * Copyright (C) 2001-2004 Peter Osterlund <petero2@telia.com>
- * Copyright (C) 2006 Thomas Maier <balagi@justmail.de>
- *
- * May be copied or modified under the terms of the GNU General Public
- * License. See linux/COPYING for more information.
- *
- * Packet writing layer for ATAPI and SCSI CD-RW, DVD+RW, DVD-RW and
- * DVD-RAM devices.
- *
- * Theory of operation:
- *
- * At the lowest level, there is the standard driver for the CD/DVD device,
- * such as drivers/scsi/sr.c. This driver can handle read and write requests,
- * but it doesn't know anything about the special restrictions that apply to
- * packet writing. One restriction is that write requests must be aligned to
- * packet boundaries on the physical media, and the size of a write request
- * must be equal to the packet size. Another restriction is that a
- * GPCMD_FLUSH_CACHE command has to be issued to the drive before a read
- * command, if the previous command was a write.
- *
- * The purpose of the packet writing driver is to hide these restrictions from
- * higher layers, such as file systems, and present a block device that can be
- * randomly read and written using 2kB-sized blocks.
- *
- * The lowest layer in the packet writing driver is the packet I/O scheduler.
- * Its data is defined by the struct packet_iosched and includes two bio
- * queues with pending read and write requests. These queues are processed
- * by the pkt_iosched_process_queue() function. The write requests in this
- * queue are already properly aligned and sized. This layer is responsible for
- * issuing the flush cache commands and scheduling the I/O in a good order.
- *
- * The next layer transforms unaligned write requests to aligned writes. This
- * transformation requires reading missing pieces of data from the underlying
- * block device, assembling the pieces to full packets and queuing them to the
- * packet I/O scheduler.
- *
- * At the top layer there is a custom ->submit_bio function that forwards
- * read requests directly to the iosched queue and puts write requests in the
- * unaligned write queue. A kernel thread performs the necessary read
- * gathering to convert the unaligned writes to aligned writes and then feeds
- * them to the packet I/O scheduler.
- * - *************************************************************************/ - -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - -#include <linux/backing-dev.h> -#include <linux/compat.h> -#include <linux/debugfs.h> -#include <linux/device.h> -#include <linux/errno.h> -#include <linux/file.h> -#include <linux/freezer.h> -#include <linux/kernel.h> -#include <linux/kthread.h> -#include <linux/miscdevice.h> -#include <linux/module.h> -#include <linux/mutex.h> -#include <linux/nospec.h> -#include <linux/pktcdvd.h> -#include <linux/proc_fs.h> -#include <linux/seq_file.h> -#include <linux/slab.h> -#include <linux/spinlock.h> -#include <linux/types.h> -#include <linux/uaccess.h> - -#include <scsi/scsi.h> -#include <scsi/scsi_cmnd.h> -#include <scsi/scsi_ioctl.h> - -#include <linux/unaligned.h> - -#define DRIVER_NAME "pktcdvd" - -#define MAX_SPEED 0xffff - -static DEFINE_MUTEX(pktcdvd_mutex); -static struct pktcdvd_device *pkt_devs[MAX_WRITERS]; -static struct proc_dir_entry *pkt_proc; -static int pktdev_major; -static int write_congestion_on = PKT_WRITE_CONGESTION_ON; -static int write_congestion_off = PKT_WRITE_CONGESTION_OFF; -static struct mutex ctl_mutex; /* Serialize open/close/setup/teardown */ -static mempool_t psd_pool; -static struct bio_set pkt_bio_set; - -/* /sys/class/pktcdvd */ -static struct class class_pktcdvd; -static struct dentry *pkt_debugfs_root = NULL; /* /sys/kernel/debug/pktcdvd */ - -/* forward declaration */ -static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev); -static int pkt_remove_dev(dev_t pkt_dev); - -static sector_t get_zone(sector_t sector, struct pktcdvd_device *pd) -{ - return (sector + pd->offset) & ~(sector_t)(pd->settings.size - 1); -} - -/********************************************************** - * sysfs interface for pktcdvd - * by (C) 2006 Thomas Maier <balagi@justmail.de> - - /sys/class/pktcdvd/pktcdvd[0-7]/ - stat/reset - stat/packets_started - stat/packets_finished - stat/kb_written - stat/kb_read - stat/kb_read_gather - 
write_queue/size - write_queue/congestion_off - write_queue/congestion_on - **********************************************************/ - -static ssize_t packets_started_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - - return sysfs_emit(buf, "%lu\n", pd->stats.pkt_started); -} -static DEVICE_ATTR_RO(packets_started); - -static ssize_t packets_finished_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - - return sysfs_emit(buf, "%lu\n", pd->stats.pkt_ended); -} -static DEVICE_ATTR_RO(packets_finished); - -static ssize_t kb_written_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - - return sysfs_emit(buf, "%lu\n", pd->stats.secs_w >> 1); -} -static DEVICE_ATTR_RO(kb_written); - -static ssize_t kb_read_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - - return sysfs_emit(buf, "%lu\n", pd->stats.secs_r >> 1); -} -static DEVICE_ATTR_RO(kb_read); - -static ssize_t kb_read_gather_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - - return sysfs_emit(buf, "%lu\n", pd->stats.secs_rg >> 1); -} -static DEVICE_ATTR_RO(kb_read_gather); - -static ssize_t reset_store(struct device *dev, struct device_attribute *attr, - const char *buf, size_t len) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - - if (len > 0) { - pd->stats.pkt_started = 0; - pd->stats.pkt_ended = 0; - pd->stats.secs_w = 0; - pd->stats.secs_rg = 0; - pd->stats.secs_r = 0; - } - return len; -} -static DEVICE_ATTR_WO(reset); - -static struct attribute *pkt_stat_attrs[] = { - &dev_attr_packets_finished.attr, - &dev_attr_packets_started.attr, - &dev_attr_kb_read.attr, - &dev_attr_kb_written.attr, - &dev_attr_kb_read_gather.attr, - 
&dev_attr_reset.attr, - NULL, -}; - -static const struct attribute_group pkt_stat_group = { - .name = "stat", - .attrs = pkt_stat_attrs, -}; - -static ssize_t size_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - int n; - - spin_lock(&pd->lock); - n = sysfs_emit(buf, "%d\n", pd->bio_queue_size); - spin_unlock(&pd->lock); - return n; -} -static DEVICE_ATTR_RO(size); - -static void init_write_congestion_marks(int* lo, int* hi) -{ - if (*hi > 0) { - *hi = max(*hi, 500); - *hi = min(*hi, 1000000); - if (*lo <= 0) - *lo = *hi - 100; - else { - *lo = min(*lo, *hi - 100); - *lo = max(*lo, 100); - } - } else { - *hi = -1; - *lo = -1; - } -} - -static ssize_t congestion_off_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - int n; - - spin_lock(&pd->lock); - n = sysfs_emit(buf, "%d\n", pd->write_congestion_off); - spin_unlock(&pd->lock); - return n; -} - -static ssize_t congestion_off_store(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t len) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - int val, ret; - - ret = kstrtoint(buf, 10, &val); - if (ret) - return ret; - - spin_lock(&pd->lock); - pd->write_congestion_off = val; - init_write_congestion_marks(&pd->write_congestion_off, &pd->write_congestion_on); - spin_unlock(&pd->lock); - return len; -} -static DEVICE_ATTR_RW(congestion_off); - -static ssize_t congestion_on_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - int n; - - spin_lock(&pd->lock); - n = sysfs_emit(buf, "%d\n", pd->write_congestion_on); - spin_unlock(&pd->lock); - return n; -} - -static ssize_t congestion_on_store(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t len) -{ - struct pktcdvd_device *pd = dev_get_drvdata(dev); - int val, ret; - - ret = kstrtoint(buf, 10, 
&val); - if (ret) - return ret; - - spin_lock(&pd->lock); - pd->write_congestion_on = val; - init_write_congestion_marks(&pd->write_congestion_off, &pd->write_congestion_on); - spin_unlock(&pd->lock); - return len; -} -static DEVICE_ATTR_RW(congestion_on); - -static struct attribute *pkt_wq_attrs[] = { - &dev_attr_congestion_on.attr, - &dev_attr_congestion_off.attr, - &dev_attr_size.attr, - NULL, -}; - -static const struct attribute_group pkt_wq_group = { - .name = "write_queue", - .attrs = pkt_wq_attrs, -}; - -static const struct attribute_group *pkt_groups[] = { - &pkt_stat_group, - &pkt_wq_group, - NULL, -}; - -static void pkt_sysfs_dev_new(struct pktcdvd_device *pd) -{ - if (class_is_registered(&class_pktcdvd)) { - pd->dev = device_create_with_groups(&class_pktcdvd, NULL, - MKDEV(0, 0), pd, pkt_groups, - "%s", pd->disk->disk_name); - if (IS_ERR(pd->dev)) - pd->dev = NULL; - } -} - -static void pkt_sysfs_dev_remove(struct pktcdvd_device *pd) -{ - if (class_is_registered(&class_pktcdvd)) - device_unregister(pd->dev); -} - - -/******************************************************************** - /sys/class/pktcdvd/ - add map block device - remove unmap packet dev - device_map show mappings - *******************************************************************/ - -static ssize_t device_map_show(const struct class *c, const struct class_attribute *attr, - char *data) -{ - int n = 0; - int idx; - mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING); - for (idx = 0; idx < MAX_WRITERS; idx++) { - struct pktcdvd_device *pd = pkt_devs[idx]; - if (!pd) - continue; - n += sysfs_emit_at(data, n, "%s %u:%u %u:%u\n", - pd->disk->disk_name, - MAJOR(pd->pkt_dev), MINOR(pd->pkt_dev), - MAJOR(file_bdev(pd->bdev_file)->bd_dev), - MINOR(file_bdev(pd->bdev_file)->bd_dev)); - } - mutex_unlock(&ctl_mutex); - return n; -} -static CLASS_ATTR_RO(device_map); - -static ssize_t add_store(const struct class *c, const struct class_attribute *attr, - const char *buf, size_t count) -{ - 
unsigned int major, minor; - - if (sscanf(buf, "%u:%u", &major, &minor) == 2) { - /* pkt_setup_dev() expects caller to hold reference to self */ - if (!try_module_get(THIS_MODULE)) - return -ENODEV; - - pkt_setup_dev(MKDEV(major, minor), NULL); - - module_put(THIS_MODULE); - - return count; - } - - return -EINVAL; -} -static CLASS_ATTR_WO(add); - -static ssize_t remove_store(const struct class *c, const struct class_attribute *attr, - const char *buf, size_t count) -{ - unsigned int major, minor; - if (sscanf(buf, "%u:%u", &major, &minor) == 2) { - pkt_remove_dev(MKDEV(major, minor)); - return count; - } - return -EINVAL; -} -static CLASS_ATTR_WO(remove); - -static struct attribute *class_pktcdvd_attrs[] = { - &class_attr_add.attr, - &class_attr_remove.attr, - &class_attr_device_map.attr, - NULL, -}; -ATTRIBUTE_GROUPS(class_pktcdvd); - -static struct class class_pktcdvd = { - .name = DRIVER_NAME, - .class_groups = class_pktcdvd_groups, -}; - -static int pkt_sysfs_init(void) -{ - /* - * create control files in sysfs - * /sys/class/pktcdvd/... 
- */ - return class_register(&class_pktcdvd); -} - -static void pkt_sysfs_cleanup(void) -{ - class_unregister(&class_pktcdvd); -} - -/******************************************************************** - entries in debugfs - - /sys/kernel/debug/pktcdvd[0-7]/ - info - - *******************************************************************/ - -static void pkt_count_states(struct pktcdvd_device *pd, int *states) -{ - struct packet_data *pkt; - int i; - - for (i = 0; i < PACKET_NUM_STATES; i++) - states[i] = 0; - - spin_lock(&pd->cdrw.active_list_lock); - list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) { - states[pkt->state]++; - } - spin_unlock(&pd->cdrw.active_list_lock); -} - -static int pkt_seq_show(struct seq_file *m, void *p) -{ - struct pktcdvd_device *pd = m->private; - char *msg; - int states[PACKET_NUM_STATES]; - - seq_printf(m, "Writer %s mapped to %pg:\n", pd->disk->disk_name, - file_bdev(pd->bdev_file)); - - seq_printf(m, "\nSettings:\n"); - seq_printf(m, "\tpacket size:\t\t%dkB\n", pd->settings.size / 2); - - if (pd->settings.write_type == 0) - msg = "Packet"; - else - msg = "Unknown"; - seq_printf(m, "\twrite type:\t\t%s\n", msg); - - seq_printf(m, "\tpacket type:\t\t%s\n", pd->settings.fp ? 
"Fixed" : "Variable"); - seq_printf(m, "\tlink loss:\t\t%d\n", pd->settings.link_loss); - - seq_printf(m, "\ttrack mode:\t\t%d\n", pd->settings.track_mode); - - if (pd->settings.block_mode == PACKET_BLOCK_MODE1) - msg = "Mode 1"; - else if (pd->settings.block_mode == PACKET_BLOCK_MODE2) - msg = "Mode 2"; - else - msg = "Unknown"; - seq_printf(m, "\tblock mode:\t\t%s\n", msg); - - seq_printf(m, "\nStatistics:\n"); - seq_printf(m, "\tpackets started:\t%lu\n", pd->stats.pkt_started); - seq_printf(m, "\tpackets ended:\t\t%lu\n", pd->stats.pkt_ended); - seq_printf(m, "\twritten:\t\t%lukB\n", pd->stats.secs_w >> 1); - seq_printf(m, "\tread gather:\t\t%lukB\n", pd->stats.secs_rg >> 1); - seq_printf(m, "\tread:\t\t\t%lukB\n", pd->stats.secs_r >> 1); - - seq_printf(m, "\nMisc:\n"); - seq_printf(m, "\treference count:\t%d\n", pd->refcnt); - seq_printf(m, "\tflags:\t\t\t0x%lx\n", pd->flags); - seq_printf(m, "\tread speed:\t\t%ukB/s\n", pd->read_speed); - seq_printf(m, "\twrite speed:\t\t%ukB/s\n", pd->write_speed); - seq_printf(m, "\tstart offset:\t\t%lu\n", pd->offset); - seq_printf(m, "\tmode page offset:\t%u\n", pd->mode_offset); - - seq_printf(m, "\nQueue state:\n"); - seq_printf(m, "\tbios queued:\t\t%d\n", pd->bio_queue_size); - seq_printf(m, "\tbios pending:\t\t%d\n", atomic_read(&pd->cdrw.pending_bios)); - seq_printf(m, "\tcurrent sector:\t\t0x%llx\n", pd->current_sector); - - pkt_count_states(pd, states); - seq_printf(m, "\tstate:\t\t\ti:%d ow:%d rw:%d ww:%d rec:%d fin:%d\n", - states[0], states[1], states[2], states[3], states[4], states[5]); - - seq_printf(m, "\twrite congestion marks:\toff=%d on=%d\n", - pd->write_congestion_off, - pd->write_congestion_on); - return 0; -} -DEFINE_SHOW_ATTRIBUTE(pkt_seq); - -static void pkt_debugfs_dev_new(struct pktcdvd_device *pd) -{ - if (!pkt_debugfs_root) - return; - pd->dfs_d_root = debugfs_create_dir(pd->disk->disk_name, pkt_debugfs_root); - - pd->dfs_f_info = debugfs_create_file("info", 0444, pd->dfs_d_root, - pd, 
&pkt_seq_fops); -} - -static void pkt_debugfs_dev_remove(struct pktcdvd_device *pd) -{ - if (!pkt_debugfs_root) - return; - debugfs_remove(pd->dfs_f_info); - debugfs_remove(pd->dfs_d_root); - pd->dfs_f_info = NULL; - pd->dfs_d_root = NULL; -} - -static void pkt_debugfs_init(void) -{ - pkt_debugfs_root = debugfs_create_dir(DRIVER_NAME, NULL); -} - -static void pkt_debugfs_cleanup(void) -{ - debugfs_remove(pkt_debugfs_root); - pkt_debugfs_root = NULL; -} - -/* ----------------------------------------------------------*/ - - -static void pkt_bio_finished(struct pktcdvd_device *pd) -{ - struct device *ddev = disk_to_dev(pd->disk); - - BUG_ON(atomic_read(&pd->cdrw.pending_bios) <= 0); - if (atomic_dec_and_test(&pd->cdrw.pending_bios)) { - dev_dbg(ddev, "queue empty\n"); - atomic_set(&pd->iosched.attention, 1); - wake_up(&pd->wqueue); - } -} - -/* - * Allocate a packet_data struct - */ -static struct packet_data *pkt_alloc_packet_data(int frames) -{ - int i; - struct packet_data *pkt; - - pkt = kzalloc(sizeof(struct packet_data), GFP_KERNEL); - if (!pkt) - goto no_pkt; - - pkt->frames = frames; - pkt->w_bio = bio_kmalloc(frames, GFP_KERNEL); - if (!pkt->w_bio) - goto no_bio; - - for (i = 0; i < frames / FRAMES_PER_PAGE; i++) { - pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO); - if (!pkt->pages[i]) - goto no_page; - } - - spin_lock_init(&pkt->lock); - bio_list_init(&pkt->orig_bios); - - for (i = 0; i < frames; i++) { - pkt->r_bios[i] = bio_kmalloc(1, GFP_KERNEL); - if (!pkt->r_bios[i]) - goto no_rd_bio; - } - - return pkt; - -no_rd_bio: - for (i = 0; i < frames; i++) - kfree(pkt->r_bios[i]); -no_page: - for (i = 0; i < frames / FRAMES_PER_PAGE; i++) - if (pkt->pages[i]) - __free_page(pkt->pages[i]); - kfree(pkt->w_bio); -no_bio: - kfree(pkt); -no_pkt: - return NULL; -} - -/* - * Free a packet_data struct - */ -static void pkt_free_packet_data(struct packet_data *pkt) -{ - int i; - - for (i = 0; i < pkt->frames; i++) - kfree(pkt->r_bios[i]); - for (i = 0; i < 
pkt->frames / FRAMES_PER_PAGE; i++) - __free_page(pkt->pages[i]); - kfree(pkt->w_bio); - kfree(pkt); -} - -static void pkt_shrink_pktlist(struct pktcdvd_device *pd) -{ - struct packet_data *pkt, *next; - - BUG_ON(!list_empty(&pd->cdrw.pkt_active_list)); - - list_for_each_entry_safe(pkt, next, &pd->cdrw.pkt_free_list, list) { - pkt_free_packet_data(pkt); - } - INIT_LIST_HEAD(&pd->cdrw.pkt_free_list); -} - -static int pkt_grow_pktlist(struct pktcdvd_device *pd, int nr_packets) -{ - struct packet_data *pkt; - - BUG_ON(!list_empty(&pd->cdrw.pkt_free_list)); - - while (nr_packets > 0) { - pkt = pkt_alloc_packet_data(pd->settings.size >> 2); - if (!pkt) { - pkt_shrink_pktlist(pd); - return 0; - } - pkt->id = nr_packets; - pkt->pd = pd; - list_add(&pkt->list, &pd->cdrw.pkt_free_list); - nr_packets--; - } - return 1; -} - -static inline struct pkt_rb_node *pkt_rbtree_next(struct pkt_rb_node *node) -{ - struct rb_node *n = rb_next(&node->rb_node); - if (!n) - return NULL; - return rb_entry(n, struct pkt_rb_node, rb_node); -} - -static void pkt_rbtree_erase(struct pktcdvd_device *pd, struct pkt_rb_node *node) -{ - rb_erase(&node->rb_node, &pd->bio_queue); - mempool_free(node, &pd->rb_pool); - pd->bio_queue_size--; - BUG_ON(pd->bio_queue_size < 0); -} - -/* - * Find the first node in the pd->bio_queue rb tree with a starting sector >= s. 
- */ -static struct pkt_rb_node *pkt_rbtree_find(struct pktcdvd_device *pd, sector_t s) -{ - struct rb_node *n = pd->bio_queue.rb_node; - struct rb_node *next; - struct pkt_rb_node *tmp; - - if (!n) { - BUG_ON(pd->bio_queue_size > 0); - return NULL; - } - - for (;;) { - tmp = rb_entry(n, struct pkt_rb_node, rb_node); - if (s <= tmp->bio->bi_iter.bi_sector) - next = n->rb_left; - else - next = n->rb_right; - if (!next) - break; - n = next; - } - - if (s > tmp->bio->bi_iter.bi_sector) { - tmp = pkt_rbtree_next(tmp); - if (!tmp) - return NULL; - } - BUG_ON(s > tmp->bio->bi_iter.bi_sector); - return tmp; -} - -/* - * Insert a node into the pd->bio_queue rb tree. - */ -static void pkt_rbtree_insert(struct pktcdvd_device *pd, struct pkt_rb_node *node) -{ - struct rb_node **p = &pd->bio_queue.rb_node; - struct rb_node *parent = NULL; - sector_t s = node->bio->bi_iter.bi_sector; - struct pkt_rb_node *tmp; - - while (*p) { - parent = *p; - tmp = rb_entry(parent, struct pkt_rb_node, rb_node); - if (s < tmp->bio->bi_iter.bi_sector) - p = &(*p)->rb_left; - else - p = &(*p)->rb_right; - } - rb_link_node(&node->rb_node, parent, p); - rb_insert_color(&node->rb_node, &pd->bio_queue); - pd->bio_queue_size++; -} - -/* - * Send a packet_command to the underlying block device and - * wait for completion. - */ -static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *cgc) -{ - struct request_queue *q = bdev_get_queue(file_bdev(pd->bdev_file)); - struct scsi_cmnd *scmd; - struct request *rq; - int ret = 0; - - rq = scsi_alloc_request(q, (cgc->data_direction == CGC_DATA_WRITE) ? 
-			     REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
-	if (IS_ERR(rq))
-		return PTR_ERR(rq);
-	scmd = blk_mq_rq_to_pdu(rq);
-
-	if (cgc->buflen) {
-		ret = blk_rq_map_kern(rq, cgc->buffer, cgc->buflen,
-				      GFP_NOIO);
-		if (ret)
-			goto out;
-	}
-
-	scmd->cmd_len = COMMAND_SIZE(cgc->cmd[0]);
-	memcpy(scmd->cmnd, cgc->cmd, CDROM_PACKET_SIZE);
-
-	rq->timeout = 60*HZ;
-	if (cgc->quiet)
-		rq->rq_flags |= RQF_QUIET;
-
-	blk_execute_rq(rq, false);
-	if (scmd->result)
-		ret = -EIO;
-out:
-	blk_mq_free_request(rq);
-	return ret;
-}
-
-static const char *sense_key_string(__u8 index)
-{
-	static const char * const info[] = {
-		"No sense", "Recovered error", "Not ready",
-		"Medium error", "Hardware error", "Illegal request",
-		"Unit attention", "Data protect", "Blank check",
-	};
-
-	return index < ARRAY_SIZE(info) ? info[index] : "INVALID";
-}
-
-/*
- * A generic sense dump / resolve mechanism should be implemented across
- * all ATAPI + SCSI devices.
- */
-static void pkt_dump_sense(struct pktcdvd_device *pd,
-			   struct packet_command *cgc)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct scsi_sense_hdr *sshdr = cgc->sshdr;
-
-	if (sshdr)
-		dev_err(ddev, "%*ph - sense %02x.%02x.%02x (%s)\n",
-			CDROM_PACKET_SIZE, cgc->cmd,
-			sshdr->sense_key, sshdr->asc, sshdr->ascq,
-			sense_key_string(sshdr->sense_key));
-	else
-		dev_err(ddev, "%*ph - no sense\n", CDROM_PACKET_SIZE, cgc->cmd);
-}
-
-/*
- * flush the drive cache to media
- */
-static int pkt_flush_cache(struct pktcdvd_device *pd)
-{
-	struct packet_command cgc;
-
-	init_cdrom_command(&cgc, NULL, 0, CGC_DATA_NONE);
-	cgc.cmd[0] = GPCMD_FLUSH_CACHE;
-	cgc.quiet = 1;
-
-	/*
-	 * the IMMED bit -- we default to not setting it, although that
-	 * would allow a much faster close, this is safer
-	 */
-#if 0
-	cgc.cmd[1] = 1 << 1;
-#endif
-	return pkt_generic_packet(pd, &cgc);
-}
-
-/*
- * speed is given as the normal factor, e.g. 4 for 4x
- */
-static noinline_for_stack int pkt_set_speed(struct pktcdvd_device *pd,
-				unsigned write_speed, unsigned read_speed)
-{
-	struct packet_command cgc;
-	struct scsi_sense_hdr sshdr;
-	int ret;
-
-	init_cdrom_command(&cgc, NULL, 0, CGC_DATA_NONE);
-	cgc.sshdr = &sshdr;
-	cgc.cmd[0] = GPCMD_SET_SPEED;
-	put_unaligned_be16(read_speed, &cgc.cmd[2]);
-	put_unaligned_be16(write_speed, &cgc.cmd[4]);
-
-	ret = pkt_generic_packet(pd, &cgc);
-	if (ret)
-		pkt_dump_sense(pd, &cgc);
-
-	return ret;
-}
-
-/*
- * Queue a bio for processing by the low-level CD device. Must be called
- * from process context.
- */
-static void pkt_queue_bio(struct pktcdvd_device *pd, struct bio *bio)
-{
-	/*
-	 * Some CDRW drives can not handle writes larger than one packet,
-	 * even if the size is a multiple of the packet size.
-	 */
-	bio->bi_opf |= REQ_NOMERGE;
-
-	spin_lock(&pd->iosched.lock);
-	if (bio_data_dir(bio) == READ)
-		bio_list_add(&pd->iosched.read_queue, bio);
-	else
-		bio_list_add(&pd->iosched.write_queue, bio);
-	spin_unlock(&pd->iosched.lock);
-
-	atomic_set(&pd->iosched.attention, 1);
-	wake_up(&pd->wqueue);
-}
-
-/*
- * Process the queued read/write requests. This function handles special
- * requirements for CDRW drives:
- * - A cache flush command must be inserted before a read request if the
- *   previous request was a write.
- * - Switching between reading and writing is slow, so don't do it more often
- *   than necessary.
- * - Optimize for throughput at the expense of latency. This means that streaming
- *   writes will never be interrupted by a read, but if the drive has to seek
- *   before the next write, switch to reading instead if there are any pending
- *   read requests.
- * - Set the read speed according to current usage pattern. When only reading
- *   from the device, it's best to use the highest possible read speed, but
- *   when switching often between reading and writing, it's better to have the
- *   same read and write speeds.
- */
-static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-
-	if (atomic_read(&pd->iosched.attention) == 0)
-		return;
-	atomic_set(&pd->iosched.attention, 0);
-
-	for (;;) {
-		struct bio *bio;
-		int reads_queued, writes_queued;
-
-		spin_lock(&pd->iosched.lock);
-		reads_queued = !bio_list_empty(&pd->iosched.read_queue);
-		writes_queued = !bio_list_empty(&pd->iosched.write_queue);
-		spin_unlock(&pd->iosched.lock);
-
-		if (!reads_queued && !writes_queued)
-			break;
-
-		if (pd->iosched.writing) {
-			int need_write_seek = 1;
-			spin_lock(&pd->iosched.lock);
-			bio = bio_list_peek(&pd->iosched.write_queue);
-			spin_unlock(&pd->iosched.lock);
-			if (bio && (bio->bi_iter.bi_sector ==
-				    pd->iosched.last_write))
-				need_write_seek = 0;
-			if (need_write_seek && reads_queued) {
-				if (atomic_read(&pd->cdrw.pending_bios) > 0) {
-					dev_dbg(ddev, "write, waiting\n");
-					break;
-				}
-				pkt_flush_cache(pd);
-				pd->iosched.writing = 0;
-			}
-		} else {
-			if (!reads_queued && writes_queued) {
-				if (atomic_read(&pd->cdrw.pending_bios) > 0) {
-					dev_dbg(ddev, "read, waiting\n");
-					break;
-				}
-				pd->iosched.writing = 1;
-			}
-		}
-
-		spin_lock(&pd->iosched.lock);
-		if (pd->iosched.writing)
-			bio = bio_list_pop(&pd->iosched.write_queue);
-		else
-			bio = bio_list_pop(&pd->iosched.read_queue);
-		spin_unlock(&pd->iosched.lock);
-
-		if (!bio)
-			continue;
-
-		if (bio_data_dir(bio) == READ)
-			pd->iosched.successive_reads +=
-				bio->bi_iter.bi_size >> 10;
-		else {
-			pd->iosched.successive_reads = 0;
-			pd->iosched.last_write = bio_end_sector(bio);
-		}
-		if (pd->iosched.successive_reads >= HI_SPEED_SWITCH) {
-			if (pd->read_speed == pd->write_speed) {
-				pd->read_speed = MAX_SPEED;
-				pkt_set_speed(pd, pd->write_speed, pd->read_speed);
-			}
-		} else {
-			if (pd->read_speed != pd->write_speed) {
-				pd->read_speed = pd->write_speed;
-				pkt_set_speed(pd, pd->write_speed, pd->read_speed);
-			}
-		}
-
-		atomic_inc(&pd->cdrw.pending_bios);
-		submit_bio_noacct(bio);
-	}
-}
-
-/*
- * Special care is needed if the underlying block device has a small
- * max_phys_segments value.
- */
-static int pkt_set_segment_merging(struct pktcdvd_device *pd, struct request_queue *q)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-
-	if ((pd->settings.size << 9) / CD_FRAMESIZE <= queue_max_segments(q)) {
-		/*
-		 * The cdrom device can handle one segment/frame
-		 */
-		clear_bit(PACKET_MERGE_SEGS, &pd->flags);
-		return 0;
-	}
-
-	if ((pd->settings.size << 9) / PAGE_SIZE <= queue_max_segments(q)) {
-		/*
-		 * We can handle this case at the expense of some extra memory
-		 * copies during write operations
-		 */
-		set_bit(PACKET_MERGE_SEGS, &pd->flags);
-		return 0;
-	}
-
-	dev_err(ddev, "cdrom max_phys_segments too small\n");
-	return -EIO;
-}
-
-static void pkt_end_io_read(struct bio *bio)
-{
-	struct packet_data *pkt = bio->bi_private;
-	struct pktcdvd_device *pd = pkt->pd;
-	BUG_ON(!pd);
-
-	dev_dbg(disk_to_dev(pd->disk), "bio=%p sec0=%llx sec=%llx err=%d\n",
-		bio, pkt->sector, bio->bi_iter.bi_sector, bio->bi_status);
-
-	if (bio->bi_status)
-		atomic_inc(&pkt->io_errors);
-	bio_uninit(bio);
-	if (atomic_dec_and_test(&pkt->io_wait)) {
-		atomic_inc(&pkt->run_sm);
-		wake_up(&pd->wqueue);
-	}
-	pkt_bio_finished(pd);
-}
-
-static void pkt_end_io_packet_write(struct bio *bio)
-{
-	struct packet_data *pkt = bio->bi_private;
-	struct pktcdvd_device *pd = pkt->pd;
-	BUG_ON(!pd);
-
-	dev_dbg(disk_to_dev(pd->disk), "id=%d, err=%d\n", pkt->id, bio->bi_status);
-
-	pd->stats.pkt_ended++;
-
-	bio_uninit(bio);
-	pkt_bio_finished(pd);
-	atomic_dec(&pkt->io_wait);
-	atomic_inc(&pkt->run_sm);
-	wake_up(&pd->wqueue);
-}
-
-/*
- * Schedule reads for the holes in a packet
- */
-static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	int frames_read = 0;
-	struct bio *bio;
-	int f;
-	char written[PACKET_MAX_SIZE];
-
-	BUG_ON(bio_list_empty(&pkt->orig_bios));
-
-	atomic_set(&pkt->io_wait, 0);
-	atomic_set(&pkt->io_errors, 0);
-
-	/*
-	 * Figure out which frames we need to read before we can write.
-	 */
-	memset(written, 0, sizeof(written));
-	spin_lock(&pkt->lock);
-	bio_list_for_each(bio, &pkt->orig_bios) {
-		int first_frame = (bio->bi_iter.bi_sector - pkt->sector) /
-			(CD_FRAMESIZE >> 9);
-		int num_frames = bio->bi_iter.bi_size / CD_FRAMESIZE;
-		pd->stats.secs_w += num_frames * (CD_FRAMESIZE >> 9);
-		BUG_ON(first_frame < 0);
-		BUG_ON(first_frame + num_frames > pkt->frames);
-		for (f = first_frame; f < first_frame + num_frames; f++)
-			written[f] = 1;
-	}
-	spin_unlock(&pkt->lock);
-
-	if (pkt->cache_valid) {
-		dev_dbg(ddev, "zone %llx cached\n", pkt->sector);
-		goto out_account;
-	}
-
-	/*
-	 * Schedule reads for missing parts of the packet.
-	 */
-	for (f = 0; f < pkt->frames; f++) {
-		int p, offset;
-
-		if (written[f])
-			continue;
-
-		bio = pkt->r_bios[f];
-		bio_init(bio, file_bdev(pd->bdev_file), bio->bi_inline_vecs, 1,
-			 REQ_OP_READ);
-		bio->bi_iter.bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
-		bio->bi_end_io = pkt_end_io_read;
-		bio->bi_private = pkt;
-
-		p = (f * CD_FRAMESIZE) / PAGE_SIZE;
-		offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
-		dev_dbg(ddev, "Adding frame %d, page:%p offs:%d\n", f,
-			pkt->pages[p], offset);
-		if (!bio_add_page(bio, pkt->pages[p], CD_FRAMESIZE, offset))
-			BUG();
-
-		atomic_inc(&pkt->io_wait);
-		pkt_queue_bio(pd, bio);
-		frames_read++;
-	}
-
-out_account:
-	dev_dbg(ddev, "need %d frames for zone %llx\n", frames_read, pkt->sector);
-	pd->stats.pkt_started++;
-	pd->stats.secs_rg += frames_read * (CD_FRAMESIZE >> 9);
-}
-
-/*
- * Find a packet matching zone, or the least recently used packet if
- * there is no match.
- */
-static struct packet_data *pkt_get_packet_data(struct pktcdvd_device *pd, int zone)
-{
-	struct packet_data *pkt;
-
-	list_for_each_entry(pkt, &pd->cdrw.pkt_free_list, list) {
-		if (pkt->sector == zone || pkt->list.next == &pd->cdrw.pkt_free_list) {
-			list_del_init(&pkt->list);
-			if (pkt->sector != zone)
-				pkt->cache_valid = 0;
-			return pkt;
-		}
-	}
-	BUG();
-	return NULL;
-}
-
-static void pkt_put_packet_data(struct pktcdvd_device *pd, struct packet_data *pkt)
-{
-	if (pkt->cache_valid) {
-		list_add(&pkt->list, &pd->cdrw.pkt_free_list);
-	} else {
-		list_add_tail(&pkt->list, &pd->cdrw.pkt_free_list);
-	}
-}
-
-static inline void pkt_set_state(struct device *ddev, struct packet_data *pkt,
-				 enum packet_data_state state)
-{
-	static const char *state_name[] = {
-		"IDLE", "WAITING", "READ_WAIT", "WRITE_WAIT", "RECOVERY", "FINISHED"
-	};
-	enum packet_data_state old_state = pkt->state;
-
-	dev_dbg(ddev, "pkt %2d : s=%6llx %s -> %s\n",
-		pkt->id, pkt->sector, state_name[old_state], state_name[state]);
-
-	pkt->state = state;
-}
-
-/*
- * Scan the work queue to see if we can start a new packet.
- * returns non-zero if any work was done.
- */
-static int pkt_handle_queue(struct pktcdvd_device *pd)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct packet_data *pkt, *p;
-	struct bio *bio = NULL;
-	sector_t zone = 0; /* Suppress gcc warning */
-	struct pkt_rb_node *node, *first_node;
-	struct rb_node *n;
-
-	atomic_set(&pd->scan_queue, 0);
-
-	if (list_empty(&pd->cdrw.pkt_free_list)) {
-		dev_dbg(ddev, "no pkt\n");
-		return 0;
-	}
-
-	/*
-	 * Try to find a zone we are not already working on.
-	 */
-	spin_lock(&pd->lock);
-	first_node = pkt_rbtree_find(pd, pd->current_sector);
-	if (!first_node) {
-		n = rb_first(&pd->bio_queue);
-		if (n)
-			first_node = rb_entry(n, struct pkt_rb_node, rb_node);
-	}
-	node = first_node;
-	while (node) {
-		bio = node->bio;
-		zone = get_zone(bio->bi_iter.bi_sector, pd);
-		list_for_each_entry(p, &pd->cdrw.pkt_active_list, list) {
-			if (p->sector == zone) {
-				bio = NULL;
-				goto try_next_bio;
-			}
-		}
-		break;
-try_next_bio:
-		node = pkt_rbtree_next(node);
-		if (!node) {
-			n = rb_first(&pd->bio_queue);
-			if (n)
-				node = rb_entry(n, struct pkt_rb_node, rb_node);
-		}
-		if (node == first_node)
-			node = NULL;
-	}
-	spin_unlock(&pd->lock);
-	if (!bio) {
-		dev_dbg(ddev, "no bio\n");
-		return 0;
-	}
-
-	pkt = pkt_get_packet_data(pd, zone);
-
-	pd->current_sector = zone + pd->settings.size;
-	pkt->sector = zone;
-	BUG_ON(pkt->frames != pd->settings.size >> 2);
-	pkt->write_size = 0;
-
-	/*
-	 * Scan work queue for bios in the same zone and link them
-	 * to this packet.
-	 */
-	spin_lock(&pd->lock);
-	dev_dbg(ddev, "looking for zone %llx\n", zone);
-	while ((node = pkt_rbtree_find(pd, zone)) != NULL) {
-		sector_t tmp = get_zone(node->bio->bi_iter.bi_sector, pd);
-
-		bio = node->bio;
-		dev_dbg(ddev, "found zone=%llx\n", tmp);
-		if (tmp != zone)
-			break;
-		pkt_rbtree_erase(pd, node);
-		spin_lock(&pkt->lock);
-		bio_list_add(&pkt->orig_bios, bio);
-		pkt->write_size += bio->bi_iter.bi_size / CD_FRAMESIZE;
-		spin_unlock(&pkt->lock);
-	}
-	/* check write congestion marks, and if bio_queue_size is
-	 * below, wake up any waiters
-	 */
-	if (pd->congested &&
-	    pd->bio_queue_size <= pd->write_congestion_off) {
-		pd->congested = false;
-		wake_up_var(&pd->congested);
-	}
-	spin_unlock(&pd->lock);
-
-	pkt->sleep_time = max(PACKET_WAIT_TIME, 1);
-	pkt_set_state(ddev, pkt, PACKET_WAITING_STATE);
-	atomic_set(&pkt->run_sm, 1);
-
-	spin_lock(&pd->cdrw.active_list_lock);
-	list_add(&pkt->list, &pd->cdrw.pkt_active_list);
-	spin_unlock(&pd->cdrw.active_list_lock);
-
-	return 1;
-}
-
-/**
- * bio_list_copy_data - copy contents of data buffers from one chain of bios to
- *			another
- * @src: source bio list
- * @dst: destination bio list
- *
- * Stops when it reaches the end of either the @src list or @dst list - that is,
- * copies min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of
- * bios).
- */
-static void bio_list_copy_data(struct bio *dst, struct bio *src)
-{
-	struct bvec_iter src_iter = src->bi_iter;
-	struct bvec_iter dst_iter = dst->bi_iter;
-
-	while (1) {
-		if (!src_iter.bi_size) {
-			src = src->bi_next;
-			if (!src)
-				break;
-
-			src_iter = src->bi_iter;
-		}
-
-		if (!dst_iter.bi_size) {
-			dst = dst->bi_next;
-			if (!dst)
-				break;
-
-			dst_iter = dst->bi_iter;
-		}
-
-		bio_copy_data_iter(dst, &dst_iter, src, &src_iter);
-	}
-}
-
-/*
- * Assemble a bio to write one packet and queue the bio for processing
- * by the underlying block device.
- */
-static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	int f;
-
-	bio_init(pkt->w_bio, file_bdev(pd->bdev_file), pkt->w_bio->bi_inline_vecs,
-		 pkt->frames, REQ_OP_WRITE);
-	pkt->w_bio->bi_iter.bi_sector = pkt->sector;
-	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
-	pkt->w_bio->bi_private = pkt;
-
-	/* XXX: locking? */
-	for (f = 0; f < pkt->frames; f++) {
-		struct page *page = pkt->pages[(f * CD_FRAMESIZE) / PAGE_SIZE];
-		unsigned offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
-
-		if (!bio_add_page(pkt->w_bio, page, CD_FRAMESIZE, offset))
-			BUG();
-	}
-	dev_dbg(ddev, "vcnt=%d\n", pkt->w_bio->bi_vcnt);
-
-	/*
-	 * Fill-in bvec with data from orig_bios.
-	 */
-	spin_lock(&pkt->lock);
-	bio_list_copy_data(pkt->w_bio, pkt->orig_bios.head);
-
-	pkt_set_state(ddev, pkt, PACKET_WRITE_WAIT_STATE);
-	spin_unlock(&pkt->lock);
-
-	dev_dbg(ddev, "Writing %d frames for zone %llx\n", pkt->write_size, pkt->sector);
-
-	if (test_bit(PACKET_MERGE_SEGS, &pd->flags) || (pkt->write_size < pkt->frames))
-		pkt->cache_valid = 1;
-	else
-		pkt->cache_valid = 0;
-
-	/* Start the write request */
-	atomic_set(&pkt->io_wait, 1);
-	pkt_queue_bio(pd, pkt->w_bio);
-}
-
-static void pkt_finish_packet(struct packet_data *pkt, blk_status_t status)
-{
-	struct bio *bio;
-
-	if (status)
-		pkt->cache_valid = 0;
-
-	/* Finish all bios corresponding to this packet */
-	while ((bio = bio_list_pop(&pkt->orig_bios))) {
-		bio->bi_status = status;
-		bio_endio(bio);
-	}
-}
-
-static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data *pkt)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-
-	dev_dbg(ddev, "pkt %d\n", pkt->id);
-
-	for (;;) {
-		switch (pkt->state) {
-		case PACKET_WAITING_STATE:
-			if ((pkt->write_size < pkt->frames) && (pkt->sleep_time > 0))
-				return;
-
-			pkt->sleep_time = 0;
-			pkt_gather_data(pd, pkt);
-			pkt_set_state(ddev, pkt, PACKET_READ_WAIT_STATE);
-			break;
-
-		case PACKET_READ_WAIT_STATE:
-			if (atomic_read(&pkt->io_wait) > 0)
-				return;
-
-			if (atomic_read(&pkt->io_errors) > 0) {
-				pkt_set_state(ddev, pkt, PACKET_RECOVERY_STATE);
-			} else {
-				pkt_start_write(pd, pkt);
-			}
-			break;
-
-		case PACKET_WRITE_WAIT_STATE:
-			if (atomic_read(&pkt->io_wait) > 0)
-				return;
-
-			if (!pkt->w_bio->bi_status) {
-				pkt_set_state(ddev, pkt, PACKET_FINISHED_STATE);
-			} else {
-				pkt_set_state(ddev, pkt, PACKET_RECOVERY_STATE);
-			}
-			break;
-
-		case PACKET_RECOVERY_STATE:
-			dev_dbg(ddev, "No recovery possible\n");
-			pkt_set_state(ddev, pkt, PACKET_FINISHED_STATE);
-			break;
-
-		case PACKET_FINISHED_STATE:
-			pkt_finish_packet(pkt, pkt->w_bio->bi_status);
-			return;
-
-		default:
-			BUG();
-			break;
-		}
-	}
-}
-
-static void pkt_handle_packets(struct pktcdvd_device *pd)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct packet_data *pkt, *next;
-
-	/*
-	 * Run state machine for active packets
-	 */
-	list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) {
-		if (atomic_read(&pkt->run_sm) > 0) {
-			atomic_set(&pkt->run_sm, 0);
-			pkt_run_state_machine(pd, pkt);
-		}
-	}
-
-	/*
-	 * Move no longer active packets to the free list
-	 */
-	spin_lock(&pd->cdrw.active_list_lock);
-	list_for_each_entry_safe(pkt, next, &pd->cdrw.pkt_active_list, list) {
-		if (pkt->state == PACKET_FINISHED_STATE) {
-			list_del(&pkt->list);
-			pkt_put_packet_data(pd, pkt);
-			pkt_set_state(ddev, pkt, PACKET_IDLE_STATE);
-			atomic_set(&pd->scan_queue, 1);
-		}
-	}
-	spin_unlock(&pd->cdrw.active_list_lock);
-}
-
-/*
- * kcdrwd is woken up when writes have been queued for one of our
- * registered devices
- */
-static int kcdrwd(void *foobar)
-{
-	struct pktcdvd_device *pd = foobar;
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct packet_data *pkt;
-	int states[PACKET_NUM_STATES];
-	long min_sleep_time, residue;
-
-	set_user_nice(current, MIN_NICE);
-	set_freezable();
-
-	for (;;) {
-		DECLARE_WAITQUEUE(wait, current);
-
-		/*
-		 * Wait until there is something to do
-		 */
-		add_wait_queue(&pd->wqueue, &wait);
-		for (;;) {
-			set_current_state(TASK_INTERRUPTIBLE);
-
-			/* Check if we need to run pkt_handle_queue */
-			if (atomic_read(&pd->scan_queue) > 0)
-				goto work_to_do;
-
-			/* Check if we need to run the state machine for some packet */
-			list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) {
-				if (atomic_read(&pkt->run_sm) > 0)
-					goto work_to_do;
-			}
-
-			/* Check if we need to process the iosched queues */
-			if (atomic_read(&pd->iosched.attention) != 0)
-				goto work_to_do;
-
-			/* Otherwise, go to sleep */
-			pkt_count_states(pd, states);
-			dev_dbg(ddev, "i:%d ow:%d rw:%d ww:%d rec:%d fin:%d\n",
-				states[0], states[1], states[2], states[3], states[4], states[5]);
-
-			min_sleep_time = MAX_SCHEDULE_TIMEOUT;
-			list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) {
-				if (pkt->sleep_time && pkt->sleep_time < min_sleep_time)
-					min_sleep_time = pkt->sleep_time;
-			}
-
-			dev_dbg(ddev, "sleeping\n");
-			residue = schedule_timeout(min_sleep_time);
-			dev_dbg(ddev, "wake up\n");
-
-			/* make swsusp happy with our thread */
-			try_to_freeze();
-
-			list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) {
-				if (!pkt->sleep_time)
-					continue;
-				pkt->sleep_time -= min_sleep_time - residue;
-				if (pkt->sleep_time <= 0) {
-					pkt->sleep_time = 0;
-					atomic_inc(&pkt->run_sm);
-				}
-			}
-
-			if (kthread_should_stop())
-				break;
-		}
-work_to_do:
-		set_current_state(TASK_RUNNING);
-		remove_wait_queue(&pd->wqueue, &wait);
-
-		if (kthread_should_stop())
-			break;
-
-		/*
-		 * if pkt_handle_queue returns true, we can queue
-		 * another request.
-		 */
-		while (pkt_handle_queue(pd))
-			;
-
-		/*
-		 * Handle packet state machine
-		 */
-		pkt_handle_packets(pd);
-
-		/*
-		 * Handle iosched queues
-		 */
-		pkt_iosched_process_queue(pd);
-	}
-
-	return 0;
-}
-
-static void pkt_print_settings(struct pktcdvd_device *pd)
-{
-	dev_info(disk_to_dev(pd->disk), "%s packets, %u blocks, Mode-%c disc\n",
-		 pd->settings.fp ? "Fixed" : "Variable",
-		 pd->settings.size >> 2,
-		 pd->settings.block_mode == 8 ? '1' : '2');
-}
-
-static int pkt_mode_sense(struct pktcdvd_device *pd, struct packet_command *cgc, int page_code, int page_control)
-{
-	memset(cgc->cmd, 0, sizeof(cgc->cmd));
-
-	cgc->cmd[0] = GPCMD_MODE_SENSE_10;
-	cgc->cmd[2] = page_code | (page_control << 6);
-	put_unaligned_be16(cgc->buflen, &cgc->cmd[7]);
-	cgc->data_direction = CGC_DATA_READ;
-	return pkt_generic_packet(pd, cgc);
-}
-
-static int pkt_mode_select(struct pktcdvd_device *pd, struct packet_command *cgc)
-{
-	memset(cgc->cmd, 0, sizeof(cgc->cmd));
-	memset(cgc->buffer, 0, 2);
-	cgc->cmd[0] = GPCMD_MODE_SELECT_10;
-	cgc->cmd[1] = 0x10; /* PF */
-	put_unaligned_be16(cgc->buflen, &cgc->cmd[7]);
-	cgc->data_direction = CGC_DATA_WRITE;
-	return pkt_generic_packet(pd, cgc);
-}
-
-static int pkt_get_disc_info(struct pktcdvd_device *pd, disc_information *di)
-{
-	struct packet_command cgc;
-	int ret;
-
-	/* set up command and get the disc info */
-	init_cdrom_command(&cgc, di, sizeof(*di), CGC_DATA_READ);
-	cgc.cmd[0] = GPCMD_READ_DISC_INFO;
-	cgc.cmd[8] = cgc.buflen = 2;
-	cgc.quiet = 1;
-
-	ret = pkt_generic_packet(pd, &cgc);
-	if (ret)
-		return ret;
-
-	/* not all drives have the same disc_info length, so requeue
-	 * packet with the length the drive tells us it can supply
-	 */
-	cgc.buflen = be16_to_cpu(di->disc_information_length) +
-		     sizeof(di->disc_information_length);
-
-	if (cgc.buflen > sizeof(disc_information))
-		cgc.buflen = sizeof(disc_information);
-
-	cgc.cmd[8] = cgc.buflen;
-	return pkt_generic_packet(pd, &cgc);
-}
-
-static int pkt_get_track_info(struct pktcdvd_device *pd, __u16 track, __u8 type, track_information *ti)
-{
-	struct packet_command cgc;
-	int ret;
-
-	init_cdrom_command(&cgc, ti, 8, CGC_DATA_READ);
-	cgc.cmd[0] = GPCMD_READ_TRACK_RZONE_INFO;
-	cgc.cmd[1] = type & 3;
-	put_unaligned_be16(track, &cgc.cmd[4]);
-	cgc.cmd[8] = 8;
-	cgc.quiet = 1;
-
-	ret = pkt_generic_packet(pd, &cgc);
-	if (ret)
-		return ret;
-
-	cgc.buflen = be16_to_cpu(ti->track_information_length) +
-		     sizeof(ti->track_information_length);
-
-	if (cgc.buflen > sizeof(track_information))
-		cgc.buflen = sizeof(track_information);
-
-	cgc.cmd[8] = cgc.buflen;
-	return pkt_generic_packet(pd, &cgc);
-}
-
-static noinline_for_stack int pkt_get_last_written(struct pktcdvd_device *pd,
-						   long *last_written)
-{
-	disc_information di;
-	track_information ti;
-	__u32 last_track;
-	int ret;
-
-	ret = pkt_get_disc_info(pd, &di);
-	if (ret)
-		return ret;
-
-	last_track = (di.last_track_msb << 8) | di.last_track_lsb;
-	ret = pkt_get_track_info(pd, last_track, 1, &ti);
-	if (ret)
-		return ret;
-
-	/* if this track is blank, try the previous. */
-	if (ti.blank) {
-		last_track--;
-		ret = pkt_get_track_info(pd, last_track, 1, &ti);
-		if (ret)
-			return ret;
-	}
-
-	/* if last recorded field is valid, return it. */
-	if (ti.lra_v) {
-		*last_written = be32_to_cpu(ti.last_rec_address);
-	} else {
-		/* make it up instead */
-		*last_written = be32_to_cpu(ti.track_start) +
-				be32_to_cpu(ti.track_size);
-		if (ti.free_blocks)
-			*last_written -= (be32_to_cpu(ti.free_blocks) + 7);
-	}
-	return 0;
-}
-
-/*
- * write mode select package based on pd->settings
- */
-static noinline_for_stack int pkt_set_write_settings(struct pktcdvd_device *pd)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct packet_command cgc;
-	struct scsi_sense_hdr sshdr;
-	write_param_page *wp;
-	char buffer[128];
-	int ret, size;
-
-	/* doesn't apply to DVD+RW or DVD-RAM */
-	if ((pd->mmc3_profile == 0x1a) || (pd->mmc3_profile == 0x12))
-		return 0;
-
-	memset(buffer, 0, sizeof(buffer));
-	init_cdrom_command(&cgc, buffer, sizeof(*wp), CGC_DATA_READ);
-	cgc.sshdr = &sshdr;
-	ret = pkt_mode_sense(pd, &cgc, GPMODE_WRITE_PARMS_PAGE, 0);
-	if (ret) {
-		pkt_dump_sense(pd, &cgc);
-		return ret;
-	}
-
-	size = 2 + get_unaligned_be16(&buffer[0]);
-	pd->mode_offset = get_unaligned_be16(&buffer[6]);
-	if (size > sizeof(buffer))
-		size = sizeof(buffer);
-
-	/*
-	 * now get it all
-	 */
-	init_cdrom_command(&cgc, buffer, size, CGC_DATA_READ);
-	cgc.sshdr = &sshdr;
-	ret = pkt_mode_sense(pd, &cgc, GPMODE_WRITE_PARMS_PAGE, 0);
-	if (ret) {
-		pkt_dump_sense(pd, &cgc);
-		return ret;
-	}
-
-	/*
-	 * write page is offset header + block descriptor length
-	 */
-	wp = (write_param_page *) &buffer[sizeof(struct mode_page_header) + pd->mode_offset];
-
-	wp->fp = pd->settings.fp;
-	wp->track_mode = pd->settings.track_mode;
-	wp->write_type = pd->settings.write_type;
-	wp->data_block_type = pd->settings.block_mode;
-
-	wp->multi_session = 0;
-
-#ifdef PACKET_USE_LS
-	wp->link_size = 7;
-	wp->ls_v = 1;
-#endif
-
-	if (wp->data_block_type == PACKET_BLOCK_MODE1) {
-		wp->session_format = 0;
-		wp->subhdr2 = 0x20;
-	} else if (wp->data_block_type == PACKET_BLOCK_MODE2) {
-		wp->session_format = 0x20;
-		wp->subhdr2 = 8;
-#if 0
-		wp->mcn[0] = 0x80;
-		memcpy(&wp->mcn[1], PACKET_MCN, sizeof(wp->mcn) - 1);
-#endif
-	} else {
-		/*
-		 * paranoia
-		 */
-		dev_err(ddev, "write mode wrong %d\n", wp->data_block_type);
-		return 1;
-	}
-	wp->packet_size = cpu_to_be32(pd->settings.size >> 2);
-
-	cgc.buflen = cgc.cmd[8] = size;
-	ret = pkt_mode_select(pd, &cgc);
-	if (ret) {
-		pkt_dump_sense(pd, &cgc);
-		return ret;
-	}
-
-	pkt_print_settings(pd);
-	return 0;
-}
-
-/*
- * 1 -- we can write to this track, 0 -- we can't
- */
-static int pkt_writable_track(struct pktcdvd_device *pd, track_information *ti)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-
-	switch (pd->mmc3_profile) {
-	case 0x1a: /* DVD+RW */
-	case 0x12: /* DVD-RAM */
-		/* The track is always writable on DVD+RW/DVD-RAM */
-		return 1;
-	default:
-		break;
-	}
-
-	if (!ti->packet || !ti->fp)
-		return 0;
-
-	/*
-	 * "good" settings as per Mt Fuji.
-	 */
-	if (ti->rt == 0 && ti->blank == 0)
-		return 1;
-
-	if (ti->rt == 0 && ti->blank == 1)
-		return 1;
-
-	if (ti->rt == 1 && ti->blank == 0)
-		return 1;
-
-	dev_err(ddev, "bad state %d-%d-%d\n", ti->rt, ti->blank, ti->packet);
-	return 0;
-}
-
-/*
- * 1 -- we can write to this disc, 0 -- we can't
- */
-static int pkt_writable_disc(struct pktcdvd_device *pd, disc_information *di)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-
-	switch (pd->mmc3_profile) {
-	case 0x0a: /* CD-RW */
-	case 0xffff: /* MMC3 not supported */
-		break;
-	case 0x1a: /* DVD+RW */
-	case 0x13: /* DVD-RW */
-	case 0x12: /* DVD-RAM */
-		return 1;
-	default:
-		dev_dbg(ddev, "Wrong disc profile (%x)\n", pd->mmc3_profile);
-		return 0;
-	}
-
-	/*
-	 * for disc type 0xff we should probably reserve a new track.
-	 * but i'm not sure, should we leave this to user apps? probably.
-	 */
-	if (di->disc_type == 0xff) {
-		dev_notice(ddev, "unknown disc - no track?\n");
-		return 0;
-	}
-
-	if (di->disc_type != 0x20 && di->disc_type != 0) {
-		dev_err(ddev, "wrong disc type (%x)\n", di->disc_type);
-		return 0;
-	}
-
-	if (di->erasable == 0) {
-		dev_err(ddev, "disc not erasable\n");
-		return 0;
-	}
-
-	if (di->border_status == PACKET_SESSION_RESERVED) {
-		dev_err(ddev, "can't write to last track (reserved)\n");
-		return 0;
-	}
-
-	return 1;
-}
-
-static noinline_for_stack int pkt_probe_settings(struct pktcdvd_device *pd)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct packet_command cgc;
-	unsigned char buf[12];
-	disc_information di;
-	track_information ti;
-	int ret, track;
-
-	init_cdrom_command(&cgc, buf, sizeof(buf), CGC_DATA_READ);
-	cgc.cmd[0] = GPCMD_GET_CONFIGURATION;
-	cgc.cmd[8] = 8;
-	ret = pkt_generic_packet(pd, &cgc);
-	pd->mmc3_profile = ret ? 0xffff : get_unaligned_be16(&buf[6]);
-
-	memset(&di, 0, sizeof(disc_information));
-	memset(&ti, 0, sizeof(track_information));
-
-	ret = pkt_get_disc_info(pd, &di);
-	if (ret) {
-		dev_err(ddev, "failed get_disc\n");
-		return ret;
-	}
-
-	if (!pkt_writable_disc(pd, &di))
-		return -EROFS;
-
-	pd->type = di.erasable ? PACKET_CDRW : PACKET_CDR;
-
-	track = 1; /* (di.last_track_msb << 8) | di.last_track_lsb; */
-	ret = pkt_get_track_info(pd, track, 1, &ti);
-	if (ret) {
-		dev_err(ddev, "failed get_track\n");
-		return ret;
-	}
-
-	if (!pkt_writable_track(pd, &ti)) {
-		dev_err(ddev, "can't write to this track\n");
-		return -EROFS;
-	}
-
-	/*
-	 * we keep packet size in 512 byte units, makes it easier to
-	 * deal with request calculations.
-	 */
-	pd->settings.size = be32_to_cpu(ti.fixed_packet_size) << 2;
-	if (pd->settings.size == 0) {
-		dev_notice(ddev, "detected zero packet size!\n");
-		return -ENXIO;
-	}
-	if (pd->settings.size > PACKET_MAX_SECTORS) {
-		dev_err(ddev, "packet size is too big\n");
-		return -EROFS;
-	}
-	pd->settings.fp = ti.fp;
-	pd->offset = (be32_to_cpu(ti.track_start) << 2) & (pd->settings.size - 1);
-
-	if (ti.nwa_v) {
-		pd->nwa = be32_to_cpu(ti.next_writable);
-		set_bit(PACKET_NWA_VALID, &pd->flags);
-	}
-
-	/*
-	 * in theory we could use lra on -RW media as well and just zero
-	 * blocks that haven't been written yet, but in practice that
-	 * is just a no-go. we'll use that for -R, naturally.
-	 */
-	if (ti.lra_v) {
-		pd->lra = be32_to_cpu(ti.last_rec_address);
-		set_bit(PACKET_LRA_VALID, &pd->flags);
-	} else {
-		pd->lra = 0xffffffff;
-		set_bit(PACKET_LRA_VALID, &pd->flags);
-	}
-
-	/*
-	 * fine for now
-	 */
-	pd->settings.link_loss = 7;
-	pd->settings.write_type = 0; /* packet */
-	pd->settings.track_mode = ti.track_mode;
-
-	/*
-	 * mode1 or mode2 disc
-	 */
-	switch (ti.data_mode) {
-	case PACKET_MODE1:
-		pd->settings.block_mode = PACKET_BLOCK_MODE1;
-		break;
-	case PACKET_MODE2:
-		pd->settings.block_mode = PACKET_BLOCK_MODE2;
-		break;
-	default:
-		dev_err(ddev, "unknown data mode\n");
-		return -EROFS;
-	}
-	return 0;
-}
-
-/*
- * enable/disable write caching on drive
- */
-static noinline_for_stack int pkt_write_caching(struct pktcdvd_device *pd)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct packet_command cgc;
-	struct scsi_sense_hdr sshdr;
-	unsigned char buf[64];
-	bool set = IS_ENABLED(CONFIG_CDROM_PKTCDVD_WCACHE);
-	int ret;
-
-	init_cdrom_command(&cgc, buf, sizeof(buf), CGC_DATA_READ);
-	cgc.sshdr = &sshdr;
-	cgc.buflen = pd->mode_offset + 12;
-
-	/*
-	 * caching mode page might not be there, so quiet this command
-	 */
-	cgc.quiet = 1;
-
-	ret = pkt_mode_sense(pd, &cgc, GPMODE_WCACHING_PAGE, 0);
-	if (ret)
-		return ret;
-
-	/*
-	 * use drive write caching -- we need deferred error handling to be
-	 * able to successfully recover with this option (drive will return good
-	 * status as soon as the cdb is validated).
-	 */
-	buf[pd->mode_offset + 10] |= (set << 2);
-
-	cgc.buflen = cgc.cmd[8] = 2 + get_unaligned_be16(&buf[0]);
-	ret = pkt_mode_select(pd, &cgc);
-	if (ret) {
-		dev_err(ddev, "write caching control failed\n");
-		pkt_dump_sense(pd, &cgc);
-	} else if (!ret && set)
-		dev_notice(ddev, "enabled write caching\n");
-	return ret;
-}
-
-static int pkt_lock_door(struct pktcdvd_device *pd, int lockflag)
-{
-	struct packet_command cgc;
-
-	init_cdrom_command(&cgc, NULL, 0, CGC_DATA_NONE);
-	cgc.cmd[0] = GPCMD_PREVENT_ALLOW_MEDIUM_REMOVAL;
-	cgc.cmd[4] = lockflag ? 1 : 0;
-	return pkt_generic_packet(pd, &cgc);
-}
-
-/*
- * Returns drive maximum write speed
- */
-static noinline_for_stack int pkt_get_max_speed(struct pktcdvd_device *pd,
-						unsigned *write_speed)
-{
-	struct packet_command cgc;
-	struct scsi_sense_hdr sshdr;
-	unsigned char buf[256+18];
-	unsigned char *cap_buf;
-	int ret, offset;
-
-	cap_buf = &buf[sizeof(struct mode_page_header) + pd->mode_offset];
-	init_cdrom_command(&cgc, buf, sizeof(buf), CGC_DATA_UNKNOWN);
-	cgc.sshdr = &sshdr;
-
-	ret = pkt_mode_sense(pd, &cgc, GPMODE_CAPABILITIES_PAGE, 0);
-	if (ret) {
-		cgc.buflen = pd->mode_offset + cap_buf[1] + 2 +
-			     sizeof(struct mode_page_header);
-		ret = pkt_mode_sense(pd, &cgc, GPMODE_CAPABILITIES_PAGE, 0);
-		if (ret) {
-			pkt_dump_sense(pd, &cgc);
-			return ret;
-		}
-	}
-
-	offset = 20; /* Obsoleted field, used by older drives */
-	if (cap_buf[1] >= 28)
-		offset = 28; /* Current write speed selected */
-	if (cap_buf[1] >= 30) {
-		/* If the drive reports at least one "Logical Unit Write
-		 * Speed Performance Descriptor Block", use the information
-		 * in the first block. (contains the highest speed)
-		 */
-		int num_spdb = get_unaligned_be16(&cap_buf[30]);
-		if (num_spdb > 0)
-			offset = 34;
-	}
-
-	*write_speed = get_unaligned_be16(&cap_buf[offset]);
-	return 0;
-}
-
-/* These tables from cdrecord - I don't have orange book */
-/* standard speed CD-RW (1-4x) */
-static char clv_to_speed[16] = {
-	/* 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 */
-	0, 2, 4, 6, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
-};
-/* high speed CD-RW (-10x) */
-static char hs_clv_to_speed[16] = {
-	/* 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 */
-	0, 2, 4, 6, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
-};
-/* ultra high speed CD-RW */
-static char us_clv_to_speed[16] = {
-	/* 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 */
-	0, 2, 4, 8, 0, 0,16, 0,24,32,40,48, 0, 0, 0, 0
-};
-
-/*
- * reads the maximum media speed from ATIP
- */
-static noinline_for_stack int pkt_media_speed(struct pktcdvd_device *pd,
-						unsigned *speed)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct packet_command cgc;
-	struct scsi_sense_hdr sshdr;
-	unsigned char buf[64];
-	unsigned int size, st, sp;
-	int ret;
-
-	init_cdrom_command(&cgc, buf, 2, CGC_DATA_READ);
-	cgc.sshdr = &sshdr;
-	cgc.cmd[0] = GPCMD_READ_TOC_PMA_ATIP;
-	cgc.cmd[1] = 2;
-	cgc.cmd[2] = 4; /* READ ATIP */
-	cgc.cmd[8] = 2;
-	ret = pkt_generic_packet(pd, &cgc);
-	if (ret) {
-		pkt_dump_sense(pd, &cgc);
-		return ret;
-	}
-	size = 2 + get_unaligned_be16(&buf[0]);
-	if (size > sizeof(buf))
-		size = sizeof(buf);
-
-	init_cdrom_command(&cgc, buf, size, CGC_DATA_READ);
-	cgc.sshdr = &sshdr;
-	cgc.cmd[0] = GPCMD_READ_TOC_PMA_ATIP;
-	cgc.cmd[1] = 2;
-	cgc.cmd[2] = 4;
-	cgc.cmd[8] = size;
-	ret = pkt_generic_packet(pd, &cgc);
-	if (ret) {
-		pkt_dump_sense(pd, &cgc);
-		return ret;
-	}
-
-	if (!(buf[6] & 0x40)) {
-		dev_notice(ddev, "disc type is not CD-RW\n");
-		return 1;
-	}
-	if (!(buf[6] & 0x4)) {
-		dev_notice(ddev, "A1 values on media are not valid, maybe not CDRW?\n");
-		return 1;
-	}
-
-	st = (buf[6] >> 3) & 0x7; /* disc sub-type */
-
-	sp = buf[16] & 0xf; /* max speed from ATIP A1 field */
-
-	/* Info from cdrecord */
-	switch (st) {
-	case 0: /* standard speed */
-		*speed = clv_to_speed[sp];
-		break;
-	case 1: /* high speed */
-		*speed = hs_clv_to_speed[sp];
-		break;
-	case 2: /* ultra high speed */
-		*speed = us_clv_to_speed[sp];
-		break;
-	default:
-		dev_notice(ddev, "unknown disc sub-type %d\n", st);
-		return 1;
-	}
-	if (*speed) {
-		dev_info(ddev, "maximum media speed: %d\n", *speed);
-		return 0;
-	} else {
-		dev_notice(ddev, "unknown speed %d for sub-type %d\n", sp, st);
-		return 1;
-	}
-}
-
-static noinline_for_stack int pkt_perform_opc(struct pktcdvd_device *pd)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	struct packet_command cgc;
-	struct scsi_sense_hdr sshdr;
-	int ret;
-
-	dev_dbg(ddev, "Performing OPC\n");
-
-	init_cdrom_command(&cgc, NULL, 0, CGC_DATA_NONE);
-	cgc.sshdr = &sshdr;
-	cgc.timeout = 60*HZ;
-	cgc.cmd[0] = GPCMD_SEND_OPC;
-	cgc.cmd[1] = 1;
-	ret = pkt_generic_packet(pd, &cgc);
-	if (ret)
-		pkt_dump_sense(pd, &cgc);
-	return ret;
-}
-
-static int pkt_open_write(struct pktcdvd_device *pd)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	int ret;
-	unsigned int write_speed, media_write_speed, read_speed;
-
-	ret = pkt_probe_settings(pd);
-	if (ret) {
-		dev_dbg(ddev, "failed probe\n");
-		return ret;
-	}
-
-	ret = pkt_set_write_settings(pd);
-	if (ret) {
-		dev_notice(ddev, "failed saving write settings\n");
-		return -EIO;
-	}
-
-	pkt_write_caching(pd);
-
-	ret = pkt_get_max_speed(pd, &write_speed);
-	if (ret)
-		write_speed = 16 * 177;
-	switch (pd->mmc3_profile) {
-	case 0x13: /* DVD-RW */
-	case 0x1a: /* DVD+RW */
-	case 0x12: /* DVD-RAM */
-		dev_notice(ddev, "write speed %ukB/s\n", write_speed);
-		break;
-	default:
-		ret = pkt_media_speed(pd, &media_write_speed);
-		if (ret)
-			media_write_speed = 16;
-		write_speed = min(write_speed, media_write_speed * 177);
-		dev_notice(ddev, "write speed %ux\n", write_speed / 176);
-		break;
-	}
-	read_speed = write_speed;
-
-	ret = pkt_set_speed(pd, write_speed, read_speed);
-	if (ret) {
-		dev_notice(ddev, "couldn't set write speed\n");
-		return -EIO;
-	}
-	pd->write_speed = write_speed;
-	pd->read_speed = read_speed;
-
-	ret = pkt_perform_opc(pd);
-	if (ret)
-		dev_notice(ddev, "Optimum Power Calibration failed\n");
-
-	return 0;
-}
-
-/*
- * called at open time.
- */
-static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
-{
-	struct device *ddev = disk_to_dev(pd->disk);
-	int ret;
-	long lba;
-	struct request_queue *q;
-	struct file *bdev_file;
-
-	/*
-	 * We need to re-open the cdrom device without O_NONBLOCK to be able
-	 * to read/write from/to it. It is already opened in O_NONBLOCK mode
-	 * so open should not fail.
-	 */
-	bdev_file = bdev_file_open_by_dev(file_bdev(pd->bdev_file)->bd_dev,
-					  BLK_OPEN_READ, pd, NULL);
-	if (IS_ERR(bdev_file)) {
-		ret = PTR_ERR(bdev_file);
-		goto out;
-	}
-	pd->f_open_bdev = bdev_file;
-
-	ret = pkt_get_last_written(pd, &lba);
-	if (ret) {
-		dev_err(ddev, "pkt_get_last_written failed\n");
-		goto out_putdev;
-	}
-
-	set_capacity(pd->disk, lba << 2);
-	set_capacity_and_notify(file_bdev(pd->bdev_file)->bd_disk, lba << 2);
-
-	q = bdev_get_queue(file_bdev(pd->bdev_file));
-	if (write) {
-		ret = pkt_open_write(pd);
-		if (ret)
-			goto out_putdev;
-		set_bit(PACKET_WRITABLE, &pd->flags);
-	} else {
-		pkt_set_speed(pd, MAX_SPEED, MAX_SPEED);
-		clear_bit(PACKET_WRITABLE, &pd->flags);
-	}
-
-	ret = pkt_set_segment_merging(pd, q);
-	if (ret)
-		goto out_putdev;
-
-	if (write) {
-		if (!pkt_grow_pktlist(pd, CONFIG_CDROM_PKTCDVD_BUFFERS)) {
-			dev_err(ddev, "not enough memory for buffers\n");
-			ret = -ENOMEM;
-			goto out_putdev;
-		}
-		dev_info(ddev, "%lukB available on disc\n", lba << 1);
-	}
-	set_blocksize(bdev_file, CD_FRAMESIZE);
-
-	return 0;
-
-out_putdev:
-	fput(bdev_file);
-out:
-	return ret;
-}
-
-/*
- * called when the device is closed. makes sure that the device flushes
- * the internal cache before we close.
- */ -static void pkt_release_dev(struct pktcdvd_device *pd, int flush) -{ - struct device *ddev = disk_to_dev(pd->disk); - - if (flush && pkt_flush_cache(pd)) - dev_notice(ddev, "not flushing cache\n"); - - pkt_lock_door(pd, 0); - - pkt_set_speed(pd, MAX_SPEED, MAX_SPEED); - fput(pd->f_open_bdev); - pd->f_open_bdev = NULL; - - pkt_shrink_pktlist(pd); -} - -static struct pktcdvd_device *pkt_find_dev_from_minor(unsigned int dev_minor) -{ - if (dev_minor >= MAX_WRITERS) - return NULL; - - dev_minor = array_index_nospec(dev_minor, MAX_WRITERS); - return pkt_devs[dev_minor]; -} - -static int pkt_open(struct gendisk *disk, blk_mode_t mode) -{ - struct pktcdvd_device *pd = NULL; - int ret; - - mutex_lock(&pktcdvd_mutex); - mutex_lock(&ctl_mutex); - pd = pkt_find_dev_from_minor(disk->first_minor); - if (!pd) { - ret = -ENODEV; - goto out; - } - BUG_ON(pd->refcnt < 0); - - pd->refcnt++; - if (pd->refcnt > 1) { - if ((mode & BLK_OPEN_WRITE) && - !test_bit(PACKET_WRITABLE, &pd->flags)) { - ret = -EBUSY; - goto out_dec; - } - } else { - ret = pkt_open_dev(pd, mode & BLK_OPEN_WRITE); - if (ret) - goto out_dec; - } - mutex_unlock(&ctl_mutex); - mutex_unlock(&pktcdvd_mutex); - return 0; - -out_dec: - pd->refcnt--; -out: - mutex_unlock(&ctl_mutex); - mutex_unlock(&pktcdvd_mutex); - return ret; -} - -static void pkt_release(struct gendisk *disk) -{ - struct pktcdvd_device *pd = disk->private_data; - - mutex_lock(&pktcdvd_mutex); - mutex_lock(&ctl_mutex); - pd->refcnt--; - BUG_ON(pd->refcnt < 0); - if (pd->refcnt == 0) { - int flush = test_bit(PACKET_WRITABLE, &pd->flags); - pkt_release_dev(pd, flush); - } - mutex_unlock(&ctl_mutex); - mutex_unlock(&pktcdvd_mutex); -} - - -static void pkt_end_io_read_cloned(struct bio *bio) -{ - struct packet_stacked_data *psd = bio->bi_private; - struct pktcdvd_device *pd = psd->pd; - - psd->bio->bi_status = bio->bi_status; - bio_put(bio); - bio_endio(psd->bio); - mempool_free(psd, &psd_pool); - pkt_bio_finished(pd); -} - -static void 
pkt_make_request_read(struct pktcdvd_device *pd, struct bio *bio) -{ - struct bio *cloned_bio = bio_alloc_clone(file_bdev(pd->bdev_file), bio, - GFP_NOIO, &pkt_bio_set); - struct packet_stacked_data *psd = mempool_alloc(&psd_pool, GFP_NOIO); - - psd->pd = pd; - psd->bio = bio; - cloned_bio->bi_private = psd; - cloned_bio->bi_end_io = pkt_end_io_read_cloned; - pd->stats.secs_r += bio_sectors(bio); - pkt_queue_bio(pd, cloned_bio); -} - -static void pkt_make_request_write(struct bio *bio) -{ - struct pktcdvd_device *pd = bio->bi_bdev->bd_disk->private_data; - sector_t zone; - struct packet_data *pkt; - int was_empty, blocked_bio; - struct pkt_rb_node *node; - - zone = get_zone(bio->bi_iter.bi_sector, pd); - - /* - * If we find a matching packet in state WAITING or READ_WAIT, we can - * just append this bio to that packet. - */ - spin_lock(&pd->cdrw.active_list_lock); - blocked_bio = 0; - list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) { - if (pkt->sector == zone) { - spin_lock(&pkt->lock); - if ((pkt->state == PACKET_WAITING_STATE) || - (pkt->state == PACKET_READ_WAIT_STATE)) { - bio_list_add(&pkt->orig_bios, bio); - pkt->write_size += - bio->bi_iter.bi_size / CD_FRAMESIZE; - if ((pkt->write_size >= pkt->frames) && - (pkt->state == PACKET_WAITING_STATE)) { - atomic_inc(&pkt->run_sm); - wake_up(&pd->wqueue); - } - spin_unlock(&pkt->lock); - spin_unlock(&pd->cdrw.active_list_lock); - return; - } else { - blocked_bio = 1; - } - spin_unlock(&pkt->lock); - } - } - spin_unlock(&pd->cdrw.active_list_lock); - - /* - * Test if there is enough room left in the bio work queue - * (queue size >= congestion on mark). - * If not, wait till the work queue size is below the congestion off mark. 
- */ - spin_lock(&pd->lock); - if (pd->write_congestion_on > 0 - && pd->bio_queue_size >= pd->write_congestion_on) { - struct wait_bit_queue_entry wqe; - - init_wait_var_entry(&wqe, &pd->congested, 0); - for (;;) { - prepare_to_wait_event(__var_waitqueue(&pd->congested), - &wqe.wq_entry, - TASK_UNINTERRUPTIBLE); - if (pd->bio_queue_size <= pd->write_congestion_off) - break; - pd->congested = true; - spin_unlock(&pd->lock); - schedule(); - spin_lock(&pd->lock); - } - } - spin_unlock(&pd->lock); - - /* - * No matching packet found. Store the bio in the work queue. - */ - node = mempool_alloc(&pd->rb_pool, GFP_NOIO); - node->bio = bio; - spin_lock(&pd->lock); - BUG_ON(pd->bio_queue_size < 0); - was_empty = (pd->bio_queue_size == 0); - pkt_rbtree_insert(pd, node); - spin_unlock(&pd->lock); - - /* - * Wake up the worker thread. - */ - atomic_set(&pd->scan_queue, 1); - if (was_empty) { - /* This wake_up is required for correct operation */ - wake_up(&pd->wqueue); - } else if (!list_empty(&pd->cdrw.pkt_free_list) && !blocked_bio) { - /* - * This wake up is not required for correct operation, - * but improves performance in some cases. - */ - wake_up(&pd->wqueue); - } -} - -static void pkt_submit_bio(struct bio *bio) -{ - struct pktcdvd_device *pd = bio->bi_bdev->bd_disk->private_data; - struct device *ddev = disk_to_dev(pd->disk); - struct bio *split; - - bio = bio_split_to_limits(bio); - if (!bio) - return; - - dev_dbg(ddev, "start = %6llx stop = %6llx\n", - bio->bi_iter.bi_sector, bio_end_sector(bio)); - - /* - * Clone READ bios so we can have our own bi_end_io callback. 
- */ - if (bio_data_dir(bio) == READ) { - pkt_make_request_read(pd, bio); - return; - } - - if (!test_bit(PACKET_WRITABLE, &pd->flags)) { - dev_notice(ddev, "WRITE for ro device (%llu)\n", bio->bi_iter.bi_sector); - goto end_io; - } - - if (!bio->bi_iter.bi_size || (bio->bi_iter.bi_size % CD_FRAMESIZE)) { - dev_err(ddev, "wrong bio size\n"); - goto end_io; - } - - do { - sector_t zone = get_zone(bio->bi_iter.bi_sector, pd); - sector_t last_zone = get_zone(bio_end_sector(bio) - 1, pd); - - if (last_zone != zone) { - BUG_ON(last_zone != zone + pd->settings.size); - - split = bio_split(bio, last_zone - - bio->bi_iter.bi_sector, - GFP_NOIO, &pkt_bio_set); - bio_chain(split, bio); - } else { - split = bio; - } - - pkt_make_request_write(split); - } while (split != bio); - - return; -end_io: - bio_io_error(bio); -} - -static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev) -{ - struct device *ddev = disk_to_dev(pd->disk); - int i; - struct file *bdev_file; - struct scsi_device *sdev; - - if (pd->pkt_dev == dev) { - dev_err(ddev, "recursive setup not allowed\n"); - return -EBUSY; - } - for (i = 0; i < MAX_WRITERS; i++) { - struct pktcdvd_device *pd2 = pkt_devs[i]; - if (!pd2) - continue; - if (file_bdev(pd2->bdev_file)->bd_dev == dev) { - dev_err(ddev, "%pg already setup\n", - file_bdev(pd2->bdev_file)); - return -EBUSY; - } - if (pd2->pkt_dev == dev) { - dev_err(ddev, "can't chain pktcdvd devices\n"); - return -EBUSY; - } - } - - bdev_file = bdev_file_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_NDELAY, - NULL, NULL); - if (IS_ERR(bdev_file)) - return PTR_ERR(bdev_file); - sdev = scsi_device_from_queue(file_bdev(bdev_file)->bd_disk->queue); - if (!sdev) { - fput(bdev_file); - return -EINVAL; - } - put_device(&sdev->sdev_gendev); - - /* This is safe, since we have a reference from open(). 
*/ - __module_get(THIS_MODULE); - - pd->bdev_file = bdev_file; - - atomic_set(&pd->cdrw.pending_bios, 0); - pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->disk->disk_name); - if (IS_ERR(pd->cdrw.thread)) { - dev_err(ddev, "can't start kernel thread\n"); - goto out_mem; - } - - proc_create_single_data(pd->disk->disk_name, 0, pkt_proc, pkt_seq_show, pd); - dev_notice(ddev, "writer mapped to %pg\n", file_bdev(bdev_file)); - return 0; - -out_mem: - fput(bdev_file); - /* This is safe: open() is still holding a reference. */ - module_put(THIS_MODULE); - return -ENOMEM; -} - -static int pkt_ioctl(struct block_device *bdev, blk_mode_t mode, - unsigned int cmd, unsigned long arg) -{ - struct pktcdvd_device *pd = bdev->bd_disk->private_data; - struct device *ddev = disk_to_dev(pd->disk); - int ret; - - dev_dbg(ddev, "cmd %x, dev %d:%d\n", cmd, MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev)); - - mutex_lock(&pktcdvd_mutex); - switch (cmd) { - case CDROMEJECT: - /* - * The door gets locked when the device is opened, so we - * have to unlock it or else the eject command fails. 
- */ - if (pd->refcnt == 1) - pkt_lock_door(pd, 0); - fallthrough; - /* - * forward selected CDROM ioctls to CD-ROM, for UDF - */ - case CDROMMULTISESSION: - case CDROMREADTOCENTRY: - case CDROM_LAST_WRITTEN: - case CDROM_SEND_PACKET: - case SCSI_IOCTL_SEND_COMMAND: - if (!bdev->bd_disk->fops->ioctl) - ret = -ENOTTY; - else - ret = bdev->bd_disk->fops->ioctl(bdev, mode, cmd, arg); - break; - default: - dev_dbg(ddev, "Unknown ioctl (%x)\n", cmd); - ret = -ENOTTY; - } - mutex_unlock(&pktcdvd_mutex); - - return ret; -} - -static unsigned int pkt_check_events(struct gendisk *disk, - unsigned int clearing) -{ - struct pktcdvd_device *pd = disk->private_data; - struct gendisk *attached_disk; - - if (!pd) - return 0; - if (!pd->bdev_file) - return 0; - attached_disk = file_bdev(pd->bdev_file)->bd_disk; - if (!attached_disk || !attached_disk->fops->check_events) - return 0; - return attached_disk->fops->check_events(attached_disk, clearing); -} - -static char *pkt_devnode(struct gendisk *disk, umode_t *mode) -{ - return kasprintf(GFP_KERNEL, "pktcdvd/%s", disk->disk_name); -} - -static const struct block_device_operations pktcdvd_ops = { - .owner = THIS_MODULE, - .submit_bio = pkt_submit_bio, - .open = pkt_open, - .release = pkt_release, - .ioctl = pkt_ioctl, - .compat_ioctl = blkdev_compat_ptr_ioctl, - .check_events = pkt_check_events, - .devnode = pkt_devnode, -}; - -/* - * Set up mapping from pktcdvd device to CD-ROM device. 
- */ -static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev) -{ - struct queue_limits lim = { - .max_hw_sectors = PACKET_MAX_SECTORS, - .logical_block_size = CD_FRAMESIZE, - .features = BLK_FEAT_ROTATIONAL, - }; - int idx; - int ret = -ENOMEM; - struct pktcdvd_device *pd; - struct gendisk *disk; - - mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING); - - for (idx = 0; idx < MAX_WRITERS; idx++) - if (!pkt_devs[idx]) - break; - if (idx == MAX_WRITERS) { - pr_err("max %d writers supported\n", MAX_WRITERS); - ret = -EBUSY; - goto out_mutex; - } - - pd = kzalloc(sizeof(struct pktcdvd_device), GFP_KERNEL); - if (!pd) - goto out_mutex; - - ret = mempool_init_kmalloc_pool(&pd->rb_pool, PKT_RB_POOL_SIZE, - sizeof(struct pkt_rb_node)); - if (ret) - goto out_mem; - - INIT_LIST_HEAD(&pd->cdrw.pkt_free_list); - INIT_LIST_HEAD(&pd->cdrw.pkt_active_list); - spin_lock_init(&pd->cdrw.active_list_lock); - - spin_lock_init(&pd->lock); - spin_lock_init(&pd->iosched.lock); - bio_list_init(&pd->iosched.read_queue); - bio_list_init(&pd->iosched.write_queue); - init_waitqueue_head(&pd->wqueue); - pd->bio_queue = RB_ROOT; - - pd->write_congestion_on = write_congestion_on; - pd->write_congestion_off = write_congestion_off; - - disk = blk_alloc_disk(&lim, NUMA_NO_NODE); - if (IS_ERR(disk)) { - ret = PTR_ERR(disk); - goto out_mem; - } - pd->disk = disk; - disk->major = pktdev_major; - disk->first_minor = idx; - disk->minors = 1; - disk->fops = &pktcdvd_ops; - disk->flags = GENHD_FL_REMOVABLE | GENHD_FL_NO_PART; - snprintf(disk->disk_name, sizeof(disk->disk_name), DRIVER_NAME"%d", idx); - disk->private_data = pd; - - pd->pkt_dev = MKDEV(pktdev_major, idx); - ret = pkt_new_dev(pd, dev); - if (ret) - goto out_mem2; - - /* inherit events of the host device */ - disk->events = file_bdev(pd->bdev_file)->bd_disk->events; - - ret = add_disk(disk); - if (ret) - goto out_mem2; - - pkt_sysfs_dev_new(pd); - pkt_debugfs_dev_new(pd); - - pkt_devs[idx] = pd; - if (pkt_dev) - *pkt_dev = pd->pkt_dev; - - 
mutex_unlock(&ctl_mutex); - return 0; - -out_mem2: - put_disk(disk); -out_mem: - mempool_exit(&pd->rb_pool); - kfree(pd); -out_mutex: - mutex_unlock(&ctl_mutex); - pr_err("setup of pktcdvd device failed\n"); - return ret; -} - -/* - * Tear down mapping from pktcdvd device to CD-ROM device. - */ -static int pkt_remove_dev(dev_t pkt_dev) -{ - struct pktcdvd_device *pd; - struct device *ddev; - int idx; - int ret = 0; - - mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING); - - for (idx = 0; idx < MAX_WRITERS; idx++) { - pd = pkt_devs[idx]; - if (pd && (pd->pkt_dev == pkt_dev)) - break; - } - if (idx == MAX_WRITERS) { - pr_debug("dev not setup\n"); - ret = -ENXIO; - goto out; - } - - if (pd->refcnt > 0) { - ret = -EBUSY; - goto out; - } - - ddev = disk_to_dev(pd->disk); - - if (!IS_ERR(pd->cdrw.thread)) - kthread_stop(pd->cdrw.thread); - - pkt_devs[idx] = NULL; - - pkt_debugfs_dev_remove(pd); - pkt_sysfs_dev_remove(pd); - - fput(pd->bdev_file); - - remove_proc_entry(pd->disk->disk_name, pkt_proc); - dev_notice(ddev, "writer unmapped\n"); - - del_gendisk(pd->disk); - put_disk(pd->disk); - - mempool_exit(&pd->rb_pool); - kfree(pd); - - /* This is safe: open() is still holding a reference. 
*/ - module_put(THIS_MODULE); - -out: - mutex_unlock(&ctl_mutex); - return ret; -} - -static void pkt_get_status(struct pkt_ctrl_command *ctrl_cmd) -{ - struct pktcdvd_device *pd; - - mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING); - - pd = pkt_find_dev_from_minor(ctrl_cmd->dev_index); - if (pd) { - ctrl_cmd->dev = new_encode_dev(file_bdev(pd->bdev_file)->bd_dev); - ctrl_cmd->pkt_dev = new_encode_dev(pd->pkt_dev); - } else { - ctrl_cmd->dev = 0; - ctrl_cmd->pkt_dev = 0; - } - ctrl_cmd->num_devices = MAX_WRITERS; - - mutex_unlock(&ctl_mutex); -} - -static long pkt_ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg) -{ - void __user *argp = (void __user *)arg; - struct pkt_ctrl_command ctrl_cmd; - int ret = 0; - dev_t pkt_dev = 0; - - if (cmd != PACKET_CTRL_CMD) - return -ENOTTY; - - if (copy_from_user(&ctrl_cmd, argp, sizeof(struct pkt_ctrl_command))) - return -EFAULT; - - switch (ctrl_cmd.command) { - case PKT_CTRL_CMD_SETUP: - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; - ret = pkt_setup_dev(new_decode_dev(ctrl_cmd.dev), &pkt_dev); - ctrl_cmd.pkt_dev = new_encode_dev(pkt_dev); - break; - case PKT_CTRL_CMD_TEARDOWN: - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; - ret = pkt_remove_dev(new_decode_dev(ctrl_cmd.pkt_dev)); - break; - case PKT_CTRL_CMD_STATUS: - pkt_get_status(&ctrl_cmd); - break; - default: - return -ENOTTY; - } - - if (copy_to_user(argp, &ctrl_cmd, sizeof(struct pkt_ctrl_command))) - return -EFAULT; - return ret; -} - -#ifdef CONFIG_COMPAT -static long pkt_ctl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg) -{ - return pkt_ctl_ioctl(file, cmd, (unsigned long)compat_ptr(arg)); -} -#endif - -static const struct file_operations pkt_ctl_fops = { - .open = nonseekable_open, - .unlocked_ioctl = pkt_ctl_ioctl, -#ifdef CONFIG_COMPAT - .compat_ioctl = pkt_ctl_compat_ioctl, -#endif - .owner = THIS_MODULE, -}; - -static struct miscdevice pkt_misc = { - .minor = MISC_DYNAMIC_MINOR, - .name = DRIVER_NAME, - .nodename = 
"pktcdvd/control", - .fops = &pkt_ctl_fops -}; - -static int __init pkt_init(void) -{ - int ret; - - mutex_init(&ctl_mutex); - - ret = mempool_init_kmalloc_pool(&psd_pool, PSD_POOL_SIZE, - sizeof(struct packet_stacked_data)); - if (ret) - return ret; - ret = bioset_init(&pkt_bio_set, BIO_POOL_SIZE, 0, 0); - if (ret) { - mempool_exit(&psd_pool); - return ret; - } - - ret = register_blkdev(pktdev_major, DRIVER_NAME); - if (ret < 0) { - pr_err("unable to register block device\n"); - goto out2; - } - if (!pktdev_major) - pktdev_major = ret; - - ret = pkt_sysfs_init(); - if (ret) - goto out; - - pkt_debugfs_init(); - - ret = misc_register(&pkt_misc); - if (ret) { - pr_err("unable to register misc device\n"); - goto out_misc; - } - - pkt_proc = proc_mkdir("driver/"DRIVER_NAME, NULL); - - return 0; - -out_misc: - pkt_debugfs_cleanup(); - pkt_sysfs_cleanup(); -out: - unregister_blkdev(pktdev_major, DRIVER_NAME); -out2: - mempool_exit(&psd_pool); - bioset_exit(&pkt_bio_set); - return ret; -} - -static void __exit pkt_exit(void) -{ - remove_proc_entry("driver/"DRIVER_NAME, NULL); - misc_deregister(&pkt_misc); - - pkt_debugfs_cleanup(); - pkt_sysfs_cleanup(); - - unregister_blkdev(pktdev_major, DRIVER_NAME); - mempool_exit(&psd_pool); - bioset_exit(&pkt_bio_set); -} - -MODULE_DESCRIPTION("Packet writing layer for CD/DVD drives"); -MODULE_AUTHOR("Jens Axboe <axboe@suse.de>"); -MODULE_LICENSE("GPL"); - -module_init(pkt_init); -module_exit(pkt_exit); diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c index b5727dea15bd..7af21fe67671 100644 --- a/drivers/block/sunvdc.c +++ b/drivers/block/sunvdc.c @@ -957,8 +957,10 @@ static bool vdc_port_mpgroup_check(struct vio_dev *vdev) dev = device_find_child(vdev->dev.parent, &port_data, vdc_device_probed); - if (dev) + if (dev) { + put_device(dev); return true; + } return false; } diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 9fd284fa76dc..6561d2a561fa 100644 --- a/drivers/block/ublk_drv.c +++ 
b/drivers/block/ublk_drv.c @@ -48,6 +48,8 @@ #define UBLK_MINORS (1U << MINORBITS) +#define UBLK_INVALID_BUF_IDX ((u16)-1) + /* private ioctl command mirror */ #define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC) #define UBLK_CMD_UPDATE_SIZE _IOC_NR(UBLK_U_CMD_UPDATE_SIZE) @@ -70,7 +72,8 @@ | UBLK_F_UPDATE_SIZE \ | UBLK_F_AUTO_BUF_REG \ | UBLK_F_QUIESCE \ - | UBLK_F_PER_IO_DAEMON) + | UBLK_F_PER_IO_DAEMON \ + | UBLK_F_BUF_REG_OFF_DAEMON) #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \ | UBLK_F_USER_RECOVERY_REISSUE \ @@ -82,14 +85,6 @@ UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED | \ UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT) -struct ublk_rq_data { - refcount_t ref; - - /* for auto-unregister buffer in case of UBLK_F_AUTO_BUF_REG */ - u16 buf_index; - void *buf_ctx_handle; -}; - struct ublk_uring_cmd_pdu { /* * Store requests in same batch temporarily for queuing them to @@ -110,8 +105,6 @@ struct ublk_uring_cmd_pdu { */ struct ublk_queue *ubq; - struct ublk_auto_buf_reg buf; - u16 tag; }; @@ -155,9 +148,19 @@ struct ublk_uring_cmd_pdu { /* atomic RW with ubq->cancel_lock */ #define UBLK_IO_FLAG_CANCELED 0x80000000 +/* + * Initialize refcount to a large number to include any registered buffers. + * UBLK_IO_COMMIT_AND_FETCH_REQ will release these references minus those for + * any buffers registered on the io daemon task. 
+ */ +#define UBLK_REFCOUNT_INIT (REFCOUNT_MAX / 2) + struct ublk_io { /* userspace buffer address from io cmd */ - __u64 addr; + union { + __u64 addr; + struct ublk_auto_buf_reg buf; + }; unsigned int flags; int res; @@ -169,7 +172,24 @@ struct ublk_io { }; struct task_struct *task; -}; + + /* + * The number of uses of this I/O by the ublk server + * if user copy or zero copy are enabled: + * - UBLK_REFCOUNT_INIT from dispatch to the server + * until UBLK_IO_COMMIT_AND_FETCH_REQ + * - 1 for each inflight ublk_ch_{read,write}_iter() call + * - 1 for each io_uring registered buffer not registered on task + * The I/O can only be completed once all references are dropped. + * User copy and buffer registration operations are only permitted + * if the reference count is nonzero. + */ + refcount_t ref; + /* Count of buffers registered on task and not yet unregistered */ + unsigned task_registered_buffers; + + void *buf_ctx_handle; +} ____cacheline_aligned_in_smp; struct ublk_queue { int q_id; @@ -216,6 +236,9 @@ struct ublk_device { struct completion completion; unsigned int nr_queues_ready; unsigned int nr_privileged_daemon; + struct mutex cancel_mutex; + bool canceling; + pid_t ublksrv_tgid; }; /* header of ublk_params */ @@ -228,7 +251,8 @@ static void ublk_io_release(void *priv); static void ublk_stop_dev_unlocked(struct ublk_device *ub); static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq); static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, - const struct ublk_queue *ubq, int tag, size_t offset); + const struct ublk_queue *ubq, struct ublk_io *io, + size_t offset); static inline unsigned int ublk_req_build_flags(struct request *req); static inline struct ublksrv_io_desc * @@ -673,38 +697,29 @@ static inline bool ublk_need_req_ref(const struct ublk_queue *ubq) } static inline void ublk_init_req_ref(const struct ublk_queue *ubq, - struct request *req) + struct ublk_io *io) { - if (ublk_need_req_ref(ubq)) { - struct 
ublk_rq_data *data = blk_mq_rq_to_pdu(req); - - refcount_set(&data->ref, 1); - } + if (ublk_need_req_ref(ubq)) + refcount_set(&io->ref, UBLK_REFCOUNT_INIT); } -static inline bool ublk_get_req_ref(const struct ublk_queue *ubq, - struct request *req) +static inline bool ublk_get_req_ref(struct ublk_io *io) { - if (ublk_need_req_ref(ubq)) { - struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); - - return refcount_inc_not_zero(&data->ref); - } + return refcount_inc_not_zero(&io->ref); +} - return true; +static inline void ublk_put_req_ref(struct ublk_io *io, struct request *req) +{ + if (refcount_dec_and_test(&io->ref)) + __ublk_complete_rq(req); } -static inline void ublk_put_req_ref(const struct ublk_queue *ubq, - struct request *req) +static inline bool ublk_sub_req_ref(struct ublk_io *io) { - if (ublk_need_req_ref(ubq)) { - struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + unsigned sub_refs = UBLK_REFCOUNT_INIT - io->task_registered_buffers; - if (refcount_dec_and_test(&data->ref)) - __ublk_complete_rq(req); - } else { - __ublk_complete_rq(req); - } + io->task_registered_buffers = 0; + return refcount_sub_and_test(sub_refs, &io->ref); } static inline bool ublk_need_get_data(const struct ublk_queue *ubq) @@ -981,7 +996,7 @@ static inline bool ublk_need_unmap_req(const struct request *req) } static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, - struct ublk_io *io) + const struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); @@ -1005,7 +1020,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, static int ublk_unmap_io(const struct ublk_queue *ubq, const struct request *req, - struct ublk_io *io) + const struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); @@ -1140,7 +1155,7 @@ static inline void __ublk_complete_rq(struct request *req) if (blk_update_request(req, BLK_STS_OK, io->res)) blk_mq_requeue_request(req, true); - else + else if 
(likely(!blk_should_fake_timeout(req->q))) __blk_mq_end_request(req, BLK_STS_OK); return; @@ -1188,39 +1203,33 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq, blk_mq_end_request(rq, BLK_STS_IOERR); } -static void ublk_auto_buf_reg_fallback(struct request *req) +static void +ublk_auto_buf_reg_fallback(const struct ublk_queue *ubq, struct ublk_io *io) { - const struct ublk_queue *ubq = req->mq_hctx->driver_data; - struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag); - struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + unsigned tag = io - ubq->ios; + struct ublksrv_io_desc *iod = ublk_get_iod(ubq, tag); iod->op_flags |= UBLK_IO_F_NEED_REG_BUF; - refcount_set(&data->ref, 1); } -static bool ublk_auto_buf_reg(struct request *req, struct ublk_io *io, - unsigned int issue_flags) +static bool ublk_auto_buf_reg(const struct ublk_queue *ubq, struct request *req, + struct ublk_io *io, unsigned int issue_flags) { - struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(io->cmd); - struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); int ret; ret = io_buffer_register_bvec(io->cmd, req, ublk_io_release, - pdu->buf.index, issue_flags); + io->buf.index, issue_flags); if (ret) { - if (pdu->buf.flags & UBLK_AUTO_BUF_REG_FALLBACK) { - ublk_auto_buf_reg_fallback(req); + if (io->buf.flags & UBLK_AUTO_BUF_REG_FALLBACK) { + ublk_auto_buf_reg_fallback(ubq, io); return true; } blk_mq_end_request(req, BLK_STS_IOERR); return false; } - /* one extra reference is dropped by ublk_io_release */ - refcount_set(&data->ref, 2); - data->buf_ctx_handle = io_uring_cmd_ctx_handle(io->cmd); - /* store buffer index in request payload */ - data->buf_index = pdu->buf.index; + io->task_registered_buffers = 1; + io->buf_ctx_handle = io_uring_cmd_ctx_handle(io->cmd); io->flags |= UBLK_IO_FLAG_AUTO_BUF_REG; return true; } @@ -1229,10 +1238,10 @@ static bool ublk_prep_auto_buf_reg(struct ublk_queue *ubq, struct request *req, struct ublk_io *io, unsigned int issue_flags) { + 
ublk_init_req_ref(ubq, io);
 	if (ublk_support_auto_buf_reg(ubq) && ublk_rq_has_data(req))
-		return ublk_auto_buf_reg(req, io, issue_flags);
+		return ublk_auto_buf_reg(ubq, req, io, issue_flags);
-	ublk_init_req_ref(ubq, req);
 	return true;
 }
@@ -1356,14 +1365,23 @@ static void ublk_queue_cmd_list(struct ublk_io *io, struct rq_list *l)
 static enum blk_eh_timer_return ublk_timeout(struct request *rq)
 {
 	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
-	struct ublk_io *io = &ubq->ios[rq->tag];
+	pid_t tgid = ubq->dev->ublksrv_tgid;
+	struct task_struct *p;
+	struct pid *pid;
-	if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) {
-		send_sig(SIGKILL, io->task, 0);
-		return BLK_EH_DONE;
-	}
+	if (!(ubq->flags & UBLK_F_UNPRIVILEGED_DEV))
+		return BLK_EH_RESET_TIMER;
-	return BLK_EH_RESET_TIMER;
+	if (unlikely(!tgid))
+		return BLK_EH_RESET_TIMER;
+
+	rcu_read_lock();
+	pid = find_vpid(tgid);
+	p = pid_task(pid, PIDTYPE_PID);
+	if (p)
+		send_sig(SIGKILL, p, 0);
+	rcu_read_unlock();
+	return BLK_EH_DONE;
 }
 static blk_status_t ublk_prep_req(struct ublk_queue *ubq, struct request *rq,
@@ -1504,6 +1522,9 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
 			put_task_struct(io->task);
 			io->task = NULL;
 		}
+
+		WARN_ON_ONCE(refcount_read(&io->ref));
+		WARN_ON_ONCE(io->task_registered_buffers);
 	}
 }
@@ -1515,6 +1536,7 @@ static int ublk_ch_open(struct inode *inode, struct file *filp)
 	if (test_and_set_bit(UB_STATE_OPEN, &ub->state))
 		return -EBUSY;
 	filp->private_data = ub;
+	ub->ublksrv_tgid = current->tgid;
 	return 0;
 }
@@ -1529,6 +1551,7 @@ static void ublk_reset_ch_dev(struct ublk_device *ub)
 	ub->mm = NULL;
 	ub->nr_queues_ready = 0;
 	ub->nr_privileged_daemon = 0;
+	ub->ublksrv_tgid = -1;
 }
 static struct gendisk *ublk_get_disk(struct ublk_device *ub)
@@ -1550,6 +1573,27 @@ static void ublk_put_disk(struct gendisk *disk)
 	put_device(disk_to_dev(disk));
 }
+/*
+ * Use this function to ensure that ->canceling is consistently set for
+ * the device and all queues. Do not set these flags directly.
+ *
+ * Caller must ensure that:
+ * - cancel_mutex is held. This ensures that there is no concurrent
+ *   access to ub->canceling and no concurrent writes to ubq->canceling.
+ * - there are no concurrent reads of ubq->canceling from the queue_rq
+ *   path. This can be done by quiescing the queue, or through other
+ *   means.
+ */
+static void ublk_set_canceling(struct ublk_device *ub, bool canceling)
+	__must_hold(&ub->cancel_mutex)
+{
+	int i;
+
+	ub->canceling = canceling;
+	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+		ublk_get_queue(ub, i)->canceling = canceling;
+}
+
 static int ublk_ch_release(struct inode *inode, struct file *filp)
 {
 	struct ublk_device *ub = filp->private_data;
@@ -1578,12 +1622,11 @@ static int ublk_ch_release(struct inode *inode, struct file *filp)
 	 * All requests may be inflight, so ->canceling may not be set, set
 	 * it now.
 	 */
-	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
-		struct ublk_queue *ubq = ublk_get_queue(ub, i);
-
-		ubq->canceling = true;
-		ublk_abort_queue(ub, ubq);
-	}
+	mutex_lock(&ub->cancel_mutex);
+	ublk_set_canceling(ub, true);
+	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+		ublk_abort_queue(ub, ublk_get_queue(ub, i));
+	mutex_unlock(&ub->cancel_mutex);
 	blk_mq_kick_requeue_list(disk->queue);
 	/*
@@ -1706,23 +1749,17 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 	}
 }
-/* Must be called when queue is frozen */
-static void ublk_mark_queue_canceling(struct ublk_queue *ubq)
+static void ublk_start_cancel(struct ublk_device *ub)
 {
-	spin_lock(&ubq->cancel_lock);
-	if (!ubq->canceling)
-		ubq->canceling = true;
-	spin_unlock(&ubq->cancel_lock);
-}
-
-static void ublk_start_cancel(struct ublk_queue *ubq)
-{
-	struct ublk_device *ub = ubq->dev;
 	struct gendisk *disk = ublk_get_disk(ub);
 	/* Our disk has been dead */
 	if (!disk)
 		return;
+
+	mutex_lock(&ub->cancel_mutex);
+	if (ub->canceling)
+		goto out;
 	/*
 	 * Now we are serialized with ublk_queue_rq()
 	 *
@@ -1731,8 +1768,10 @@ static void ublk_start_cancel(struct ublk_queue *ubq)
 	 * touch completed uring_cmd
 	 */
 	blk_mq_quiesce_queue(disk->queue);
-	ublk_mark_queue_canceling(ubq);
+	ublk_set_canceling(ub, true);
 	blk_mq_unquiesce_queue(disk->queue);
+out:
+	mutex_unlock(&ub->cancel_mutex);
 	ublk_put_disk(disk);
 }
@@ -1805,8 +1844,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
 	if (WARN_ON_ONCE(task && task != io->task))
 		return;
-	if (!ubq->canceling)
-		ublk_start_cancel(ubq);
+	ublk_start_cancel(ubq->dev);
 	WARN_ON_ONCE(io->cmd != cmd);
 	ublk_cancel_cmd(ubq, pdu->tag, issue_flags);
@@ -1930,9 +1968,11 @@ static void ublk_reset_io_flags(struct ublk_device *ub)
 		for (j = 0; j < ubq->q_depth; j++)
 			ubq->ios[j].flags &= ~UBLK_IO_FLAG_CANCELED;
 		spin_unlock(&ubq->cancel_lock);
-		ubq->canceling = false;
 		ubq->fail_io = false;
 	}
+	mutex_lock(&ub->cancel_mutex);
+	ublk_set_canceling(ub, false);
+	mutex_unlock(&ub->cancel_mutex);
 }
 /* device can only be started after all IOs are ready */
@@ -1967,12 +2007,66 @@ static inline int ublk_check_cmd_op(u32 cmd_op)
 	return 0;
 }
-static inline void ublk_fill_io_cmd(struct ublk_io *io,
-		struct io_uring_cmd *cmd, unsigned long buf_addr)
+static inline int ublk_set_auto_buf_reg(struct ublk_io *io, struct io_uring_cmd *cmd)
+{
+	io->buf = ublk_sqe_addr_to_auto_buf_reg(READ_ONCE(cmd->sqe->addr));
+
+	if (io->buf.reserved0 || io->buf.reserved1)
+		return -EINVAL;
+
+	if (io->buf.flags & ~UBLK_AUTO_BUF_REG_F_MASK)
+		return -EINVAL;
+	return 0;
+}
+
+static int ublk_handle_auto_buf_reg(struct ublk_io *io,
+				    struct io_uring_cmd *cmd,
+				    u16 *buf_idx)
 {
+	if (io->flags & UBLK_IO_FLAG_AUTO_BUF_REG) {
+		io->flags &= ~UBLK_IO_FLAG_AUTO_BUF_REG;
+
+		/*
+		 * `UBLK_F_AUTO_BUF_REG` only works iff `UBLK_IO_FETCH_REQ`
+		 * and `UBLK_IO_COMMIT_AND_FETCH_REQ` are issued from same
+		 * `io_ring_ctx`.
+		 *
+		 * If this uring_cmd's io_ring_ctx isn't same with the
+		 * one for registering the buffer, it is ublk server's
+		 * responsibility for unregistering the buffer, otherwise
+		 * this ublk request gets stuck.
+		 */
+		if (io->buf_ctx_handle == io_uring_cmd_ctx_handle(cmd))
+			*buf_idx = io->buf.index;
+	}
+
+	return ublk_set_auto_buf_reg(io, cmd);
+}
+
+/* Once we return, `io->req` can't be used any more */
+static inline struct request *
+ublk_fill_io_cmd(struct ublk_io *io, struct io_uring_cmd *cmd)
+{
+	struct request *req = io->req;
+
 	io->cmd = cmd;
 	io->flags |= UBLK_IO_FLAG_ACTIVE;
+	/* now this cmd slot is owned by ublk driver */
+	io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV;
+
+	return req;
+}
+
+static inline int
+ublk_config_io_buf(const struct ublk_queue *ubq, struct ublk_io *io,
+		   struct io_uring_cmd *cmd, unsigned long buf_addr,
+		   u16 *buf_idx)
+{
+	if (ublk_support_auto_buf_reg(ubq))
+		return ublk_handle_auto_buf_reg(io, cmd, buf_idx);
+
 	io->addr = buf_addr;
+	return 0;
 }
 static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
@@ -1990,30 +2084,25 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
 	io_uring_cmd_mark_cancelable(cmd, issue_flags);
 }
-static inline int ublk_set_auto_buf_reg(struct io_uring_cmd *cmd)
-{
-	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
-
-	pdu->buf = ublk_sqe_addr_to_auto_buf_reg(READ_ONCE(cmd->sqe->addr));
-
-	if (pdu->buf.reserved0 || pdu->buf.reserved1)
-		return -EINVAL;
-
-	if (pdu->buf.flags & ~UBLK_AUTO_BUF_REG_F_MASK)
-		return -EINVAL;
-	return 0;
-}
-
 static void ublk_io_release(void *priv)
 {
 	struct request *rq = priv;
 	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
+	struct ublk_io *io = &ubq->ios[rq->tag];
-	ublk_put_req_ref(ubq, rq);
+	/*
+	 * task_registered_buffers may be 0 if buffers were registered off task
+	 * but unregistered on task. Or after UBLK_IO_COMMIT_AND_FETCH_REQ.
+	 */
+	if (current == io->task && io->task_registered_buffers)
+		io->task_registered_buffers--;
+	else
+		ublk_put_req_ref(io, rq);
 }
 static int ublk_register_io_buf(struct io_uring_cmd *cmd,
-				const struct ublk_queue *ubq, unsigned int tag,
+				const struct ublk_queue *ubq,
+				struct ublk_io *io,
 				unsigned int index, unsigned int issue_flags)
 {
 	struct ublk_device *ub = cmd->file->private_data;
@@ -2023,30 +2112,75 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
 	if (!ublk_support_zero_copy(ubq))
 		return -EINVAL;
-	req = __ublk_check_and_get_req(ub, ubq, tag, 0);
+	req = __ublk_check_and_get_req(ub, ubq, io, 0);
 	if (!req)
 		return -EINVAL;
 	ret = io_buffer_register_bvec(cmd, req, ublk_io_release, index,
 				      issue_flags);
 	if (ret) {
-		ublk_put_req_ref(ubq, req);
+		ublk_put_req_ref(io, req);
 		return ret;
 	}
 	return 0;
 }
+static int
+ublk_daemon_register_io_buf(struct io_uring_cmd *cmd,
+			    const struct ublk_queue *ubq, struct ublk_io *io,
+			    unsigned index, unsigned issue_flags)
+{
+	unsigned new_registered_buffers;
+	struct request *req = io->req;
+	int ret;
+
+	/*
+	 * Ensure there are still references for ublk_sub_req_ref() to release.
+	 * If not, fall back on the thread-safe buffer registration.
+	 */
+	new_registered_buffers = io->task_registered_buffers + 1;
+	if (unlikely(new_registered_buffers >= UBLK_REFCOUNT_INIT))
+		return ublk_register_io_buf(cmd, ubq, io, index, issue_flags);
+
+	if (!ublk_support_zero_copy(ubq) || !ublk_rq_has_data(req))
+		return -EINVAL;
+
+	ret = io_buffer_register_bvec(cmd, req, ublk_io_release, index,
+				      issue_flags);
+	if (ret)
+		return ret;
+
+	io->task_registered_buffers = new_registered_buffers;
+	return 0;
+}
+
 static int ublk_unregister_io_buf(struct io_uring_cmd *cmd,
-				  const struct ublk_queue *ubq,
+				  const struct ublk_device *ub,
 				  unsigned int index, unsigned int issue_flags)
 {
-	if (!ublk_support_zero_copy(ubq))
+	if (!(ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY))
 		return -EINVAL;
 	return io_buffer_unregister_bvec(cmd, index, issue_flags);
 }
+static int ublk_check_fetch_buf(const struct ublk_queue *ubq, __u64 buf_addr)
+{
+	if (ublk_need_map_io(ubq)) {
+		/*
+		 * FETCH_RQ has to provide IO buffer if NEED GET
+		 * DATA is not enabled
+		 */
+		if (!buf_addr && !ublk_need_get_data(ubq))
+			return -EINVAL;
+	} else if (buf_addr) {
+		/* User copy requires addr to be unset */
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int ublk_fetch(struct io_uring_cmd *cmd, struct ublk_queue *ubq,
 		      struct ublk_io *io, __u64 buf_addr)
 {
@@ -2073,26 +2207,11 @@ static int ublk_fetch(struct io_uring_cmd *cmd, struct ublk_queue *ubq,
 	WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV);
-	if (ublk_need_map_io(ubq)) {
-		/*
-		 * FETCH_RQ has to provide IO buffer if NEED GET
-		 * DATA is not enabled
-		 */
-		if (!buf_addr && !ublk_need_get_data(ubq))
-			goto out;
-	} else if (buf_addr) {
-		/* User copy requires addr to be unset */
-		ret = -EINVAL;
+	ublk_fill_io_cmd(io, cmd);
+	ret = ublk_config_io_buf(ubq, io, cmd, buf_addr, NULL);
+	if (ret)
 		goto out;
-	}
-	if (ublk_support_auto_buf_reg(ubq)) {
-		ret = ublk_set_auto_buf_reg(cmd);
-		if (ret)
-			goto out;
-	}
-
-	ublk_fill_io_cmd(io, cmd, buf_addr);
 	WRITE_ONCE(io->task, get_task_struct(current));
 	ublk_mark_io_ready(ub, ubq);
 out:
@@ -2100,10 +2219,8 @@ out:
 	return ret;
 }
-static int ublk_commit_and_fetch(const struct ublk_queue *ubq,
-				 struct ublk_io *io, struct io_uring_cmd *cmd,
-				 const struct ublksrv_io_cmd *ub_cmd,
-				 unsigned int issue_flags)
+static int ublk_check_commit_and_fetch(const struct ublk_queue *ubq,
+				       struct ublk_io *io, __u64 buf_addr)
 {
 	struct request *req = io->req;
@@ -2112,10 +2229,10 @@ static int ublk_commit_and_fetch(const struct ublk_queue *ubq,
 		 * COMMIT_AND_FETCH_REQ has to provide IO buffer if
 		 * NEED GET DATA is not enabled or it is Read IO.
 		 */
-		if (!ub_cmd->addr && (!ublk_need_get_data(ubq) ||
+		if (!buf_addr && (!ublk_need_get_data(ubq) ||
 					req_op(req) == REQ_OP_READ))
 			return -EINVAL;
-	} else if (req_op(req) != REQ_OP_ZONE_APPEND && ub_cmd->addr) {
+	} else if (req_op(req) != REQ_OP_ZONE_APPEND && buf_addr) {
 		/*
 		 * User copy requires addr to be unset when command is
 		 * not zone append
@@ -2123,48 +2240,17 @@ static int ublk_commit_and_fetch(const struct ublk_queue *ubq,
 		return -EINVAL;
 	}
-	if (ublk_support_auto_buf_reg(ubq)) {
-		int ret;
-
-		/*
-		 * `UBLK_F_AUTO_BUF_REG` only works iff `UBLK_IO_FETCH_REQ`
-		 * and `UBLK_IO_COMMIT_AND_FETCH_REQ` are issued from same
-		 * `io_ring_ctx`.
-		 *
-		 * If this uring_cmd's io_ring_ctx isn't same with the
-		 * one for registering the buffer, it is ublk server's
-		 * responsibility for unregistering the buffer, otherwise
-		 * this ublk request gets stuck.
-		 */
-		if (io->flags & UBLK_IO_FLAG_AUTO_BUF_REG) {
-			struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
-
-			if (data->buf_ctx_handle == io_uring_cmd_ctx_handle(cmd))
-				io_buffer_unregister_bvec(cmd, data->buf_index,
-						issue_flags);
-			io->flags &= ~UBLK_IO_FLAG_AUTO_BUF_REG;
-		}
-
-		ret = ublk_set_auto_buf_reg(cmd);
-		if (ret)
-			return ret;
-	}
-
-	ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
-
-	/* now this cmd slot is owned by ublk driver */
-	io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV;
-	io->res = ub_cmd->result;
-
-	if (req_op(req) == REQ_OP_ZONE_APPEND)
-		req->__sector = ub_cmd->zone_append_lba;
-
-	if (likely(!blk_should_fake_timeout(req->q)))
-		ublk_put_req_ref(ubq, req);
-
 	return 0;
 }
+static bool ublk_need_complete_req(const struct ublk_queue *ubq,
+				   struct ublk_io *io)
+{
+	if (ublk_need_req_ref(ubq))
+		return ublk_sub_req_ref(io);
+	return true;
+}
+
 static bool ublk_get_data(const struct ublk_queue *ubq, struct ublk_io *io,
 			  struct request *req)
 {
@@ -2187,19 +2273,33 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 			       unsigned int issue_flags,
 			       const struct ublksrv_io_cmd *ub_cmd)
 {
+	u16 buf_idx = UBLK_INVALID_BUF_IDX;
 	struct ublk_device *ub = cmd->file->private_data;
-	struct task_struct *task;
 	struct ublk_queue *ubq;
 	struct ublk_io *io;
 	u32 cmd_op = cmd->cmd_op;
 	unsigned tag = ub_cmd->tag;
-	int ret = -EINVAL;
 	struct request *req;
+	int ret;
+	bool compl;
 	pr_devel("%s: received: cmd op %d queue %d tag %d result %d\n",
 			__func__, cmd->cmd_op, ub_cmd->q_id, tag,
 			ub_cmd->result);
+	ret = ublk_check_cmd_op(cmd_op);
+	if (ret)
+		goto out;
+
+	/*
+	 * io_buffer_unregister_bvec() doesn't access the ubq or io,
+	 * so no need to validate the q_id, tag, or task
+	 */
+	if (_IOC_NR(cmd_op) == UBLK_IO_UNREGISTER_IO_BUF)
+		return ublk_unregister_io_buf(cmd, ub, ub_cmd->addr,
+					      issue_flags);
+
+	ret = -EINVAL;
 	if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues)
 		goto out;
@@ -2209,21 +2309,37 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 		goto out;
 	io = &ubq->ios[tag];
-	task = READ_ONCE(io->task);
-	if (task && task != current)
+	/* UBLK_IO_FETCH_REQ can be handled on any task, which sets io->task */
+	if (unlikely(_IOC_NR(cmd_op) == UBLK_IO_FETCH_REQ)) {
+		ret = ublk_check_fetch_buf(ubq, ub_cmd->addr);
+		if (ret)
+			goto out;
+		ret = ublk_fetch(cmd, ubq, io, ub_cmd->addr);
+		if (ret)
+			goto out;
+
+		ublk_prep_cancel(cmd, issue_flags, ubq, tag);
+		return -EIOCBQUEUED;
+	}
+
+	if (READ_ONCE(io->task) != current) {
+		/*
+		 * ublk_register_io_buf() accesses only the io's refcount,
+		 * so can be handled on any task
+		 */
+		if (_IOC_NR(cmd_op) == UBLK_IO_REGISTER_IO_BUF)
+			return ublk_register_io_buf(cmd, ubq, io, ub_cmd->addr,
+						    issue_flags);
+
 		goto out;
+	}
 	/* there is pending io cmd, something must be wrong */
-	if (io->flags & UBLK_IO_FLAG_ACTIVE) {
+	if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) {
 		ret = -EBUSY;
 		goto out;
 	}
-	/* only UBLK_IO_FETCH_REQ is allowed if io is not OWNED_BY_SRV */
-	if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV) &&
-	    _IOC_NR(cmd_op) != UBLK_IO_FETCH_REQ)
-		goto out;
-
 	/*
 	 * ensure that the user issues UBLK_IO_NEED_GET_DATA
 	 * iff the driver have set the UBLK_IO_FLAG_NEED_GET_DATA.
@@ -2232,23 +2348,27 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 			^ (_IOC_NR(cmd_op) == UBLK_IO_NEED_GET_DATA))
 		goto out;
-	ret = ublk_check_cmd_op(cmd_op);
-	if (ret)
-		goto out;
-
-	ret = -EINVAL;
 	switch (_IOC_NR(cmd_op)) {
 	case UBLK_IO_REGISTER_IO_BUF:
-		return ublk_register_io_buf(cmd, ubq, tag, ub_cmd->addr, issue_flags);
-	case UBLK_IO_UNREGISTER_IO_BUF:
-		return ublk_unregister_io_buf(cmd, ubq, ub_cmd->addr, issue_flags);
-	case UBLK_IO_FETCH_REQ:
-		ret = ublk_fetch(cmd, ubq, io, ub_cmd->addr);
+		return ublk_daemon_register_io_buf(cmd, ubq, io, ub_cmd->addr,
+						   issue_flags);
+	case UBLK_IO_COMMIT_AND_FETCH_REQ:
+		ret = ublk_check_commit_and_fetch(ubq, io, ub_cmd->addr);
 		if (ret)
 			goto out;
-		break;
-	case UBLK_IO_COMMIT_AND_FETCH_REQ:
-		ret = ublk_commit_and_fetch(ubq, io, cmd, ub_cmd, issue_flags);
+		io->res = ub_cmd->result;
+		req = ublk_fill_io_cmd(io, cmd);
+		ret = ublk_config_io_buf(ubq, io, cmd, ub_cmd->addr, &buf_idx);
+		compl = ublk_need_complete_req(ubq, io);
+
+		/* can't touch 'ublk_io' any more */
+		if (buf_idx != UBLK_INVALID_BUF_IDX)
+			io_buffer_unregister_bvec(cmd, buf_idx, issue_flags);
+		if (req_op(req) == REQ_OP_ZONE_APPEND)
+			req->__sector = ub_cmd->zone_append_lba;
+		if (compl)
+			__ublk_complete_rq(req);
+
 		if (ret)
 			goto out;
 		break;
@@ -2258,9 +2378,9 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 		 * uring_cmd active first and prepare for handling new requeued
 		 * request
 		 */
-		req = io->req;
-		ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
-		io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV;
+		req = ublk_fill_io_cmd(io, cmd);
+		ret = ublk_config_io_buf(ubq, io, cmd, ub_cmd->addr, NULL);
+		WARN_ON_ONCE(ret);
 		if (likely(ublk_get_data(ubq, io, req))) {
 			__ublk_prep_compl_io_cmd(io, req);
 			return UBLK_IO_RES_OK;
@@ -2279,15 +2399,20 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 }
 static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
-		const struct ublk_queue *ubq, int tag, size_t offset)
+		const struct ublk_queue *ubq, struct ublk_io *io, size_t offset)
 {
+	unsigned tag = io - ubq->ios;
 	struct request *req;
+	/*
+	 * can't use io->req in case of concurrent UBLK_IO_COMMIT_AND_FETCH_REQ,
+	 * which would overwrite it with io->cmd
+	 */
 	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
 	if (!req)
 		return NULL;
-	if (!ublk_get_req_ref(ubq, req))
+	if (!ublk_get_req_ref(io))
 		return NULL;
 	if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
@@ -2301,7 +2426,7 @@ static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
 	return req;
 fail_put:
-	ublk_put_req_ref(ubq, req);
+	ublk_put_req_ref(io, req);
 	return NULL;
 }
@@ -2368,7 +2493,8 @@ static inline bool ublk_check_ubuf_dir(const struct request *req,
 }
 static struct request *ublk_check_and_get_req(struct kiocb *iocb,
-		struct iov_iter *iter, size_t *off, int dir)
+		struct iov_iter *iter, size_t *off, int dir,
+		struct ublk_io **io)
 {
 	struct ublk_device *ub = iocb->ki_filp->private_data;
 	struct ublk_queue *ubq;
@@ -2402,7 +2528,8 @@ static struct request *ublk_check_and_get_req(struct kiocb *iocb,
 	if (tag >= ubq->q_depth)
 		return ERR_PTR(-EINVAL);
-	req = __ublk_check_and_get_req(ub, ubq, tag, buf_off);
+	*io = &ubq->ios[tag];
+	req = __ublk_check_and_get_req(ub, ubq, *io, buf_off);
 	if (!req)
 		return ERR_PTR(-EINVAL);
@@ -2415,42 +2542,40 @@ static struct request *ublk_check_and_get_req(struct kiocb *iocb,
 	*off = buf_off;
 	return req;
 fail:
-	ublk_put_req_ref(ubq, req);
+	ublk_put_req_ref(*io, req);
 	return ERR_PTR(-EACCES);
 }
 static ssize_t ublk_ch_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
-	struct ublk_queue *ubq;
 	struct request *req;
+	struct ublk_io *io;
 	size_t buf_off;
 	size_t ret;
-	req = ublk_check_and_get_req(iocb, to, &buf_off, ITER_DEST);
+	req = ublk_check_and_get_req(iocb, to, &buf_off, ITER_DEST, &io);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 	ret = ublk_copy_user_pages(req, buf_off, to, ITER_DEST);
-	ubq = req->mq_hctx->driver_data;
-	ublk_put_req_ref(ubq, req);
+	ublk_put_req_ref(io, req);
 	return ret;
 }
 static ssize_t ublk_ch_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
-	struct ublk_queue *ubq;
 	struct request *req;
+	struct ublk_io *io;
 	size_t buf_off;
 	size_t ret;
-	req = ublk_check_and_get_req(iocb, from, &buf_off, ITER_SOURCE);
+	req = ublk_check_and_get_req(iocb, from, &buf_off, ITER_SOURCE, &io);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 	ret = ublk_copy_user_pages(req, buf_off, from, ITER_SOURCE);
-	ubq = req->mq_hctx->driver_data;
-	ublk_put_req_ref(ubq, req);
+	ublk_put_req_ref(io, req);
 	return ret;
 }
@@ -2475,6 +2600,8 @@ static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
 		struct ublk_io *io = &ubq->ios[i];
 		if (io->task)
 			put_task_struct(io->task);
+		WARN_ON_ONCE(refcount_read(&io->ref));
+		WARN_ON_ONCE(io->task_registered_buffers);
 	}
 	if (ubq->io_cmd_buf)
@@ -2513,7 +2640,7 @@ static void ublk_deinit_queues(struct ublk_device *ub)
 	for (i = 0; i < nr_queues; i++)
 		ublk_deinit_queue(ub, i);
-	kfree(ub->__queues);
+	kvfree(ub->__queues);
 }
 static int ublk_init_queues(struct ublk_device *ub)
@@ -2524,7 +2651,7 @@ static int ublk_init_queues(struct ublk_device *ub)
 	int i, ret = -ENOMEM;
 	ub->queue_size = ubq_size;
-	ub->__queues = kcalloc(nr_queues, ubq_size, GFP_KERNEL);
+	ub->__queues = kvcalloc(nr_queues, ubq_size, GFP_KERNEL);
 	if (!ub->__queues)
 		return ret;
@@ -2580,6 +2707,7 @@ static void ublk_cdev_rel(struct device *dev)
 	ublk_deinit_queues(ub);
 	ublk_free_dev_number(ub);
 	mutex_destroy(&ub->mutex);
+	mutex_destroy(&ub->cancel_mutex);
 	kfree(ub);
 }
@@ -2627,7 +2755,6 @@ static int ublk_add_tag_set(struct ublk_device *ub)
 	ub->tag_set.nr_hw_queues = ub->dev_info.nr_hw_queues;
 	ub->tag_set.queue_depth = ub->dev_info.queue_depth;
 	ub->tag_set.numa_node = NUMA_NO_NODE;
-	ub->tag_set.cmd_size = sizeof(struct ublk_rq_data);
 	ub->tag_set.driver_data = ub;
 	return blk_mq_alloc_tag_set(&ub->tag_set);
 }
@@ -2729,6 +2856,9 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
 	if (wait_for_completion_interruptible(&ub->completion) != 0)
 		return -EINTR;
+	if (ub->ublksrv_tgid != ublksrv_pid)
+		return -EINVAL;
+
 	mutex_lock(&ub->mutex);
 	if (ub->dev_info.state == UBLK_S_DEV_LIVE ||
 	    test_bit(UB_STATE_USED, &ub->state)) {
@@ -2933,6 +3063,7 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
 		goto out_unlock;
 	mutex_init(&ub->mutex);
 	spin_lock_init(&ub->lock);
+	mutex_init(&ub->cancel_mutex);
 	ret = ublk_alloc_dev_number(ub, header->dev_id);
 	if (ret < 0)
@@ -2953,7 +3084,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
 	ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE |
 		UBLK_F_URING_CMD_COMP_IN_TASK |
-		UBLK_F_PER_IO_DAEMON;
+		UBLK_F_PER_IO_DAEMON |
+		UBLK_F_BUF_REG_OFF_DAEMON;
 	/* GET_DATA isn't needed any more with USER_COPY or ZERO COPY */
 	if (ub->dev_info.flags & (UBLK_F_USER_COPY |
 				UBLK_F_SUPPORT_ZERO_COPY |
@@ -3003,6 +3135,7 @@ out_free_dev_number:
 	ublk_free_dev_number(ub);
 out_free_ub:
 	mutex_destroy(&ub->mutex);
+	mutex_destroy(&ub->cancel_mutex);
 	kfree(ub);
 out_unlock:
 	mutex_unlock(&ublk_ctl_mutex);
@@ -3227,6 +3360,9 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
 	pr_devel("%s: All FETCH_REQs received, dev id %d\n", __func__,
 		 header->dev_id);
+	if (ub->ublksrv_tgid != ublksrv_pid)
+		return -EINVAL;
+
 	mutex_lock(&ub->mutex);
 	if (ublk_nosrv_should_stop_dev(ub))
 		goto out_unlock;
@@ -3340,7 +3476,7 @@ static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
 	/* zero means wait forever */
 	u64 timeout_ms = header->data[0];
 	struct gendisk *disk;
-	int i, ret = -ENODEV;
+	int ret = -ENODEV;
 	if (!(ub->dev_info.flags & UBLK_F_QUIESCE))
 		return -EOPNOTSUPP;
@@ -3357,14 +3493,12 @@ static int ublk_ctrl_quiesce_dev(struct ublk_device *ub,
 	if (ub->dev_info.state != UBLK_S_DEV_LIVE)
 		goto put_disk;
-	/* Mark all queues as canceling */
+	/* Mark the device as canceling */
+	mutex_lock(&ub->cancel_mutex);
 	blk_mq_quiesce_queue(disk->queue);
-	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
-		struct ublk_queue *ubq = ublk_get_queue(ub, i);
-
-		ubq->canceling = true;
-	}
+	ublk_set_canceling(ub, true);
 	blk_mq_unquiesce_queue(disk->queue);
+	mutex_unlock(&ub->cancel_mutex);
 	if (!timeout_ms)
 		timeout_ms = UINT_MAX;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 30bca8cb7106..e649fa67bac1 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -976,9 +976,8 @@ static int init_vq(struct virtio_blk *vblk)
 		return -EINVAL;
 	}
-	num_vqs = min_t(unsigned int,
-			min_not_zero(num_request_queues, nr_cpu_ids),
-			num_vqs);
+	num_vqs = blk_mq_num_possible_queues(
+			min_not_zero(num_request_queues, num_vqs));
 	num_poll_vqs = min_t(unsigned int, poll_queues, num_vqs - 1);
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index d26a58c67e95..b1bd1daa0060 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -8,6 +8,7 @@
 #include <linux/sched.h>
 #include <linux/cpuhotplug.h>
 #include <linux/vmalloc.h>
+#include <linux/sysfs.h>
 #include "zcomp.h"
@@ -89,23 +90,21 @@ bool zcomp_available_algorithm(const char *comp)
 }
 /* show available compressors */
-ssize_t zcomp_available_show(const char *comp, char *buf)
+ssize_t zcomp_available_show(const char *comp, char *buf, ssize_t at)
 {
-	ssize_t sz = 0;
 	int i;
 	for (i = 0; i < ARRAY_SIZE(backends) - 1; i++) {
 		if (!strcmp(comp, backends[i]->name)) {
-			sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
-					"[%s] ", backends[i]->name);
+			at += sysfs_emit_at(buf, at, "[%s] ",
+					    backends[i]->name);
 		} else {
-			sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
-					"%s ", backends[i]->name);
+			at += sysfs_emit_at(buf, at, "%s ", backends[i]->name);
 		}
 	}
-	sz += scnprintf(buf + sz, PAGE_SIZE - sz, "\n");
-	return sz;
+	at += sysfs_emit_at(buf, at, "\n");
+	return at;
 }
 struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 4acffe671a5e..eacfd3f7d61d 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -79,7 +79,7 @@ struct zcomp {
 int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node);
 int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node);
-ssize_t zcomp_available_show(const char *comp, char *buf);
+ssize_t zcomp_available_show(const char *comp, char *buf, ssize_t at);
 bool zcomp_available_algorithm(const char *comp);
 struct zcomp *zcomp_create(const char *alg, struct zcomp_params *params);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 54c57103715f..8acad3cc6e6e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -373,7 +373,7 @@ static ssize_t initstate_show(struct device *dev,
 	val = init_done(zram);
 	up_read(&zram->init_lock);
-	return scnprintf(buf, PAGE_SIZE, "%u\n", val);
+	return sysfs_emit(buf, "%u\n", val);
 }
 static ssize_t disksize_show(struct device *dev,
@@ -381,7 +381,7 @@ static ssize_t disksize_show(struct device *dev,
 {
 	struct zram *zram = dev_to_zram(dev);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
+	return sysfs_emit(buf, "%llu\n", zram->disksize);
 }
 static ssize_t mem_limit_store(struct device *dev,
@@ -532,7 +532,7 @@ static ssize_t writeback_limit_enable_show(struct device *dev,
 	spin_unlock(&zram->wb_limit_lock);
 	up_read(&zram->init_lock);
-	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
+	return sysfs_emit(buf, "%d\n", val);
 }
 static ssize_t writeback_limit_store(struct device *dev,
@@ -567,7 +567,7 @@ static ssize_t writeback_limit_show(struct device *dev,
 	spin_unlock(&zram->wb_limit_lock);
 	up_read(&zram->init_lock);
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return sysfs_emit(buf, "%llu\n", val);
 }
 static void reset_bdev(struct zram *zram)
@@ -1225,12 +1225,13 @@ static void comp_algorithm_set(struct zram *zram, u32 prio, const char *alg)
 	zram->comp_algs[prio] = alg;
 }
-static ssize_t __comp_algorithm_show(struct zram *zram, u32 prio, char *buf)
+static ssize_t __comp_algorithm_show(struct zram *zram, u32 prio,
+				     char *buf, ssize_t at)
 {
 	ssize_t sz;
 	down_read(&zram->init_lock);
-	sz = zcomp_available_show(zram->comp_algs[prio], buf);
+	sz = zcomp_available_show(zram->comp_algs[prio], buf, at);
 	up_read(&zram->init_lock);
 	return sz;
@@ -1387,7 +1388,7 @@ static ssize_t comp_algorithm_show(struct device *dev,
 {
 	struct zram *zram = dev_to_zram(dev);
-	return __comp_algorithm_show(zram, ZRAM_PRIMARY_COMP, buf);
+	return __comp_algorithm_show(zram, ZRAM_PRIMARY_COMP, buf, 0);
 }
 static ssize_t comp_algorithm_store(struct device *dev,
@@ -1415,8 +1416,8 @@ static ssize_t recomp_algorithm_show(struct device *dev,
 		if (!zram->comp_algs[prio])
 			continue;
-		sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2, "#%d: ", prio);
-		sz += __comp_algorithm_show(zram, prio, buf + sz);
+		sz += sysfs_emit_at(buf, sz, "#%d: ", prio);
+		sz += __comp_algorithm_show(zram, prio, buf, sz);
 	}
 	return sz;
@@ -1488,7 +1489,7 @@ static ssize_t io_stat_show(struct device *dev,
 	ssize_t ret;
 	down_read(&zram->init_lock);
-	ret = scnprintf(buf, PAGE_SIZE,
+	ret = sysfs_emit(buf,
 			"%8llu %8llu 0 %8llu\n",
 			(u64)atomic64_read(&zram->stats.failed_reads),
 			(u64)atomic64_read(&zram->stats.failed_writes),
@@ -1518,7 +1519,7 @@ static ssize_t mm_stat_show(struct device *dev,
 	orig_size = atomic64_read(&zram->stats.pages_stored);
 	max_used = atomic_long_read(&zram->stats.max_used_pages);
-	ret = scnprintf(buf, PAGE_SIZE,
+	ret = sysfs_emit(buf,
 			"%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu %8llu\n",
 			orig_size << PAGE_SHIFT,
 			(u64)atomic64_read(&zram->stats.compr_data_size),
@@ -1543,8 +1544,8 @@ static ssize_t bd_stat_show(struct device *dev,
 	ssize_t ret;
 	down_read(&zram->init_lock);
-	ret = scnprintf(buf, PAGE_SIZE,
-		"%8llu %8llu %8llu\n",
+	ret = sysfs_emit(buf,
+			"%8llu %8llu %8llu\n",
 			FOUR_K((u64)atomic64_read(&zram->stats.bd_count)),
 			FOUR_K((u64)atomic64_read(&zram->stats.bd_reads)),
 			FOUR_K((u64)atomic64_read(&zram->stats.bd_writes)));
@@ -1562,7 +1563,7 @@ static ssize_t debug_stat_show(struct device *dev,
 	ssize_t ret;
 	down_read(&zram->init_lock);
-	ret = scnprintf(buf, PAGE_SIZE,
+	ret = sysfs_emit(buf,
 			"version: %d\n0 %8llu\n",
 			version,
 			(u64)atomic64_read(&zram->stats.miss_free));
@@ -2810,7 +2811,7 @@ static ssize_t hot_add_show(const struct class *class,
 	if (ret < 0)
 		return ret;
-	return scnprintf(buf, PAGE_SIZE, "%d\n", ret);
+	return sysfs_emit(buf, "%d\n", ret);
 }
 /* This attribute must be set to 0400, so CLASS_ATTR_RO() can not be used */
 static struct class_attribute class_attr_hot_add =
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 21a10552da61..31ba1f8c1f78 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -624,9 +624,6 @@ int register_cdrom(struct gendisk *disk, struct cdrom_device_info *cdi)
 	if (check_media_type == 1)
 		cdi->options |= (int) CDO_CHECK_TYPE;
-	if (CDROM_CAN(CDC_MRW_W))
-		cdi->exit = cdrom_mrw_exit;
-
 	if (cdi->ops->read_cdda_bpc)
 		cdi->cdda_method = CDDA_BPC_FULL;
 	else
@@ -651,9 +648,6 @@ void unregister_cdrom(struct cdrom_device_info *cdi)
 	list_del(&cdi->list);
 	mutex_unlock(&cdrom_mutex);
-	if (cdi->exit)
-		cdi->exit(cdi);
-
 	cd_dbg(CD_REG_UNREG, "drive \"/dev/%s\" unregistered\n", cdi->name);
 }
 EXPORT_SYMBOL(unregister_cdrom);
@@ -1264,6 +1258,8 @@ void cdrom_release(struct cdrom_device_info *cdi)
 		cd_dbg(CD_CLOSE, "Use count for \"/dev/%s\" now zero\n",
 		       cdi->name);
 		cdrom_dvd_rw_close_write(cdi);
+		if (CDROM_CAN(CDC_MRW_W))
+			cdrom_mrw_exit(cdi);
 		if ((cdo->capability & CDC_LOCK) && !cdi->keeplocked) {
 			cd_dbg(CD_CLOSE, "Unlocking door!\n");
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 2ea490b9d370..1492c8552255 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -168,14 +168,14 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
 {
 	const char *err;
 	struct cache_sb_disk *s;
-	struct page *page;
+	struct folio *folio;
 	unsigned int i;
-	page = read_cache_page_gfp(bdev->bd_mapping,
-			SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
-	if (IS_ERR(page))
+	folio = mapping_read_folio_gfp(bdev->bd_mapping,
+			SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
+	if (IS_ERR(folio))
 		return "IO error";
-	s = page_address(page) + offset_in_page(SB_OFFSET);
+	s = folio_address(folio) + offset_in_folio(folio, SB_OFFSET);
 	sb->offset = le64_to_cpu(s->offset);
 	sb->version = le64_to_cpu(s->version);
@@ -272,7 +272,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
 	*res = s;
 	return NULL;
 err:
-	put_page(page);
+	folio_put(folio);
 	return err;
 }
@@ -1366,7 +1366,7 @@ static CLOSURE_CALLBACK(cached_dev_free)
 	mutex_unlock(&bch_register_lock);
 	if (dc->sb_disk)
-		put_page(virt_to_page(dc->sb_disk));
+		folio_put(virt_to_folio(dc->sb_disk));
 	if (dc->bdev_file)
 		fput(dc->bdev_file);
@@ -2216,7 +2216,7 @@ void bch_cache_release(struct kobject *kobj)
 		free_fifo(&ca->free[i]);
 	if (ca->sb_disk)
-		put_page(virt_to_page(ca->sb_disk));
+		folio_put(virt_to_folio(ca->sb_disk));
 	if (ca->bdev_file)
 		fput(ca->bdev_file);
@@ -2593,7 +2593,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	if (!holder) {
 		ret = -ENOMEM;
 		err = "cannot allocate memory";
-		goto out_put_sb_page;
+		goto out_put_sb_folio;
 	}
 	/* Now reopen in exclusive mode with proper holder */
@@ -2667,8 +2667,8 @@ async_done:
 out_free_holder:
 	kfree(holder);
-out_put_sb_page:
-	put_page(virt_to_page(sb_disk));
+out_put_sb_folio:
+	folio_put(virt_to_folio(sb_disk));
 out_blkdev_put:
 	if (bdev_file)
 		fput(bdev_file);
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 79ba1e3a4770..5ef43231fe77 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -253,17 +253,35 @@ MODULE_PARM_DESC(max_read_size, "Maximum size of a read request");
 static unsigned int max_write_size = 0;
 module_param(max_write_size, uint, 0644);
 MODULE_PARM_DESC(max_write_size, "Maximum size of a write request");
-static unsigned get_max_request_size(struct crypt_config *cc, bool wrt)
+
+static unsigned get_max_request_sectors(struct dm_target *ti, struct bio *bio)
 {
+	struct crypt_config *cc = ti->private;
 	unsigned val, sector_align;
-	val = !wrt ? READ_ONCE(max_read_size) : READ_ONCE(max_write_size);
-	if (likely(!val))
-		val = !wrt ? DM_CRYPT_DEFAULT_MAX_READ_SIZE : DM_CRYPT_DEFAULT_MAX_WRITE_SIZE;
-	if (wrt || cc->used_tag_size) {
-		if (unlikely(val > BIO_MAX_VECS << PAGE_SHIFT))
-			val = BIO_MAX_VECS << PAGE_SHIFT;
-	}
-	sector_align = max(bdev_logical_block_size(cc->dev->bdev), (unsigned)cc->sector_size);
+	bool wrt = op_is_write(bio_op(bio));
+
+	if (wrt) {
+		/*
+		 * For zoned devices, splitting write operations creates the
+		 * risk of deadlocking queue freeze operations with zone write
+		 * plugging BIO work when the reminder of a split BIO is
+		 * issued. So always allow the entire BIO to proceed.
+		 */
+		if (ti->emulate_zone_append)
+			return bio_sectors(bio);
+
+		val = min_not_zero(READ_ONCE(max_write_size),
+				   DM_CRYPT_DEFAULT_MAX_WRITE_SIZE);
+	} else {
+		val = min_not_zero(READ_ONCE(max_read_size),
+				   DM_CRYPT_DEFAULT_MAX_READ_SIZE);
+	}
+
+	if (wrt || cc->used_tag_size)
+		val = min(val, BIO_MAX_VECS << PAGE_SHIFT);
+
+	sector_align = max(bdev_logical_block_size(cc->dev->bdev),
+			   (unsigned)cc->sector_size);
 	val = round_down(val, sector_align);
 	if (unlikely(!val))
 		val = sector_align;
@@ -3496,7 +3514,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 	/*
 	 * Check if bio is too large, split as needed.
 	 */
-	max_sectors = get_max_request_size(cc, bio_data_dir(bio) == WRITE);
+	max_sectors = get_max_request_sectors(ti, bio);
 	if (unlikely(bio_sectors(bio) > max_sectors))
 		dm_accept_partial_bio(bio, max_sectors);
@@ -3733,6 +3751,17 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		max_t(unsigned int, limits->physical_block_size, cc->sector_size);
 	limits->io_min = max_t(unsigned int, limits->io_min, cc->sector_size);
 	limits->dma_alignment = limits->logical_block_size - 1;
+
+	/*
+	 * For zoned dm-crypt targets, there will be no internal splitting of
+	 * write BIOs to avoid exceeding BIO_MAX_VECS vectors per BIO. But
+	 * without respecting this limit, crypt_alloc_buffer() will trigger a
+	 * BUG(). Avoid this by forcing DM core to split write BIOs to this
+	 * limit.
+	 */
+	if (ti->emulate_zone_append)
+		limits->max_hw_sectors = min(limits->max_hw_sectors,
+					     BIO_MAX_VECS << PAGE_SECTORS_SHIFT);
 }
 static struct target_type crypt_target = {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index a7dc04bd55e5..5bbbdf8fc1bd 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -458,6 +458,7 @@ static void stripe_io_hints(struct dm_target *ti,
 	struct stripe_c *sc = ti->private;
 	unsigned int chunk_size = sc->chunk_size << SECTOR_SHIFT;
+	limits->chunk_sectors = sc->chunk_size;
 	limits->io_min = chunk_size;
 	limits->io_opt = chunk_size * sc->stripes;
 }
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1726f0f828cc..abfe0392b5a4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1293,8 +1293,9 @@ out:
 /*
  * A target may call dm_accept_partial_bio only from the map routine. It is
  * allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_* zone management
- * operations, REQ_OP_ZONE_APPEND (zone append writes) and any bio serviced by
- * __send_duplicate_bios().
+ * operations, zone append writes (native with REQ_OP_ZONE_APPEND or emulated
+ * with write BIOs flagged with BIO_EMULATES_ZONE_APPEND) and any bio serviced
+ * by __send_duplicate_bios().
  *
  * dm_accept_partial_bio informs the dm that the target only wants to process
  * additional n_sectors sectors of the bio and the rest of the data should be
@@ -1327,11 +1328,19 @@ void dm_accept_partial_bio(struct bio *bio, unsigned int n_sectors)
 	unsigned int bio_sectors = bio_sectors(bio);
 	BUG_ON(dm_tio_flagged(tio, DM_TIO_IS_DUPLICATE_BIO));
-	BUG_ON(op_is_zone_mgmt(bio_op(bio)));
-	BUG_ON(bio_op(bio) == REQ_OP_ZONE_APPEND);
 	BUG_ON(bio_sectors > *tio->len_ptr);
 	BUG_ON(n_sectors > bio_sectors);
+	if (static_branch_unlikely(&zoned_enabled) &&
+	    unlikely(bdev_is_zoned(bio->bi_bdev))) {
+		enum req_op op = bio_op(bio);
+
+		BUG_ON(op_is_zone_mgmt(op));
+		BUG_ON(op == REQ_OP_WRITE);
+		BUG_ON(op == REQ_OP_WRITE_ZEROES);
+		BUG_ON(op == REQ_OP_ZONE_APPEND);
+	}
+
 	*tio->len_ptr -= bio_sectors - n_sectors;
 	bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
@@ -1776,19 +1785,35 @@ static void init_clone_info(struct clone_info *ci, struct dm_io *io,
 }
 #ifdef CONFIG_BLK_DEV_ZONED
-static inline bool dm_zone_bio_needs_split(struct mapped_device *md,
-					   struct bio *bio)
+static inline bool dm_zone_bio_needs_split(struct bio *bio)
 {
 	/*
-	 * For mapped device that need zone append emulation, we must
-	 * split any large BIO that straddles zone boundaries.
+	 * Special case the zone operations that cannot or should not be split.
 	 */
-	return dm_emulate_zone_append(md) && bio_straddles_zones(bio) &&
-		!bio_flagged(bio, BIO_ZONE_WRITE_PLUGGING);
+	switch (bio_op(bio)) {
+	case REQ_OP_ZONE_APPEND:
+	case REQ_OP_ZONE_FINISH:
+	case REQ_OP_ZONE_RESET:
+	case REQ_OP_ZONE_RESET_ALL:
+		return false;
+	default:
+		break;
+	}
+
+	/*
+	 * When mapped devices use the block layer zone write plugging, we must
+	 * split any large BIO to the mapped device limits to not submit BIOs
+	 * that span zone boundaries and to avoid potential deadlocks with
+	 * queue freeze operations.
+	 */
+	return bio_needs_zone_write_plugging(bio) || bio_straddles_zones(bio);
 }
+
 static inline bool dm_zone_plug_bio(struct mapped_device *md, struct bio *bio)
 {
-	return dm_emulate_zone_append(md) && blk_zone_plug_bio(bio, 0);
+	if (!bio_needs_zone_write_plugging(bio))
+		return false;
+	return blk_zone_plug_bio(bio, 0);
 }
 static blk_status_t __send_zone_reset_all_emulated(struct clone_info *ci,
@@ -1904,8 +1929,7 @@ static blk_status_t __send_zone_reset_all(struct clone_info *ci)
 }
 #else
-static inline bool dm_zone_bio_needs_split(struct mapped_device *md,
-					   struct bio *bio)
+static inline bool dm_zone_bio_needs_split(struct bio *bio)
 {
 	return false;
 }
@@ -1932,9 +1956,7 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 	is_abnormal = is_abnormal_io(bio);
 	if (static_branch_unlikely(&zoned_enabled)) {
-		/* Special case REQ_OP_ZONE_RESET_ALL as it cannot be split. */
-		need_split = (bio_op(bio) != REQ_OP_ZONE_RESET_ALL) &&
-			(is_abnormal || dm_zone_bio_needs_split(md, bio));
+		need_split = is_abnormal || dm_zone_bio_needs_split(bio);
 	} else {
 		need_split = is_abnormal;
 	}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0f03b21e66e4..046fe85c76fe 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -636,9 +636,6 @@ static void __mddev_put(struct mddev *mddev)
 	    mddev->ctime || mddev->hold_active)
 		return;
-	/* Array is not configured at all, and not held active, so destroy it */
-	set_bit(MD_DELETED, &mddev->flags);
-
 	/*
 	 * Call queue_work inside the spinlock so that flush_workqueue() after
 	 * mddev_find will succeed in waiting for the work to be done.
@@ -873,6 +870,16 @@ void mddev_unlock(struct mddev *mddev)
 		kobject_del(&rdev->kobj);
 		export_rdev(rdev, mddev);
 	}
+
+	/* Call del_gendisk after release reconfig_mutex to avoid
+	 * deadlock (e.g. call del_gendisk under the lock and an
+	 * access to sysfs files waits the lock)
+	 * And MD_DELETED is only used for md raid which is set in
+	 * do_md_stop. dm raid only uses md_stop to stop. So dm raid
+	 * doesn't need to check MD_DELETED when getting reconfig lock
+	 */
+	if (test_bit(MD_DELETED, &mddev->flags))
+		del_gendisk(mddev->gendisk);
 }
 EXPORT_SYMBOL_GPL(mddev_unlock);
@@ -5774,19 +5781,30 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
 	struct md_sysfs_entry *entry = container_of(attr, struct md_sysfs_entry, attr);
 	struct mddev *mddev = container_of(kobj, struct mddev, kobj);
 	ssize_t rv;
+	struct kernfs_node *kn = NULL;
 	if (!entry->store)
 		return -EIO;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
+
+	if (entry->store == array_state_store && cmd_match(page, "clear"))
+		kn = sysfs_break_active_protection(kobj, attr);
+
 	spin_lock(&all_mddevs_lock);
 	if (!mddev_get(mddev)) {
 		spin_unlock(&all_mddevs_lock);
+		if (kn)
+			sysfs_unbreak_active_protection(kn);
 		return -EBUSY;
 	}
 	spin_unlock(&all_mddevs_lock);
 	rv = entry->store(mddev, page, length);
 	mddev_put(mddev);
+
+	if (kn)
+		sysfs_unbreak_active_protection(kn);
+
 	return rv;
 }
@@ -5794,12 +5812,6 @@ static void md_kobj_release(struct kobject *ko)
 {
 	struct mddev *mddev = container_of(ko, struct mddev, kobj);
-	if (mddev->sysfs_state)
-		sysfs_put(mddev->sysfs_state);
-	if (mddev->sysfs_level)
-		sysfs_put(mddev->sysfs_level);
-
-	del_gendisk(mddev->gendisk);
 	put_disk(mddev->gendisk);
 }
@@ -6413,15 +6425,10 @@ static void md_clean(struct mddev *mddev)
 	mddev->persistent = 0;
 	mddev->level = LEVEL_NONE;
 	mddev->clevel[0] = 0;
-	/*
-	 * Don't clear MD_CLOSING, or mddev can be opened again.
-	 * 'hold_active != 0' means mddev is still in the creation
-	 * process and will be used later.
-	 */
-	if (mddev->hold_active)
-		mddev->flags = 0;
-	else
-		mddev->flags &= BIT_ULL_MASK(MD_CLOSING);
+	/* if UNTIL_STOP is set, it's cleared here */
+	mddev->hold_active = 0;
+	/* Don't clear MD_CLOSING, or mddev can be opened again. */
+	mddev->flags &= BIT_ULL_MASK(MD_CLOSING);
 	mddev->sb_flags = 0;
 	mddev->ro = MD_RDWR;
 	mddev->metadata_type[0] = 0;
@@ -6516,8 +6523,6 @@ static void __md_stop(struct mddev *mddev)
 	if (mddev->private)
 		pers->free(mddev, mddev->private);
 	mddev->private = NULL;
-	if (pers->sync_request && mddev->to_remove == NULL)
-		mddev->to_remove = &md_redundancy_group;
 	put_pers(pers);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
@@ -6646,10 +6651,8 @@ static int do_md_stop(struct mddev *mddev, int mode)
 		mddev->bitmap_info.offset = 0;
 		export_array(mddev);
-		md_clean(mddev);
-		if (mddev->hold_active == UNTIL_STOP)
-			mddev->hold_active = 0;
+		set_bit(MD_DELETED, &mddev->flags);
 	}
 	md_new_event();
 	sysfs_notify_dirent_safe(mddev->sysfs_state);
@@ -9456,17 +9459,11 @@ static bool md_spares_need_change(struct mddev *mddev)
 	return false;
 }
-static int remove_and_add_spares(struct mddev *mddev,
-				 struct md_rdev *this)
+static int remove_spares(struct mddev *mddev, struct md_rdev *this)
 {
 	struct md_rdev *rdev;
-	int spares = 0;
 	int removed = 0;
-	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
-		/* Mustn't remove devices when resync thread is running */
-		return 0;
-
 	rdev_for_each(rdev, mddev) {
 		if ((this == NULL || rdev == this) && rdev_removeable(rdev) &&
 		    !mddev->pers->hot_remove_disk(mddev, rdev)) {
@@ -9480,6 +9477,21 @@ static int remove_and_add_spares(struct mddev *mddev,
 	if (removed && mddev->kobj.sd)
 		sysfs_notify_dirent_safe(mddev->sysfs_degraded);
+	return removed;
+}
+
+static int remove_and_add_spares(struct mddev *mddev,
+				 struct md_rdev *this)
+{
+	struct md_rdev *rdev;
+	int spares = 0;
+	int removed = 0;
+
+	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+		/* Mustn't remove devices when resync thread is running */
+		return 0;
+
+	removed = remove_spares(mddev, this);
 	if (this && removed)
 		goto no_add;
@@ -9522,6 +9534,7 @@ static bool md_choose_sync_action(struct mddev *mddev, int *spares)
 	/* Check if resync is in progress. */
 	if (mddev->recovery_cp < MaxSector) {
+		remove_spares(mddev, NULL);
 		set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 		clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 		return true;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index d45a9e6ead80..67b365621507 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -700,11 +700,26 @@ static inline bool reshape_interrupted(struct mddev *mddev)
 static inline int __must_check mddev_lock(struct mddev *mddev)
 {
-	return mutex_lock_interruptible(&mddev->reconfig_mutex);
+	int ret;
+
+	ret = mutex_lock_interruptible(&mddev->reconfig_mutex);
+
+	/* MD_DELETED is set in do_md_stop with reconfig_mutex.
+	 * So check it here.
+	 */
+	if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+		ret = -ENODEV;
+		mutex_unlock(&mddev->reconfig_mutex);
+	}
+
+	return ret;
 }
 /* Sometimes we need to take the lock in a situation where
  * failure due to interrupts is not acceptable.
+ * It doesn't need to check MD_DELETED here, the owner which
+ * holds the lock here can't be stopped. And all paths can't
+ * call this function after do_md_stop.
  */
 static inline void mddev_lock_nointr(struct mddev *mddev)
 {
@@ -713,7 +728,14 @@ static inline void mddev_lock_nointr(struct mddev *mddev)
 static inline int mddev_trylock(struct mddev *mddev)
 {
-	return mutex_trylock(&mddev->reconfig_mutex);
+	int ret;
+
+	ret = mutex_trylock(&mddev->reconfig_mutex);
+	if (!ret && test_bit(MD_DELETED, &mddev->flags)) {
+		ret = -ENODEV;
+		mutex_unlock(&mddev->reconfig_mutex);
+	}
+	return ret;
 }
 extern void mddev_unlock(struct mddev *mddev);
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index d8f639f4ae12..cbe2a9054cb9 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -384,6 +384,7 @@ static int raid0_set_limits(struct mddev *mddev)
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * mddev->raid_disks;
+	lim.chunk_sectors = mddev->chunk_sectors;
 	lim.features |= BLK_FEAT_ATOMIC_WRITES;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c9bd2005bfd0..95dc354a86a0 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2446,15 +2446,12 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	 * that are active
 	 */
 	for (i = 0; i < conf->copies; i++) {
-		int d;
-
 		tbio = r10_bio->devs[i].repl_bio;
 		if (!tbio || !tbio->bi_end_io)
 			continue;
 		if (r10_bio->devs[i].bio->bi_end_io != end_sync_write
 		    && r10_bio->devs[i].bio != fbio)
 			bio_copy_data(tbio, fbio);
-		d = r10_bio->devs[i].devnum;
 		atomic_inc(&r10_bio->remaining);
 		submit_bio_noacct(tbio);
 	}
@@ -4012,6 +4009,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
 	lim.io_min = mddev->chunk_sectors << 9;
+	lim.chunk_sectors = mddev->chunk_sectors;
 	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
 	lim.features |= BLK_FEAT_ATOMIC_WRITES;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ca5b0e8ba707..7ec61ee7b218 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -9040,7 +9040,7 @@ static int __init raid5_init(void)
 	int ret;
 	raid5_wq = alloc_workqueue("raid5wq",
-		WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_CPU_INTENSIVE|WQ_SYSFS, 0);
+		WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_SYSFS, 0);
 	if (!raid5_wq)
 		return -ENOMEM;
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index b1fddfa33ab9..1286c31320e6 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -301,8 +301,8 @@ static void apple_nvme_submit_cmd(struct apple_nvme_queue *q,
 	memcpy(&q->sqes[tag], cmd, sizeof(*cmd));
 	/*
-	 * This lock here doesn't make much sense at a first glace but
-	 * removing it will result in occasional missed completetion
+	 * This lock here doesn't make much sense at a first glance but
+	 * removing it will result in occasional missed completion
 	 * interrupts even though the commands still appear on the CQ.
 	 * It's unclear why this happens but our best guess is that
 	 * there is a bug in the firmware triggered when a new command
diff --git a/drivers/nvme/host/constants.c b/drivers/nvme/host/constants.c
index 1a0058be5821..dc90df9e13a2 100644
--- a/drivers/nvme/host/constants.c
+++ b/drivers/nvme/host/constants.c
@@ -133,7 +133,7 @@ static const char * const nvme_statuses[] = {
 	[NVME_SC_NS_NOT_ATTACHED] = "Namespace Not Attached",
 	[NVME_SC_THIN_PROV_NOT_SUPP] = "Thin Provisioning Not Supported",
 	[NVME_SC_CTRL_LIST_INVALID] = "Controller List Invalid",
-	[NVME_SC_SELT_TEST_IN_PROGRESS] = "Device Self-test In Progress",
+	[NVME_SC_SELF_TEST_IN_PROGRESS] = "Device Self-test In Progress",
 	[NVME_SC_BP_WRITE_PROHIBITED] = "Boot Partition Write Prohibited",
 	[NVME_SC_CTRL_ID_INVALID] = "Invalid Controller Identifier",
 	[NVME_SC_SEC_CTRL_STATE_INVALID] = "Invalid Secondary Controller State",
@@ -145,7 +145,7 @@ static const char * const nvme_statuses[] = {
 	[NVME_SC_BAD_ATTRIBUTES] = "Conflicting Attributes",
 	[NVME_SC_INVALID_PI] = "Invalid Protection Information",
 	[NVME_SC_READ_ONLY] = "Attempted Write to Read Only Range",
-	[NVME_SC_CMD_SIZE_LIM_EXCEEDED ] = "Command Size Limits Exceeded",
+	[NVME_SC_CMD_SIZE_LIM_EXCEEDED] = "Command Size Limits Exceeded",
 	[NVME_SC_ZONE_BOUNDARY_ERROR] = "Zoned Boundary Error",
 	[NVME_SC_ZONE_FULL] = "Zone Is Full",
 	[NVME_SC_ZONE_READ_ONLY] = "Zone Is Read Only",
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 29d6f7f85f74..9d988f4cb87a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4300,7 +4300,7 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	}
 	/*
-	 * If available try to use the Command Set Idependent Identify Namespace
+	 * If available try to use the Command Set Independent Identify Namespace
 	 * data structure to find all the generic information that is needed to
 	 * set up a namespace. If not fall back to the legacy version.
 	 */
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 014b387f1e8b..08a5ea3e9383 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -899,7 +899,7 @@ EXPORT_SYMBOL_GPL(nvme_fc_set_remoteport_devloss);
  * may crash.
  *
  * As such:
- * Wrapper all the dma routines and check the dev pointer.
+ * Wrap all the dma routines and check the dev pointer.
  *
  * If simple mappings (return just a dma address, we'll noop them,
  * returning a dma address of 0.
@@ -1955,8 +1955,8 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 	}
 	/*
-	 * For the linux implementation, if we have an unsucceesful
-	 * status, they blk-mq layer can typically be called with the
+	 * For the linux implementation, if we have an unsuccessful
+	 * status, the blk-mq layer can typically be called with the
 	 * non-zero status and the content of the cqe isn't important.
 	 */
 	if (status)
@@ -2429,7 +2429,7 @@ static bool nvme_fc_terminate_exchange(struct request *req, void *data)
 /*
  * This routine runs through all outstanding commands on the association
- * and aborts them. This routine is typically be called by the
+ * and aborts them. This routine is typically called by the
  * delete_association routine. It is also called due to an error during
  * reconnect. In that scenario, it is most likely a command that initializes
  * the controller, including fabric Connect commands on io queues, that
@@ -2622,7 +2622,7 @@ nvme_fc_unmap_data(struct nvme_fc_ctrl *ctrl, struct request *rq,
  * as part of the exchange. The CQE is the last thing for the io,
  * which is transferred (explicitly or implicitly) with the RSP IU
  * sent on the exchange. After the CQE is received, the FC exchange is
- * terminaed and the Exchange may be used on a different io.
+ * terminated and the Exchange may be used on a different io.
  *
  * The transport to LLDD api has the transport making a request for a
  * new fcp io request to the LLDD. The LLDD then allocates a FC exchange
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 7df2ea21851f..cfd2b5b90b91 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -69,7 +69,7 @@ enum nvme_quirks {
 	NVME_QUIRK_IDENTIFY_CNS = (1 << 1),
 	/*
-	 * The controller deterministically returns O's on reads to
+	 * The controller deterministically returns 0's on reads to
 	 * logical blocks that deallocate was called on.
 	 */
 	NVME_QUIRK_DEALLOCATE_ZEROES = (1 << 2),
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 320aaa41ec39..071efec25346 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -7,7 +7,7 @@
 #include <linux/acpi.h>
 #include <linux/async.h>
 #include <linux/blkdev.h>
-#include <linux/blk-mq.h>
+#include <linux/blk-mq-dma.h>
 #include <linux/blk-integrity.h>
 #include <linux/dmi.h>
 #include <linux/init.h>
@@ -27,7 +27,6 @@
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/io-64-nonatomic-hi-lo.h>
 #include <linux/sed-opal.h>
-#include <linux/pci-p2pdma.h>
 #include "trace.h"
 #include "nvme.h"
@@ -39,20 +38,17 @@
 #define NVME_SMALL_POOL_SIZE 256
 /*
- * These can be higher, but we need to ensure that any command doesn't
- * require an sg allocation that needs more than a page of data.
+ * Arbitrary upper bound.
  */
-#define NVME_MAX_KB_SZ 8192
+#define NVME_MAX_BYTES SZ_8M
 #define NVME_MAX_NR_DESCRIPTORS 5
 /*
- * For data SGLs we support a single descriptors worth of SGL entries, but for
- * now we also limit it to avoid an allocation larger than PAGE_SIZE for the
- * scatterlist.
+ * For data SGLs we support a single descriptors worth of SGL entries.
+ * For PRPs, segments don't matter at all.
  */
 #define NVME_MAX_SEGS \
-	min(NVME_CTRL_PAGE_SIZE / sizeof(struct nvme_sgl_desc), \
-	    (PAGE_SIZE / sizeof(struct scatterlist)))
+	(NVME_CTRL_PAGE_SIZE / sizeof(struct nvme_sgl_desc))
 /*
  * For metadata SGLs, only the small descriptor is supported, and the first
@@ -61,6 +57,21 @@
 #define NVME_MAX_META_SEGS \
 	((NVME_SMALL_POOL_SIZE / sizeof(struct nvme_sgl_desc)) - 1)
+/*
+ * The last entry is used to link to the next descriptor.
+ */
+#define PRPS_PER_PAGE \
+	(((NVME_CTRL_PAGE_SIZE / sizeof(__le64))) - 1)
+
+/*
+ * I/O could be non-aligned both at the beginning and end.
+ */
+#define MAX_PRP_RANGE \
+	(NVME_MAX_BYTES + 2 * (NVME_CTRL_PAGE_SIZE - 1))
+
+static_assert(MAX_PRP_RANGE / NVME_CTRL_PAGE_SIZE <=
+	(1 /* prp1 */ + NVME_MAX_NR_DESCRIPTORS * PRPS_PER_PAGE));
+
 static int use_threaded_interrupts;
 module_param(use_threaded_interrupts, int, 0444);
@@ -97,7 +108,7 @@ static int io_queue_count_set(const char *val, const struct kernel_param *kp)
 	int ret;
 	ret = kstrtouint(val, 10, &n);
-	if (ret != 0 || n > num_possible_cpus())
+	if (ret != 0 || n > blk_mq_num_possible_queues(0))
 		return -EINVAL;
 	return param_set_uint(val, kp);
 }
@@ -162,7 +173,7 @@ struct nvme_dev {
 	bool hmb;
 	struct sg_table *hmb_sgt;
-	mempool_t *iod_mempool;
+	mempool_t *dmavec_mempool;
 	mempool_t *iod_meta_mempool;
 	/* shadow doorbell buffer support: */
@@ -246,7 +257,15 @@ enum nvme_iod_flags {
 	IOD_ABORTED = 1U << 0,
 	/* uses the small descriptor pool */
-	IOD_SMALL_DESCRIPTOR = 1U << 1,
+	IOD_SMALL_DESCRIPTOR = 1U << 1,
+
+	/* single segment dma mapping */
+	IOD_SINGLE_SEGMENT = 1U << 2,
+};
+
+struct nvme_dma_vec {
+	dma_addr_t addr;
+	unsigned int len;
 };
 /*
@@ -257,13 +276,16 @@ struct nvme_iod {
 	struct nvme_command cmd;
 	u8 flags;
 	u8 nr_descriptors;
-	unsigned int dma_len; /* length of single DMA segment mapping */
-	dma_addr_t first_dma;
+
+	unsigned int total_len;
+	struct dma_iova_state dma_state;
+	void *descriptors[NVME_MAX_NR_DESCRIPTORS];
+	struct nvme_dma_vec *dma_vecs;
+	unsigned int nr_dma_vecs;
+
 	dma_addr_t meta_dma;
-	struct sg_table sgt;
 	struct sg_table meta_sgt;
 	struct nvme_sgl_desc *meta_descriptor;
-	void *descriptors[NVME_MAX_NR_DESCRIPTORS];
 };
 static inline unsigned int nvme_dbbuf_size(struct nvme_dev *dev)
@@ -406,18 +428,6 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
 	return true;
 }
-/*
- * Will slightly overestimate the number of pages needed. This is OK
- * as it only leads to a small amount of wasted memory for the lifetime of
- * the I/O.
- */
-static __always_inline int nvme_pci_npages_prp(void)
-{
-	unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE;
-	unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE);
-	return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8);
-}
-
 static struct nvme_descriptor_pools *
 nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
 {
@@ -578,32 +588,49 @@ static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
 	spin_unlock(&nvmeq->sq_lock);
 }
-static inline bool nvme_pci_metadata_use_sgls(struct nvme_dev *dev,
-					      struct request *req)
+enum nvme_use_sgl {
+	SGL_UNSUPPORTED,
+	SGL_SUPPORTED,
+	SGL_FORCED,
+};
+
+static inline bool nvme_pci_metadata_use_sgls(struct request *req)
 {
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	struct nvme_dev *dev = nvmeq->dev;
+
 	if (!nvme_ctrl_meta_sgl_supported(&dev->ctrl))
 		return false;
 	return req->nr_integrity_segments > 1 ||
 		nvme_req(req)->flags & NVME_REQ_USERCMD;
 }
-static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req,
-				     int nseg)
+static inline enum nvme_use_sgl nvme_pci_use_sgls(struct nvme_dev *dev,
+		struct request *req)
 {
 	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
-	unsigned int avg_seg_size;
-	avg_seg_size = DIV_ROUND_UP(blk_rq_payload_bytes(req), nseg);
+	if (nvmeq->qid && nvme_ctrl_sgl_supported(&dev->ctrl)) {
+		if (nvme_req(req)->flags & NVME_REQ_USERCMD)
+			return SGL_FORCED;
+		if (req->nr_integrity_segments > 1)
+			return SGL_FORCED;
+		return SGL_SUPPORTED;
+	}
-	if (!nvme_ctrl_sgl_supported(&dev->ctrl))
-		return false;
-	if (!nvmeq->qid)
-		return false;
-	if (nvme_pci_metadata_use_sgls(dev, req))
-		return true;
-	if (!sgl_threshold || avg_seg_size < sgl_threshold)
-		return nvme_req(req)->flags & NVME_REQ_USERCMD;
-	return true;
+	return SGL_UNSUPPORTED;
+}
+
+static unsigned int nvme_pci_avg_seg_size(struct request *req)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	unsigned int nseg;
+
+	if (blk_rq_dma_map_coalesce(&iod->dma_state))
+		nseg = 1;
+	else
+		nseg = blk_rq_nr_phys_segments(req);
+	return DIV_ROUND_UP(blk_rq_payload_bytes(req), nseg);
 }
 static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
@@ -614,11 +641,25 @@ static inline struct dma_pool *nvme_dma_pool(struct nvme_queue *nvmeq,
 	return nvmeq->descriptor_pools.large;
 }
-static void nvme_free_descriptors(struct nvme_queue *nvmeq, struct request *req)
+static inline bool nvme_pci_cmd_use_sgl(struct nvme_command *cmd)
+{
+	return cmd->common.flags &
+		(NVME_CMD_SGL_METABUF | NVME_CMD_SGL_METASEG);
+}
+
+static inline dma_addr_t nvme_pci_first_desc_dma_addr(struct nvme_command *cmd)
 {
+	if (nvme_pci_cmd_use_sgl(cmd))
+		return le64_to_cpu(cmd->common.dptr.sgl.addr);
+	return le64_to_cpu(cmd->common.dptr.prp2);
+}
+
+static void nvme_free_descriptors(struct request *req)
+{
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
 	const int last_prp = NVME_CTRL_PAGE_SIZE / sizeof(__le64) - 1;
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	dma_addr_t dma_addr = iod->first_dma;
+	dma_addr_t dma_addr = nvme_pci_first_desc_dma_addr(&iod->cmd);
 	int i;
 	if (iod->nr_descriptors == 1) {
@@ -637,68 +678,130 @@ static void nvme_free_descriptors(struct nvme_queue *nvmeq, struct request *req)
 	}
 }
-static void nvme_unmap_data(struct nvme_dev *dev, struct nvme_queue *nvmeq,
-			    struct request *req)
+static void nvme_free_prps(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	unsigned int i;
-	if (iod->dma_len) {
-		dma_unmap_page(dev->dev, iod->first_dma, iod->dma_len,
-			       rq_dma_dir(req));
+	for (i = 0; i < iod->nr_dma_vecs; i++)
+		dma_unmap_page(nvmeq->dev->dev, iod->dma_vecs[i].addr,
+				iod->dma_vecs[i].len, rq_dma_dir(req));
+	mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
+}
+
+static void nvme_free_sgls(struct request *req)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	struct device *dma_dev = nvmeq->dev->dev;
+	dma_addr_t sqe_dma_addr = le64_to_cpu(iod->cmd.common.dptr.sgl.addr);
+	unsigned int sqe_dma_len = le32_to_cpu(iod->cmd.common.dptr.sgl.length);
+	struct nvme_sgl_desc *sg_list = iod->descriptors[0];
+	enum dma_data_direction dir = rq_dma_dir(req);
+
+	if (iod->nr_descriptors) {
+		unsigned int nr_entries = sqe_dma_len / sizeof(*sg_list), i;
+
+		for (i = 0; i < nr_entries; i++)
+			dma_unmap_page(dma_dev, le64_to_cpu(sg_list[i].addr),
+				le32_to_cpu(sg_list[i].length), dir);
+	} else {
+		dma_unmap_page(dma_dev, sqe_dma_addr, sqe_dma_len, dir);
+	}
+}
+
+static void nvme_unmap_data(struct request *req)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	struct device *dma_dev = nvmeq->dev->dev;
+
+	if (iod->flags & IOD_SINGLE_SEGMENT) {
+		static_assert(offsetof(union nvme_data_ptr, prp1) ==
+				offsetof(union nvme_data_ptr, sgl.addr));
+		dma_unmap_page(dma_dev, le64_to_cpu(iod->cmd.common.dptr.prp1),
+				iod->total_len, rq_dma_dir(req));
 		return;
 	}
-	WARN_ON_ONCE(!iod->sgt.nents);
+	if (!blk_rq_dma_unmap(req, dma_dev, &iod->dma_state, iod->total_len)) {
+		if (nvme_pci_cmd_use_sgl(&iod->cmd))
+			nvme_free_sgls(req);
+		else
+			nvme_free_prps(req);
+	}
-	dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
-	nvme_free_descriptors(nvmeq, req);
-	mempool_free(iod->sgt.sgl, dev->iod_mempool);
+	if (iod->nr_descriptors)
+		nvme_free_descriptors(req);
 }
-static void nvme_print_sgl(struct scatterlist *sgl, int nents)
+static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
+		struct blk_dma_iter *iter)
 {
-	int i;
-	struct scatterlist *sg;
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	for_each_sg(sgl, sg, nents, i) {
-		dma_addr_t phys = sg_phys(sg);
-		pr_warn("sg[%d] phys_addr:%pad offset:%d length:%d "
-			"dma_address:%pad dma_length:%d\n",
-			i, &phys, sg->offset, sg->length, &sg_dma_address(sg),
-			sg_dma_len(sg));
+	if (iter->len)
+		return true;
+	if (!blk_rq_dma_map_iter_next(req, dma_dev, &iod->dma_state, iter))
+		return false;
+	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
+		iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
+		iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
+		iod->nr_dma_vecs++;
 	}
+	return true;
 }
-static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
-		struct request *req, struct nvme_rw_command *cmnd)
+static blk_status_t nvme_pci_setup_data_prp(struct request *req,
+		struct blk_dma_iter *iter)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	int length = blk_rq_payload_bytes(req);
-	struct scatterlist *sg = iod->sgt.sgl;
-	int dma_len = sg_dma_len(sg);
-	u64 dma_addr = sg_dma_address(sg);
-	int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	unsigned int length = blk_rq_payload_bytes(req);
+	dma_addr_t prp1_dma, prp2_dma = 0;
+	unsigned int prp_len, i;
 	__le64 *prp_list;
-	dma_addr_t prp_dma;
-	int i;
-	length -= (NVME_CTRL_PAGE_SIZE - offset);
-	if (length <= 0) {
-		iod->first_dma = 0;
-		goto done;
+	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(nvmeq->dev->dev)) {
+		iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
+				GFP_ATOMIC);
+		if (!iod->dma_vecs)
+			return BLK_STS_RESOURCE;
+		iod->dma_vecs[0].addr = iter->addr;
+		iod->dma_vecs[0].len = iter->len;
+		iod->nr_dma_vecs = 1;
 	}
-	dma_len -= (NVME_CTRL_PAGE_SIZE - offset);
-	if (dma_len) {
-		dma_addr += (NVME_CTRL_PAGE_SIZE - offset);
-	} else {
-		sg = sg_next(sg);
-		dma_addr = sg_dma_address(sg);
-		dma_len = sg_dma_len(sg);
+	/*
+	 * PRP1 always points to the start of the DMA transfers.
+	 *
+	 * This is the only PRP (except for the list entries) that could be
+	 * non-aligned.
+	 */
+	prp1_dma = iter->addr;
+	prp_len = min(length, NVME_CTRL_PAGE_SIZE -
+			(iter->addr & (NVME_CTRL_PAGE_SIZE - 1)));
+	iod->total_len += prp_len;
+	iter->addr += prp_len;
+	iter->len -= prp_len;
+	length -= prp_len;
+	if (!length)
+		goto done;
+
+	if (!nvme_pci_prp_iter_next(req, nvmeq->dev->dev, iter)) {
+		if (WARN_ON_ONCE(!iter->status))
+			goto bad_sgl;
+		goto done;
 	}
+	/*
+	 * PRP2 is usually a list, but can point to data if all data to be
+	 * transferred fits into PRP1 + PRP2:
+	 */
 	if (length <= NVME_CTRL_PAGE_SIZE) {
-		iod->first_dma = dma_addr;
+		prp2_dma = iter->addr;
+		iod->total_len += length;
 		goto done;
 	}
@@ -707,58 +810,80 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_queue *nvmeq,
 		iod->flags |= IOD_SMALL_DESCRIPTOR;
 	prp_list = dma_pool_alloc(nvme_dma_pool(nvmeq, iod), GFP_ATOMIC,
-			&prp_dma);
-	if (!prp_list)
-		return BLK_STS_RESOURCE;
+			&prp2_dma);
+	if (!prp_list) {
+		iter->status = BLK_STS_RESOURCE;
+		goto done;
+	}
 	iod->descriptors[iod->nr_descriptors++] = prp_list;
-	iod->first_dma = prp_dma;
+
 	i = 0;
 	for (;;) {
+		prp_list[i++] = cpu_to_le64(iter->addr);
+		prp_len = min(length, NVME_CTRL_PAGE_SIZE);
+		if (WARN_ON_ONCE(iter->len < prp_len))
+			goto bad_sgl;
+
+		iod->total_len += prp_len;
+		iter->addr += prp_len;
+		iter->len -= prp_len;
+		length -= prp_len;
+		if (!length)
+			break;
+
+		if (!nvme_pci_prp_iter_next(req, nvmeq->dev->dev, iter)) {
+			if (WARN_ON_ONCE(!iter->status))
+				goto bad_sgl;
+			goto done;
+		}
+
+		/*
+		 * If we've filled the entire descriptor, allocate a new that is
+		 * pointed to be the last entry in the previous PRP list. To
+		 * accommodate for that move the last actual entry to the new
+		 * descriptor.
+		 */
 		if (i == NVME_CTRL_PAGE_SIZE >> 3) {
 			__le64 *old_prp_list = prp_list;
+			dma_addr_t prp_list_dma;
 			prp_list = dma_pool_alloc(nvmeq->descriptor_pools.large,
-					GFP_ATOMIC, &prp_dma);
-			if (!prp_list)
-				goto free_prps;
+					GFP_ATOMIC, &prp_list_dma);
+			if (!prp_list) {
+				iter->status = BLK_STS_RESOURCE;
+				goto done;
+			}
 			iod->descriptors[iod->nr_descriptors++] = prp_list;
+
 			prp_list[0] = old_prp_list[i - 1];
-			old_prp_list[i - 1] = cpu_to_le64(prp_dma);
+			old_prp_list[i - 1] = cpu_to_le64(prp_list_dma);
 			i = 1;
 		}
-		prp_list[i++] = cpu_to_le64(dma_addr);
-		dma_len -= NVME_CTRL_PAGE_SIZE;
-		dma_addr += NVME_CTRL_PAGE_SIZE;
-		length -= NVME_CTRL_PAGE_SIZE;
-		if (length <= 0)
-			break;
-		if (dma_len > 0)
-			continue;
-		if (unlikely(dma_len < 0))
-			goto bad_sgl;
-		sg = sg_next(sg);
-		dma_addr = sg_dma_address(sg);
-		dma_len = sg_dma_len(sg);
 	}
+
 done:
-	cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sgt.sgl));
-	cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
-	return BLK_STS_OK;
-free_prps:
-	nvme_free_descriptors(nvmeq, req);
-	return BLK_STS_RESOURCE;
+	/*
+	 * nvme_unmap_data uses the DPT field in the SQE to tear down the
+	 * mapping, so initialize it even for failures.
+	 */
+	iod->cmd.common.dptr.prp1 = cpu_to_le64(prp1_dma);
+	iod->cmd.common.dptr.prp2 = cpu_to_le64(prp2_dma);
+	if (unlikely(iter->status))
+		nvme_unmap_data(req);
+	return iter->status;
+
 bad_sgl:
-	WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents),
-			"Invalid SGL for payload:%d nents:%d\n",
-			blk_rq_payload_bytes(req), iod->sgt.nents);
+	dev_err_once(nvmeq->dev->dev,
+		"Incorrectly formed request for payload:%d nents:%d\n",
+		blk_rq_payload_bytes(req), blk_rq_nr_phys_segments(req));
 	return BLK_STS_IOERR;
 }
 static void nvme_pci_sgl_set_data(struct nvme_sgl_desc *sge,
-		struct scatterlist *sg)
+		struct blk_dma_iter *iter)
 {
-	sge->addr = cpu_to_le64(sg_dma_address(sg));
-	sge->length = cpu_to_le32(sg_dma_len(sg));
+	sge->addr = cpu_to_le64(iter->addr);
+	sge->length = cpu_to_le32(iter->len);
 	sge->type = NVME_SGL_FMT_DATA_DESC << 4;
 }
@@ -770,21 +895,22 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge,
 	sge->type = NVME_SGL_FMT_LAST_SEG_DESC << 4;
 }
-static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
-		struct request *req, struct nvme_rw_command *cmd)
+static blk_status_t nvme_pci_setup_data_sgl(struct request *req,
+		struct blk_dma_iter *iter)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	unsigned int entries = blk_rq_nr_phys_segments(req);
 	struct nvme_sgl_desc *sg_list;
-	struct scatterlist *sg = iod->sgt.sgl;
-	unsigned int entries = iod->sgt.nents;
 	dma_addr_t sgl_dma;
-	int i = 0;
+	unsigned int mapped = 0;
-	/* setting the transfer type as SGL */
-	cmd->flags = NVME_CMD_SGL_METABUF;
+	/* set the transfer type as SGL */
+	iod->cmd.common.flags = NVME_CMD_SGL_METABUF;
-	if (entries == 1) {
-		nvme_pci_sgl_set_data(&cmd->dptr.sgl, sg);
+	if (entries == 1 || blk_rq_dma_map_coalesce(&iod->dma_state)) {
+		nvme_pci_sgl_set_data(&iod->cmd.common.dptr.sgl, iter);
+		iod->total_len += iter->len;
 		return BLK_STS_OK;
 	}
@@ -796,119 +922,104 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_queue *nvmeq,
 	if (!sg_list)
 		return BLK_STS_RESOURCE;
 	iod->descriptors[iod->nr_descriptors++] = sg_list;
-	iod->first_dma = sgl_dma;
-	nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, entries);
 	do {
-		nvme_pci_sgl_set_data(&sg_list[i++], sg);
-		sg = sg_next(sg);
-	} while (--entries > 0);
+		if (WARN_ON_ONCE(mapped == entries)) {
+			iter->status = BLK_STS_IOERR;
+			break;
+		}
+		nvme_pci_sgl_set_data(&sg_list[mapped++], iter);
+		iod->total_len += iter->len;
+	} while (blk_rq_dma_map_iter_next(req, nvmeq->dev->dev, &iod->dma_state,
+				iter));
-	return BLK_STS_OK;
+	nvme_pci_sgl_set_seg(&iod->cmd.common.dptr.sgl, sgl_dma, mapped);
+	if (unlikely(iter->status))
+		nvme_free_sgls(req);
+	return iter->status;
 }
-static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
-		struct request *req, struct nvme_rw_command *cmnd,
-		struct bio_vec *bv)
+static blk_status_t nvme_pci_setup_data_simple(struct request *req,
+		enum nvme_use_sgl use_sgl)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
-	unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;
-
-	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
-	if (dma_mapping_error(dev->dev, iod->first_dma))
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	struct bio_vec bv = req_bvec(req);
+	unsigned int prp1_offset = bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
+	bool prp_possible = prp1_offset + bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2;
+	dma_addr_t dma_addr;
+
+	if (!use_sgl && !prp_possible)
+		return BLK_STS_AGAIN;
+	if (is_pci_p2pdma_page(bv.bv_page))
+		return BLK_STS_AGAIN;
+
+	dma_addr = dma_map_bvec(nvmeq->dev->dev, &bv, rq_dma_dir(req), 0);
+	if (dma_mapping_error(nvmeq->dev->dev, dma_addr))
 		return BLK_STS_RESOURCE;
-	iod->dma_len = bv->bv_len;
-
-	cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
-	if (bv->bv_len > first_prp_len)
-		cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
-	else
-		cmnd->dptr.prp2 = 0;
-	return BLK_STS_OK;
-}
-
-static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
-		struct request *req, struct nvme_rw_command *cmnd,
-		struct bio_vec *bv)
-{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	iod->total_len = bv.bv_len;
+	iod->flags |= IOD_SINGLE_SEGMENT;
+
+	if (use_sgl == SGL_FORCED || !prp_possible) {
+		iod->cmd.common.flags = NVME_CMD_SGL_METABUF;
+		iod->cmd.common.dptr.sgl.addr = cpu_to_le64(dma_addr);
+		iod->cmd.common.dptr.sgl.length = cpu_to_le32(bv.bv_len);
+		iod->cmd.common.dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
+	} else {
+		unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - prp1_offset;
-	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
-	if (dma_mapping_error(dev->dev, iod->first_dma))
-		return BLK_STS_RESOURCE;
-	iod->dma_len = bv->bv_len;
+		iod->cmd.common.dptr.prp1 = cpu_to_le64(dma_addr);
+		iod->cmd.common.dptr.prp2 = 0;
+		if (bv.bv_len > first_prp_len)
+			iod->cmd.common.dptr.prp2 =
+				cpu_to_le64(dma_addr + first_prp_len);
+	}
-	cmnd->flags = NVME_CMD_SGL_METABUF;
-	cmnd->dptr.sgl.addr = cpu_to_le64(iod->first_dma);
-	cmnd->dptr.sgl.length = cpu_to_le32(iod->dma_len);
-	cmnd->dptr.sgl.type = NVME_SGL_FMT_DATA_DESC << 4;
 	return BLK_STS_OK;
 }
-static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
-		struct nvme_command *cmnd)
+static blk_status_t nvme_map_data(struct request *req)
 {
-	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	blk_status_t ret = BLK_STS_RESOURCE;
-	int rc;
+	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+	struct nvme_dev *dev = nvmeq->dev;
+	enum nvme_use_sgl use_sgl = nvme_pci_use_sgls(dev, req);
+	struct blk_dma_iter iter;
+	blk_status_t ret;
+	/*
+	 * Try to skip the DMA iterator for single segment requests, as that
+	 * significantly improves performances for small I/O sizes.
+ */ if (blk_rq_nr_phys_segments(req) == 1) { - struct bio_vec bv = req_bvec(req); - - if (!is_pci_p2pdma_page(bv.bv_page)) { - if (!nvme_pci_metadata_use_sgls(dev, req) && - (bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1)) + - bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2) - return nvme_setup_prp_simple(dev, req, - &cmnd->rw, &bv); - - if (nvmeq->qid && sgl_threshold && - nvme_ctrl_sgl_supported(&dev->ctrl)) - return nvme_setup_sgl_simple(dev, req, - &cmnd->rw, &bv); - } + ret = nvme_pci_setup_data_simple(req, use_sgl); + if (ret != BLK_STS_AGAIN) + return ret; } - iod->dma_len = 0; - iod->sgt.sgl = mempool_alloc(dev->iod_mempool, GFP_ATOMIC); - if (!iod->sgt.sgl) - return BLK_STS_RESOURCE; - sg_init_table(iod->sgt.sgl, blk_rq_nr_phys_segments(req)); - iod->sgt.orig_nents = blk_rq_map_sg(req, iod->sgt.sgl); - if (!iod->sgt.orig_nents) - goto out_free_sg; + if (!blk_rq_dma_map_iter_start(req, dev->dev, &iod->dma_state, &iter)) + return iter.status; - rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), - DMA_ATTR_NO_WARN); - if (rc) { - if (rc == -EREMOTEIO) - ret = BLK_STS_TARGET; - goto out_free_sg; - } - - if (nvme_pci_use_sgls(dev, req, iod->sgt.nents)) - ret = nvme_pci_setup_sgls(nvmeq, req, &cmnd->rw); - else - ret = nvme_pci_setup_prps(nvmeq, req, &cmnd->rw); - if (ret != BLK_STS_OK) - goto out_unmap_sg; - return BLK_STS_OK; + if (use_sgl == SGL_FORCED || + (use_sgl == SGL_SUPPORTED && + (sgl_threshold && nvme_pci_avg_seg_size(req) >= sgl_threshold))) + return nvme_pci_setup_data_sgl(req, &iter); + return nvme_pci_setup_data_prp(req, &iter); +} -out_unmap_sg: - dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0); -out_free_sg: - mempool_free(iod->sgt.sgl, dev->iod_mempool); - return ret; +static void nvme_pci_sgl_set_data_sg(struct nvme_sgl_desc *sge, + struct scatterlist *sg) +{ + sge->addr = cpu_to_le64(sg_dma_address(sg)); + sge->length = cpu_to_le32(sg_dma_len(sg)); + sge->type = NVME_SGL_FMT_DATA_DESC << 4; } -static blk_status_t 
nvme_pci_setup_meta_sgls(struct nvme_dev *dev, - struct request *req) +static blk_status_t nvme_pci_setup_meta_sgls(struct request *req) { struct nvme_queue *nvmeq = req->mq_hctx->driver_data; + struct nvme_dev *dev = nvmeq->dev; struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - struct nvme_rw_command *cmnd = &iod->cmd.rw; struct nvme_sgl_desc *sg_list; struct scatterlist *sgl, *sg; unsigned int entries; @@ -939,19 +1050,19 @@ static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev, iod->meta_descriptor = sg_list; iod->meta_dma = sgl_dma; - cmnd->flags = NVME_CMD_SGL_METASEG; - cmnd->metadata = cpu_to_le64(sgl_dma); + iod->cmd.common.flags = NVME_CMD_SGL_METASEG; + iod->cmd.common.metadata = cpu_to_le64(sgl_dma); sgl = iod->meta_sgt.sgl; if (entries == 1) { - nvme_pci_sgl_set_data(sg_list, sgl); + nvme_pci_sgl_set_data_sg(sg_list, sgl); return BLK_STS_OK; } sgl_dma += sizeof(*sg_list); nvme_pci_sgl_set_seg(sg_list, sgl_dma, entries); for_each_sg(sgl, sg, entries, i) - nvme_pci_sgl_set_data(&sg_list[i + 1], sg); + nvme_pci_sgl_set_data_sg(&sg_list[i + 1], sg); return BLK_STS_OK; @@ -962,38 +1073,37 @@ out_free_sg: return BLK_STS_RESOURCE; } -static blk_status_t nvme_pci_setup_meta_mptr(struct nvme_dev *dev, - struct request *req) +static blk_status_t nvme_pci_setup_meta_mptr(struct request *req) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); + struct nvme_queue *nvmeq = req->mq_hctx->driver_data; struct bio_vec bv = rq_integrity_vec(req); - struct nvme_command *cmnd = &iod->cmd; - iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0); - if (dma_mapping_error(dev->dev, iod->meta_dma)) + iod->meta_dma = dma_map_bvec(nvmeq->dev->dev, &bv, rq_dma_dir(req), 0); + if (dma_mapping_error(nvmeq->dev->dev, iod->meta_dma)) return BLK_STS_IOERR; - cmnd->rw.metadata = cpu_to_le64(iod->meta_dma); + iod->cmd.common.metadata = cpu_to_le64(iod->meta_dma); return BLK_STS_OK; } -static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req) +static 
blk_status_t nvme_map_metadata(struct request *req) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); if ((iod->cmd.common.flags & NVME_CMD_SGL_METABUF) && - nvme_pci_metadata_use_sgls(dev, req)) - return nvme_pci_setup_meta_sgls(dev, req); - return nvme_pci_setup_meta_mptr(dev, req); + nvme_pci_metadata_use_sgls(req)) + return nvme_pci_setup_meta_sgls(req); + return nvme_pci_setup_meta_mptr(req); } -static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) +static blk_status_t nvme_prep_rq(struct request *req) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); blk_status_t ret; iod->flags = 0; iod->nr_descriptors = 0; - iod->sgt.nents = 0; + iod->total_len = 0; iod->meta_sgt.nents = 0; ret = nvme_setup_cmd(req->q->queuedata, req); @@ -1001,13 +1111,13 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) return ret; if (blk_rq_nr_phys_segments(req)) { - ret = nvme_map_data(dev, req, &iod->cmd); + ret = nvme_map_data(req); if (ret) goto out_free_cmd; } if (blk_integrity_rq(req)) { - ret = nvme_map_metadata(dev, req); + ret = nvme_map_metadata(req); if (ret) goto out_unmap_data; } @@ -1016,7 +1126,7 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req) return BLK_STS_OK; out_unmap_data: if (blk_rq_nr_phys_segments(req)) - nvme_unmap_data(dev, req->mq_hctx->driver_data, req); + nvme_unmap_data(req); out_free_cmd: nvme_cleanup_cmd(req); return ret; @@ -1041,7 +1151,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx, if (unlikely(!nvme_check_ready(&dev->ctrl, req, true))) return nvme_fail_nonready_command(&dev->ctrl, req); - ret = nvme_prep_rq(dev, req); + ret = nvme_prep_rq(req); if (unlikely(ret)) return ret; spin_lock(&nvmeq->sq_lock); @@ -1079,7 +1189,7 @@ static bool nvme_prep_rq_batch(struct nvme_queue *nvmeq, struct request *req) if (unlikely(!nvme_check_ready(&nvmeq->dev->ctrl, req, true))) return false; - return nvme_prep_rq(nvmeq->dev, req) == BLK_STS_OK; + return nvme_prep_rq(req) 
== BLK_STS_OK; } static void nvme_queue_rqs(struct rq_list *rqlist) @@ -1105,11 +1215,11 @@ static void nvme_queue_rqs(struct rq_list *rqlist) *rqlist = requeue_list; } -static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev, - struct nvme_queue *nvmeq, - struct request *req) +static __always_inline void nvme_unmap_metadata(struct request *req) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); + struct nvme_queue *nvmeq = req->mq_hctx->driver_data; + struct nvme_dev *dev = nvmeq->dev; if (!iod->meta_sgt.nents) { dma_unmap_page(dev->dev, iod->meta_dma, @@ -1126,14 +1236,10 @@ static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev, static __always_inline void nvme_pci_unmap_rq(struct request *req) { - struct nvme_queue *nvmeq = req->mq_hctx->driver_data; - struct nvme_dev *dev = nvmeq->dev; - if (blk_integrity_rq(req)) - nvme_unmap_metadata(dev, nvmeq, req); - + nvme_unmap_metadata(req); if (blk_rq_nr_phys_segments(req)) - nvme_unmap_data(dev, nvmeq, req); + nvme_unmap_data(req); } static void nvme_pci_complete_rq(struct request *req) @@ -1958,8 +2064,28 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev) * might be pointing at! */ result = nvme_disable_ctrl(&dev->ctrl, false); - if (result < 0) - return result; + if (result < 0) { + struct pci_dev *pdev = to_pci_dev(dev->dev); + + /* + * The NVMe Controller Reset method did not get an expected + * CSTS.RDY transition, so something with the device appears to + * be stuck. Use the lower level and bigger hammer PCIe + * Function Level Reset to attempt restoring the device to its + * initial state, and try again. 
+ */ + result = pcie_reset_flr(pdev, false); + if (result < 0) + return result; + + pci_restore_state(pdev); + result = nvme_disable_ctrl(&dev->ctrl, false); + if (result < 0) + return result; + + dev_info(dev->ctrl.device, + "controller reset completed after pcie flr\n"); + } result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH); if (result) @@ -2331,7 +2457,7 @@ static ssize_t cmb_show(struct device *dev, struct device_attribute *attr, { struct nvme_dev *ndev = to_nvme_dev(dev_get_drvdata(dev)); - return sysfs_emit(buf, "cmbloc : x%08x\ncmbsz : x%08x\n", + return sysfs_emit(buf, "cmbloc : 0x%08x\ncmbsz : 0x%08x\n", ndev->cmbloc, ndev->cmbsz); } static DEVICE_ATTR_RO(cmb); @@ -2518,7 +2644,8 @@ static unsigned int nvme_max_io_queues(struct nvme_dev *dev) */ if (dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS) return 1; - return num_possible_cpus() + dev->nr_write_queues + dev->nr_poll_queues; + return blk_mq_num_possible_queues(0) + dev->nr_write_queues + + dev->nr_poll_queues; } static int nvme_setup_io_queues(struct nvme_dev *dev) @@ -2913,13 +3040,13 @@ static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown) static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev) { size_t meta_size = sizeof(struct scatterlist) * (NVME_MAX_META_SEGS + 1); - size_t alloc_size = sizeof(struct scatterlist) * NVME_MAX_SEGS; + size_t alloc_size = sizeof(struct nvme_dma_vec) * NVME_MAX_SEGS; - dev->iod_mempool = mempool_create_node(1, + dev->dmavec_mempool = mempool_create_node(1, mempool_kmalloc, mempool_kfree, (void *)alloc_size, GFP_KERNEL, dev_to_node(dev->dev)); - if (!dev->iod_mempool) + if (!dev->dmavec_mempool) return -ENOMEM; dev->iod_meta_mempool = mempool_create_node(1, @@ -2928,10 +3055,9 @@ static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev) dev_to_node(dev->dev)); if (!dev->iod_meta_mempool) goto free; - return 0; free: - mempool_destroy(dev->iod_mempool); + mempool_destroy(dev->dmavec_mempool); return -ENOMEM; } @@ -3272,7 +3398,8 @@ static struct 
nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev, * over a single page. */ dev->ctrl.max_hw_sectors = min_t(u32, - NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9); + NVME_MAX_BYTES >> SECTOR_SHIFT, + dma_opt_mapping_size(&pdev->dev) >> 9); dev->ctrl.max_segments = NVME_MAX_SEGS; dev->ctrl.max_integrity_segments = 1; return dev; @@ -3380,7 +3507,7 @@ out_disable: nvme_dbbuf_dma_free(dev); nvme_free_queues(dev, 0); out_release_iod_mempool: - mempool_destroy(dev->iod_mempool); + mempool_destroy(dev->dmavec_mempool); mempool_destroy(dev->iod_meta_mempool); out_dev_unmap: nvme_dev_unmap(dev); @@ -3444,7 +3571,7 @@ static void nvme_remove(struct pci_dev *pdev) nvme_dev_remove_admin(dev); nvme_dbbuf_dma_free(dev); nvme_free_queues(dev, 0); - mempool_destroy(dev->iod_mempool); + mempool_destroy(dev->dmavec_mempool); mempool_destroy(dev->iod_meta_mempool); nvme_release_descriptor_pools(dev); nvme_dev_unmap(dev); @@ -3847,7 +3974,6 @@ static int __init nvme_init(void) BUILD_BUG_ON(sizeof(struct nvme_create_sq) != 64); BUILD_BUG_ON(sizeof(struct nvme_delete_queue) != 64); BUILD_BUG_ON(IRQ_AFFINITY_MAX_SETS < 2); - BUILD_BUG_ON(nvme_pci_npages_prp() > NVME_MAX_NR_DESCRIPTORS); return pci_register_driver(&nvme_driver); } diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 9bd3646568d0..190a4cfa8a5e 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -877,7 +877,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new) /* * Only start IO queues for which we have allocated the tagset - * and limitted it to the available queues. On reconnects, the + * and limited it to the available queues. On reconnects, the * queue number might have changed. 
*/ nr_queues = min(ctrl->tag_set.nr_hw_queues + 1, ctrl->ctrl.queue_count); diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index d924008c3949..9233f088fac8 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1745,9 +1745,14 @@ static int nvme_tcp_start_tls(struct nvme_ctrl *nctrl, qid, ret); tls_handshake_cancel(queue->sock->sk); } else { - dev_dbg(nctrl->device, - "queue %d: TLS handshake complete, error %d\n", - qid, queue->tls_err); + if (queue->tls_err) { + dev_err(nctrl->device, + "queue %d: TLS handshake complete, error %d\n", + qid, queue->tls_err); + } else { + dev_dbg(nctrl->device, + "queue %d: TLS handshake complete\n", qid); + } ret = queue->tls_err; } return ret; diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 175c5b6d4dd5..884286f90688 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -581,8 +581,6 @@ int nvmet_ns_enable(struct nvmet_ns *ns) if (ns->enabled) goto out_unlock; - ret = -EMFILE; - ret = nvmet_bdev_ns_enable(ns); if (ret == -ENOTBLK) ret = nvmet_file_ns_enable(ns); diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c index b7515c53829b..3b4b0df8f879 100644 --- a/drivers/nvme/target/passthru.c +++ b/drivers/nvme/target/passthru.c @@ -106,7 +106,7 @@ static u16 nvmet_passthru_override_id_ctrl(struct nvmet_req *req) pctrl->max_hw_sectors); /* - * nvmet_passthru_map_sg is limitted to using a single bio so limit + * nvmet_passthru_map_sg is limited to using a single bio so limit * the mdts based on BIO_MAX_VECS as well */ max_hw_sectors = min_not_zero(BIO_MAX_VECS << PAGE_SECTORS_SHIFT, @@ -147,7 +147,7 @@ static u16 nvmet_passthru_override_id_ctrl(struct nvmet_req *req) * When passthru controller is setup using nvme-loop transport it will * export the passthru ctrl subsysnqn (PCIe NVMe ctrl) and will fail in * the nvme/host/core.c in the nvme_init_subsystem()->nvme_active_ctrl() - * code path with duplicate ctr subsynqn. 
In order to prevent that we + * code path with duplicate ctrl subsysnqn. In order to prevent that we * mask the passthru-ctrl subsysnqn with the target ctrl subsysnqn. */ memcpy(id->subnqn, ctrl->subsysnqn, sizeof(id->subnqn)); diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c index a4295a5b8d28..2e78397a7373 100644 --- a/drivers/nvme/target/pci-epf.c +++ b/drivers/nvme/target/pci-epf.c @@ -1242,8 +1242,11 @@ static void nvmet_pci_epf_queue_response(struct nvmet_req *req) iod->status = le16_to_cpu(req->cqe->status) >> 1; - /* If we have no data to transfer, directly complete the command. */ - if (!iod->data_len || iod->dma_dir != DMA_TO_DEVICE) { + /* + * If the command failed or we have no data to transfer, complete the + * command immediately. + */ + if (iod->status || !iod->data_len || iod->dma_dir != DMA_TO_DEVICE) { nvmet_pci_epf_complete_iod(iod); return; } @@ -1604,8 +1607,13 @@ static void nvmet_pci_epf_exec_iod_work(struct work_struct *work) goto complete; } + /* + * If nvmet_req_init() fails (e.g., unsupported opcode) it will call + * __nvmet_req_complete() internally which will call + * nvmet_pci_epf_queue_response() and will complete the command directly. 
+ */ if (!nvmet_req_init(req, &iod->sq->nvme_sq, &nvmet_pci_epf_fabrics_ops)) - goto complete; + return; iod->data_len = nvmet_req_transfer_len(req); if (iod->data_len) { @@ -1643,10 +1651,11 @@ static void nvmet_pci_epf_exec_iod_work(struct work_struct *work) wait_for_completion(&iod->done); - if (iod->status == NVME_SC_SUCCESS) { - WARN_ON_ONCE(!iod->data_len || iod->dma_dir != DMA_TO_DEVICE); - nvmet_pci_epf_transfer_iod_data(iod); - } + if (iod->status != NVME_SC_SUCCESS) + return; + + WARN_ON_ONCE(!iod->data_len || iod->dma_dir != DMA_TO_DEVICE); + nvmet_pci_epf_transfer_iod_data(iod); complete: nvmet_pci_epf_complete_iod(iod); @@ -1860,7 +1869,7 @@ static int nvmet_pci_epf_enable_ctrl(struct nvmet_pci_epf_ctrl *ctrl) ctrl->io_cqes = 1UL << nvmet_cc_iocqes(ctrl->cc); if (ctrl->io_cqes < sizeof(struct nvme_completion)) { dev_err(ctrl->dev, "Unsupported I/O CQES %zu (need %zu)\n", - ctrl->io_sqes, sizeof(struct nvme_completion)); + ctrl->io_cqes, sizeof(struct nvme_completion)); goto err; } diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c index 29a60fabfcc8..15a579cf528c 100644 --- a/drivers/nvme/target/zns.c +++ b/drivers/nvme/target/zns.c @@ -541,7 +541,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req) struct bio *bio; int sg_cnt; - /* Request is completed on len mismatch in nvmet_check_transter_len() */ + /* Request is completed on len mismatch in nvmet_check_transfer_len() */ if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req))) return; diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index 9179f8aee964..615e06fd4ee8 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -5971,7 +5971,8 @@ megasas_alloc_irq_vectors(struct megasas_instance *instance) else instance->iopoll_q_count = 0; - num_msix_req = num_online_cpus() + instance->low_latency_index_start; + num_msix_req = blk_mq_num_online_queues(0) + + 
instance->low_latency_index_start; instance->msix_vectors = min(num_msix_req, instance->msix_vectors); @@ -5987,7 +5988,8 @@ megasas_alloc_irq_vectors(struct megasas_instance *instance) /* Disable Balanced IOPS mode and try realloc vectors */ instance->perf_mode = MR_LATENCY_PERF_MODE; instance->low_latency_index_start = 1; - num_msix_req = num_online_cpus() + instance->low_latency_index_start; + num_msix_req = blk_mq_num_online_queues(0) + + instance->low_latency_index_start; instance->msix_vectors = min(num_msix_req, instance->msix_vectors); @@ -6243,7 +6245,7 @@ static int megasas_init_fw(struct megasas_instance *instance) intr_coalescing = (scratch_pad_1 & MR_INTR_COALESCING_SUPPORT_OFFSET) ? true : false; if (intr_coalescing && - (num_online_cpus() >= MR_HIGH_IOPS_QUEUE_COUNT) && + (blk_mq_num_online_queues(0) >= MR_HIGH_IOPS_QUEUE_COUNT) && (instance->msix_vectors == MEGASAS_MAX_MSIX_QUEUES)) instance->perf_mode = MR_BALANCED_PERF_MODE; else @@ -6287,7 +6289,8 @@ static int megasas_init_fw(struct megasas_instance *instance) else instance->low_latency_index_start = 1; - num_msix_req = num_online_cpus() + instance->low_latency_index_start; + num_msix_req = blk_mq_num_online_queues(0) + + instance->low_latency_index_start; instance->msix_vectors = min(num_msix_req, instance->msix_vectors); @@ -6319,8 +6322,8 @@ static int megasas_init_fw(struct megasas_instance *instance) megasas_setup_reply_map(instance); dev_info(&instance->pdev->dev, - "current msix/online cpus\t: (%d/%d)\n", - instance->msix_vectors, (unsigned int)num_online_cpus()); + "current msix/max num queues\t: (%d/%u)\n", + instance->msix_vectors, blk_mq_num_online_queues(0)); dev_info(&instance->pdev->dev, "RDPQ mode\t: (%s)\n", instance->is_rdpq ? 
"enabled" : "disabled"); diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index fe98c76e9be3..c4c6b5c6658c 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -4533,13 +4533,13 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp) if (USER_CTRL_IRQ(ha) || !ha->mqiobase) { /* user wants to control IRQ setting for target mode */ ret = pci_alloc_irq_vectors(ha->pdev, min_vecs, - min((u16)ha->msix_count, (u16)(num_online_cpus() + min_vecs)), - PCI_IRQ_MSIX); + blk_mq_num_online_queues(ha->msix_count) + min_vecs, + PCI_IRQ_MSIX); } else ret = pci_alloc_irq_vectors_affinity(ha->pdev, min_vecs, - min((u16)ha->msix_count, (u16)(num_online_cpus() + min_vecs)), - PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, - &desc); + blk_mq_num_online_queues(ha->msix_count) + min_vecs, + PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, + &desc); if (ret < 0) { ql_log(ql_log_fatal, vha, 0x00c7, diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c index 3d40a63e378d..125944941601 100644 --- a/drivers/scsi/smartpqi/smartpqi_init.c +++ b/drivers/scsi/smartpqi/smartpqi_init.c @@ -5294,15 +5294,14 @@ static void pqi_calculate_queue_resources(struct pqi_ctrl_info *ctrl_info) if (is_kdump_kernel()) { num_queue_groups = 1; } else { - int num_cpus; int max_queue_groups; max_queue_groups = min(ctrl_info->max_inbound_queues / 2, ctrl_info->max_outbound_queues - 1); max_queue_groups = min(max_queue_groups, PQI_MAX_QUEUE_GROUPS); - num_cpus = num_online_cpus(); - num_queue_groups = min(num_cpus, ctrl_info->max_msix_vectors); + num_queue_groups = + blk_mq_num_online_queues(ctrl_info->max_msix_vectors); num_queue_groups = min(num_queue_groups, max_queue_groups); } diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 21ce3e940192..96a69edddbe5 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -919,6 +919,7 @@ static int virtscsi_probe(struct virtio_device *vdev) /* We need to 
know how many queues before we allocate. */ num_queues = virtscsi_config_get(vdev, num_queues) ? : 1; num_queues = min_t(unsigned int, nr_cpu_ids, num_queues); + num_queues = blk_mq_num_possible_queues(num_queues); num_targets = virtscsi_config_get(vdev, max_target) + 1; diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c index 1f60c9d5cb18..a7b297dae489 100644 --- a/drivers/virtio/virtio_vdpa.c +++ b/drivers/virtio/virtio_vdpa.c @@ -329,20 +329,21 @@ create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd) for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) { unsigned int this_vecs = affd->set_size[i]; + unsigned int nr_masks; int j; - struct cpumask *result = group_cpus_evenly(this_vecs); + struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks); if (!result) { kfree(masks); return NULL; } - for (j = 0; j < this_vecs; j++) + for (j = 0; j < nr_masks; j++) cpumask_copy(&masks[curvec + j], &result[j]); kfree(result); - curvec += this_vecs; - usedvecs += this_vecs; + curvec += nr_masks; + usedvecs += nr_masks; } /* Fill out vectors at the end that don't need affinity */ |

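The single-segment fast path added in `nvme_pci_setup_data_simple()` above decides whether one bio_vec can be described by PRP1 and PRP2 alone, and where PRP2 should point. The following is a minimal userspace sketch of that arithmetic only, not the driver code. `CTRL_PAGE_SIZE`, `struct simple_prp`, and `simple_prp_layout()` are hypothetical names introduced for illustration, and the 4 KiB page size is an assumption standing in for `NVME_CTRL_PAGE_SIZE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB controller page, standing in for NVME_CTRL_PAGE_SIZE. */
#define CTRL_PAGE_SIZE 4096u

/* Hypothetical result type for illustration; not a kernel structure. */
struct simple_prp {
	bool possible;   /* segment fits in PRP1 + PRP2 without a PRP list */
	uint64_t prp1;   /* DMA address of the first byte */
	uint64_t prp2;   /* start of the second controller page, 0 if unused */
};

static struct simple_prp simple_prp_layout(uint64_t dma_addr,
					   uint32_t bv_offset, uint32_t bv_len)
{
	struct simple_prp out = { false, 0, 0 };
	/* Offset of the data within its first controller page. */
	uint32_t prp1_offset = bv_offset & (CTRL_PAGE_SIZE - 1);
	/* Bytes covered by PRP1 before the next page boundary. */
	uint32_t first_prp_len = CTRL_PAGE_SIZE - prp1_offset;

	assert(bv_len > 0);

	/* Mirrors prp_possible in the diff: at most two controller pages. */
	if ((uint64_t)prp1_offset + bv_len > 2ull * CTRL_PAGE_SIZE)
		return out;

	out.possible = true;
	out.prp1 = dma_addr;
	/* PRP2 is only needed when the data spills past the first page. */
	if (bv_len > first_prp_len)
		out.prp2 = dma_addr + first_prp_len;
	return out;
}
```

In the diff, a segment that fails this check is not an error: the function returns `BLK_STS_AGAIN` when SGLs are not in use, and `nvme_map_data()` then falls through to the DMA iterator path.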