summaryrefslogtreecommitdiff
path: root/drivers/block/drbd/drbd_worker.c
AgeCommit message (Collapse)Author
2023-04-01drbd: Pass a peer device to the resync and online verify functionsChristoph Böhmwalder
Originally-from: Andreas Grünbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01drbd: pass drbd_peer_device to __req_modChristoph Böhmwalder
In preparation to support multiple connections, we need to know which one we need to modify the request state for. Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01drbd: Add peer device parameter to whole-bitmap I/O handlersAndreas Gruenbacher
Pass a peer device parameter through the bitmap I/O functions to the I/O handlers. In after_state_ch(), set that parameter when queuing the drbd_send_bitmap operation so that this operation knows where to send the bitmap. Signed-off-by: Andreas Gruenbacher <agruen@kernel.org> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-01drbd: introduce drbd_ratelimit()Christoph Böhmwalder
Use call site specific ratelimit instead of one single static global. Also ratelimit ASSERTION messages generated by expect(). Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221201110349.1282687-5-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-22drbd: use consistent licenseChristoph Böhmwalder
DRBD currently has a mix of GPL-2.0 and GPL-2.0-or-later SPDX license identifiers. We have decided to stick with GPL 2.0 only, so consistently use that identifier. Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221122134301.69258-5-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-09drbd: Store op in drbd_peer_requestChristoph Böhmwalder
(Sort of) cherry-picked from the out-of-tree drbd9 branch. Original commit message by Joel Colledge: This simplifies drbd_submit_peer_request by removing most of the arguments. It also makes the treatment of the op better aligned with that in struct bio. Determine fault_type dynamically using information which is already available instead of passing it in as a parameter. Note: The opf in receive_rs_deallocated was changed from REQ_OP_WRITE_ZEROES to REQ_OP_DISCARD. This was required in the out-of-tree module, and does not matter in-tree. The opf is ignored anyway in drbd_submit_peer_request, since the discard/zero-out is decided by the EE_TRIM flag. Signed-off-by: Joel Colledge <joel.colledge@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20221109133453.51652-4-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14block/drbd: Combine two drbd_submit_peer_request() argumentsBart Van Assche
Combine the drbd_submit_peer_request() 'op' and 'op_flags' arguments into a single argument. This patch does not change any functionality. Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-15-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17drbd: Make use of PFN_UP helper macroCai Huoqing
it's a refactor to make use of PFN_UP helper macro Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-5-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-24Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (qla2xxx, pm8001, libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates and bug fixes. The high blast radius core update is the removal of write same, which affects block and several non-SCSI devices. The other big change, which is more local, is the removal of the SCSI pointer" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits) scsi: scsi_ioctl: Drop needless assignment in sg_io() scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn() scsi: lpfc: Copyright updates for 14.2.0.0 patches scsi: lpfc: Update lpfc version to 14.2.0.0 scsi: lpfc: SLI path split: Refactor BSG paths scsi: lpfc: SLI path split: Refactor Abort paths scsi: lpfc: SLI path split: Refactor SCSI paths scsi: lpfc: SLI path split: Refactor CT paths scsi: lpfc: SLI path split: Refactor misc ELS paths scsi: lpfc: SLI path split: Refactor VMID paths scsi: lpfc: SLI path split: Refactor FDISC paths scsi: lpfc: SLI path split: Refactor LS_RJT paths scsi: lpfc: SLI path split: Refactor LS_ACC paths scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4 scsi: lpfc: SLI path split: Refactor lpfc_iocbq scsi: lpfc: Use kcalloc() ...
2022-03-04drbd: use bvec_kmap_local in drbd_csum_bioChristoph Hellwig
Using local kmaps slightly reduces the chances to stray writes, and the bvec interface cleans up the code a little bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20220303111905.321089-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-22scsi: drbd: Remove WRITE_SAME supportChristoph Hellwig
REQ_OP_WRITE_SAME was only ever submitted by the legacy Linux zeroing code, which has switched to use REQ_OP_WRITE_ZEROES long ago. Link: https://lore.kernel.org/r/20220209082828.2629273-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-04block: pass a block_device to bio_clone_fastChristoph Hellwig
Pass a block_device to bio_clone_fast and __bio_clone_fast and give the functions more suitable names. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27drbd: remove drbd_req_make_private_bioChristoph Hellwig
Open code drbd_req_make_private_bio in the two callers to prepare for further changes. Also don't bother to initialize bi_next as the bio code already does that that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: switch partition lookup to use struct block_deviceChristoph Hellwig
Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: allocate struct hd_struct as part of struct bdev_inodeChristoph Hellwig
Allocate hd_struct together with struct block_device to pre-load the lifetime rule changes in preparation of merging the two structures. Note that part0 was previously embedded into struct gendisk, but is a separate allocation now, and already points to the block_device instead of the hd_struct. The lifetime of struct gendisk is still controlled by the struct device embedded in the part0 hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-05drbd: remove ->this_bdevChristoph Hellwig
DRBD keeps a block device open just to get and set the capacity from it. Switch to primarily using the disk capacity as intended by the block layer, and sync it to the bdev using revalidate_disk_size. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25drbd: don't detour through bd_contains for the gendiskChristoph Hellwig
bd_disk is set on all block devices, including those for partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01block: rename generic_make_request to submit_bio_noacctChristoph Hellwig
generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-28tcp: add tcp_sock_set_corkChristoph Hellwig
Add a helper to directly set the TCP_CORK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25block: move the part_stat* helpers from genhd.h to a new headerChristoph Hellwig
These macros are just used by a few files. Move them out of genhd.h, which is included everywhere into a new standalone header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29drbd: fifo_alloc() should use struct_sizeStephen Kitt
Switching to struct_size for the allocation in fifo_alloc avoids hard-coding the type of fifo_buffer.values in fifo_alloc. It also provides overflow protection; to avoid pessimistic code being generated by the compiler as a result, this patch also switches fifo_size to unsigned, propagating the change as appropriate. Reviewed-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Stephen Kitt <steve@sk2.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 91Thomas Gleixner
Based on 1 normalized pattern(s): is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version [drbd] is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with [drbd] see the file copying if not write to the free software foundation 675 mass ave cambridge ma 02139 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 16 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520075212.050796421@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25crypto: shash - remove shash_desc::flagsEric Biggers
The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algorithm ever sleeps, making this flag a no-op. With this being the case, inevitably some users who can't sleep wrongly pass MAY_SLEEP. These would all need to be fixed if any shash algorithm actually started sleeping. For example, the shash_ahash_*() functions, which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP from the ahash API to the shash API. However, the shash functions are called under kmap_atomic(), so actually they're assumed to never sleep. Even if it turns out that some users do need preemption points while hashing large buffers, we could easily provide a helper function crypto_shash_update_large() which divides the data into smaller chunks and calls crypto_shash_update() and cond_resched() for each chunk. It's not necessary to have a flag in 'struct shash_desc', nor is it necessary to make individual shash algorithms aware of this at all. Therefore, remove shash_desc::flags, and document that the crypto_shash_*() functions can be called from any context. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-20drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")Lars Ellenberg
And also re-enable partial-zero-out + discard aligned. With the introduction of REQ_OP_WRITE_ZEROES, we started to use that for both WRITE_ZEROES and DISCARDS, hoping that WRITE_ZEROES would "do what we want", UNMAP if possible, zero-out the rest. The example scenario is some LVM "thin" backend. While an un-allocated block on dm-thin reads as zeroes, on a dm-thin with "skip_block_zeroing=true", after a partial block write allocated that block, that same block may well map "undefined old garbage" from the backends on LBAs that have not yet been written to. If we cannot distinguish between zero-out and discard on the receiving side, to avoid "undefined old garbage" to pop up randomly at later times on supposedly zero-initialized blocks, we'd need to map all discards to zero-out on the receiving side. But that would potentially do a full alloc on thinly provisioned backends, even when the expectation was to unmap/trim/discard/de-allocate. We need to distinguish on the protocol level, whether we need to guarantee zeroes (and thus use zero-out, potentially doing the mentioned full-alloc), or if we want to put the emphasis on discard, and only do a "best effort zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing only potential unaligned head and tail clippings to at least *try* to avoid "false positives" in an online-verify later), hoping that someone set skip_block_zeroing=false. For some discussion regarding this on dm-devel, see also https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html For backward compatibility, P_TRIM means zero-out, unless the DRBD_FF_WZEROES feature flag is agreed upon during handshake. To have upper layers even try to submit WRITE ZEROES requests, we need to announce "efficient zeroout" independently. We need to fixup max_write_zeroes_sectors after blk_queue_stack_limits(): if we can handle "zeroes" efficiently on the protocol, we want to do that, even if our backend does not announce max_write_zeroes_sectors itself. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-03block: Finish renaming REQ_DISCARD into REQ_OP_DISCARDBart Van Assche
Some time ago REQ_DISCARD was renamed into REQ_OP_DISCARD. Some comments and documentation files were not updated however. Update these comments and documentation files. See also commit 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code"). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-06drbd: Convert from ahash to shashKees Cook
In preparing to remove all stack VLA usage from the kernel[1], this removes the discouraged use of AHASH_REQUEST_ON_STACK in favor of the smaller SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash to direct shash. By removing a layer of indirection this both improves performance and reduces stack usage. The stack allocation will be made a fixed size in a later patch to the crypto subsystem. The bulk of the lines in this change are simple s/ahash/shash/, but the main logic differences are in drbd_csum_ee() and drbd_csum_bio(), which externalizes the page walking with k(un)map_atomic() instead of using scattergather. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-18block: Add part_stat_read_accum to read across field entries.Michael Callahan
Add a part_stat_read_accum macro to genhd.h to read and sum across field entries. For example to sum up the number read and write sectors completed. In addition to being ar reasonable cleanup by itself this will make it easier to add new stat fields in the future. tj: Refreshed on top of v4.17. Signed-off-by: Michael Callahan <michaelcallahan@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-02drbd: fix access after freeLars Ellenberg
We have struct drbd_requests { ... struct bio *private_bio; ... } to hold a bio clone for local submission. On local IO completion, we put that bio, and in case we want to use the result later, we overload that member to hold the ERR_PTR() of the completion result, Which, before v4.3, used to be the passed in "int error", so we could first bio_put(), then assign. v4.3-rc1~100^2~21 4246a0b63bd8 block: add a bi_error field to struct bio changed that: bio_put(req->private_bio); - req->private_bio = ERR_PTR(error); + req->private_bio = ERR_PTR(bio->bi_error); Which introduces an access after free, because it was non obvious that req->private_bio == bio. Impact of that was mostly unnoticable, because we only use that value in a multiple-failure case, and even then map any "unexpected" error code to EIO, so worst case we could potentially mask a more specific error with EIO in a multiple failure case. Unless the pointed to memory region was unmapped, as is the case with CONFIG_DEBUG_PAGEALLOC, in which case this results in BUG: unable to handle kernel paging request v4.13-rc1~70^2~75 4e4cbee93d56 block: switch bios to blk_status_t changes it further to bio_put(req->private_bio); req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status)); And blk_status_to_errno() now contains a WARN_ON_ONCE() for unexpected values, which catches this "sometimes", if the memory has been reused quickly enough for other things. Should also go into stable since 4.3, with the trivial change around 4.13. Cc: stable@vger.kernel.org Fixes: 4246a0b63bd8 block: add a bi_error field to struct bio Reported-by: Sarah Newman <srn@prgmr.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-06drbd: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: drbd-dev@lists.linbit.com Signed-off-by: Kees Cook <keescook@chromium.org>
2017-08-29drbd: abort drbd_start_resync if there is no connectionRoland Kammerer
This was found by a static analysis tool. While highly unlikely, be sure to return without dereferencing the NULL pointer. Reported-by: Shaobo <shaobo@cs.utah.edu> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-29drbd: fix potential get_ldev/put_ldev refcount imbalance during attachLars Ellenberg
Race: drbd_adm_attach() | async drbd_md_endio() | device->ldev is still NULL. | | drbd_md_read( | .endio = drbd_md_endio; | submit; | .... | wait for done == 1; | done = 1; ); | wake_up(); .. lot of other stuff, | .. includeing taking and | ...giving up locks, | .. doing further IO, | .. stuff that takes "some time" | | while in this context, | this is the next statement. | which means this context was scheduled .. only then, finally, | away for "some time". device->ldev = nbc; | | if (device->ldev) | put_ldev() Unlikely, but possible. I was able to provoke it "reliably" by adding an mdelay(500); after the wake_up(). Fixed by moving the if (!NULL) put_ldev() before done = 1; Impact of the bug was that the resulting refcount imbalance could lead to premature destruction of the object, potentially causing a NULL pointer dereference during a subsequent detach. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-29drbd: mark symbols static where possibleBaoyou Xie
We get a few warnings when building kernel with W=1: drbd/drbd_receiver.c:1224:6: warning: no previous prototype for 'one_flush_endio' [-Wmissing-prototypes] drbd/drbd_req.c:1450:6: warning: no previous prototype for 'send_and_submit_pending' [-Wmissing-prototypes] drbd/drbd_main.c:924:6: warning: no previous prototype for 'assign_p_sizes_qlim' [-Wmissing-prototypes] .... In fact, these functions are only used in the file in which they are declared and don't need a declaration, but can be made static. So this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-29drbd: Send P_NEG_ACK upon write error in protocol != CLars Ellenberg
In protocol != C, we forgot to send the P_NEG_ACK for failing writes. Once we no longer submit to local disk, because we already "detached", due to the typical "on-io-error detach;" config setting, we already send the neg acks right away. Only those requests that have been submitted, and have been error-completed by the local disk, would forget to send the neg-ack, and only in asynchronous replication (protocol != C). Unless this happened during resync, where we already always send acks, regardless of protocol. The primary side needs the P_NEG_ACK in order to mark the affected block(s) for resync in its out-of-sync bitmap. If the blocks in question are not re-written again, we may miss to resync them later, causing data inconsistencies. This patch will always send the neg-acks, and also at least try to persist the out-of-sync status on the local node already. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-29drbd: introduce drbd_recv_header_maybe_unplugLars Ellenberg
Recently, drbd_recv_header() was changed to potentially implicitly "unplug" the backend device(s), in case there is currently nothing to receive. Be more explicit about it: re-introduce the original drbd_recv_header(), and introduce a new drbd_recv_header_maybe_unplug() for use by the receiver "main loop". Using explicit plugging via blk_start_plug(); blk_finish_plug(); really helps the io-scheduler of the backend with merging requests. Wrap the receiver "main loop" with such a plug. Also catch unplug events on the Primary, and try to propagate. This is performance relevant. Without this, if the receiving side does not merge requests, number of IOPS on the peer can me significantly higher than IOPS on the Primary, and can easily become the bottleneck. Together, both changes should help to reduce the number of IOPS as seen on the backend of the receiving side, by increasing the chance of merging mergable requests, without trading latency for more throughput. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-09block: switch bios to blk_status_tChristoph Hellwig
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08drbd: implement REQ_OP_WRITE_ZEROESChristoph Hellwig
It seems like DRBD assumes its on the wire TRIM request always zeroes data. Use that fact to implement REQ_OP_WRITE_ZEROES. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from ↵Ingo Molnar
<linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-07block: rename bio bi_rw to bi_opfJens Axboe
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: get rid of bio_rw and READAChristoph Hellwig
These two are confusing leftover of the old world order, combining values of the REQ_OP_ and REQ_ namespaces. For callers that don't special case we mostly just replace bi_rw with bio_data_dir or op_is_write, except for the few cases where a switch over the REQ_OP_ values makes more sense. Any check for READA is replaced with an explicit check for REQ_RAHEAD. Also remove the READA alias for REQ_RAHEAD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: code cleanups without semantic changesFabian Frederick
This contains various cosmetic fixes ranging from simple typos to const-ifying, and using booleans properly. Original commit messages from Fabian's patch set: drbd: debugfs: constify drbd_version_fops drbd: use seq_put instead of seq_print where possible drbd: include linux/uaccess.h instead of asm/uaccess.h drbd: use const char * const for drbd strings drbd: kerneldoc warning fix in w_e_end_data_req() drbd: use unsigned for one bit fields drbd: use bool for peer is_ states drbd: fix typo drbd: use | for bitmask combination drbd: use true/false for bool drbd: fix drbd_bm_init() comments drbd: introduce peer state union drbd: fix maybe_pull_ahead() locking comments drbd: use bool for growing drbd: remove redundant declarations drbd: replace if/BUG by BUG_ON Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: introduce WRITE_SAME supportLars Ellenberg
We will support WRITE_SAME, if * all peers support WRITE_SAME (both in kernel and DRBD version), * all peer devices support WRITE_SAME * logical_block_size is identical on all peers. We may at some point introduce a fallback on the receiving side for devices/kernels that do not support WRITE_SAME, by open-coding a submit loop. But not yet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: introduce unfence-peer handlerLars Ellenberg
When resync is finished, we already call the "after-resync-target" handler (on the former sync target, obviously), once per volume. Paired with the before-resync-target handler, you can create snapshots, before the resync causes the volumes to become inconsistent, and discard those snapshots again, once they are no longer needed. It was also overloaded to be paired with the "fence-peer" handler, to "unfence" once the volumes are up-to-date and known good. This has some disadvantages, though: we call "fence-peer" for the whole connection (once for the group of volumes), but would call unfence as side-effect of after-resync-target once for each volume. Also, we fence on a (current, or about to become) Primary, which will later become the sync-source. Calling unfence only as a side effect of the after-resync-target handler opens a race window, between a new fence on the Primary (SyncTarget) and the unfence on the SyncTarget, which is difficult to close without some kind of "cluster wide lock" in those handlers. We would not need those handlers if we could still communicate. Which makes trying to aquire a cluster wide lock from those handlers seem like a very bad idea. This introduces the "unfence-peer" handler, which will be called per connection (once for the group of volumes), just like the fence handler, only once all volumes are back in sync, and on the SyncSource. Which is expected to be the node that previously called "fence", the node that is currently allowed to be Primary, and thus the only node that could trigger a new "fence" that could race with this unfence. Which makes us not need any cluster wide synchronization here, serializing two scripts running on the same node is trivial. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: Create the protocol feature THIN_RESYNCPhilipp Reisner
If thinly provisioned volumes are used, during a resync the sync source tries to find out if a block is deallocated. If it is deallocated, then the resync target uses block_dev_issue_zeroout() on the range in question. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: Implement handling of thinly provisioned storage on resync target nodesPhilipp Reisner
If during resync we read only zeroes for a range of sectors assume that these secotors can be discarded on the sync target node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07drbd: use bio op accessorsMike Christie
Separate the op from the rq_flag_bits and have drbd set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-27drbd: Use shash and ahashHerbert Xu
This patch replaces uses of the long obsolete hash interface with either shash (for non-SG users) or ahash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-11-25drbd: make drbd known to lsblk: use bd_link_disk_holderLars Ellenberg
lsblk should be able to pick up stacking device driver relations involving DRBD conveniently. Even though upstream kernel since 2011 says "DON'T USE THIS UNLESS YOU'RE ALREADY USING IT." a new user has been added since (bcache), which sets the precedences for us to use it as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: Create a dedicated workqueue for sending acks on the control connectionPhilipp Reisner
The intention is to reduce CPU utilization. Recent measurements unveiled that the current performance bottleneck is CPU utilization on the receiving node. The asender thread became CPU limited. One of the main points is to eliminate the idr_for_each_entry() loop from the sending acks code path. One exception in that is sending back ping_acks. These stay in the ack-receiver thread. Otherwise the logic becomes too complicated for no added value. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: improve network timeout detectionLars Ellenberg
Don't blame the peer for being unresponsive, if we did not even ask the question yet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>