diff options
author | Qu Wenruo <wqu@suse.com> | 2023-02-07 12:26:14 +0800 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2023-04-17 18:01:14 +0200 |
commit | 1faf3885067d5be65597d5dc682f0da505822104 (patch) | |
tree | 3e8cae741570bcdd878d5c2d919d2ad4e78a2235 /fs/btrfs/volumes.h | |
parent | 4ced85f81a7a67e190a4ff4e14d8cc04978a3767 (diff) | |
download | lwn-1faf3885067d5be65597d5dc682f0da505822104.tar.gz lwn-1faf3885067d5be65597d5dc682f0da505822104.zip |
btrfs: use an efficient way to represent source of duplicated stripes
For btrfs dev-replace, we have to duplicate writes to the source
device into the target device.
For non-RAID56, all writes into the same mapped ranges are sharing the
same content, thus they don't really need to bother anything.
(E.g. in btrfs_submit_bio() for non-RAID56 range we just submit the
same write to all involved devices).
But for RAID56, all stripes contain different content, thus we must
have a clear mapping of which stripe is duplicated from which original
stripe.
Currently we use a complex way using tgtdev_map[] array, e.g:
num_tgtdevs = 1
tgtdev_map[0] = 0 <- Means stripes[0] is not involved in replace.
tgtdev_map[1] = 3 <- Means stripes[1] is involved in replace,
and it's duplicated to stripes[3].
tgtdev_map[2] = 0 <- Means stripes[2] is not involved in replace.
But this is wasting some space, and ignores one important thing for
dev-replace, there is at most one running replace.
Thus we can change it to a fixed array to represent the mapping:
replace_nr_stripes = 1
replace_stripe_src = 1 <- Means stripes[1] is involved in replace.
thus the extra stripe is a copy of
stripes[1]
By this we can save some space for bioc on RAID56 chunks with many
devices. And we get rid of one variable sized array from bioc.
Thus the patch involves the following changes:
- Replace @num_tgtdevs and @tgtdev_map[] with @replace_nr_stripes
and @replace_stripe_src.
@num_tgtdevs is just renamed to @replace_nr_stripes.
While the mapping is completely changed.
- Add extra ASSERT()s for RAID56 code
- Only add two more extra stripes for dev-replace cases.
As we have an upper limit on how many dev-replace stripes we can have.
- Unify the behavior of handle_ops_on_dev_replace()
Previously handle_ops_on_dev_replace() go two different paths for
WRITE and GET_READ_MIRRORS.
Now unify them by always going the WRITE path first (with at most 2
replace stripes), then if we're doing GET_READ_MIRRORS and we have 2
extra stripes, just drop one stripe.
- Remove the @real_stripes argument from alloc_btrfs_io_context()
As we don't need the old variable length array any more.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs/btrfs/volumes.h')
-rw-r--r-- | fs/btrfs/volumes.h | 26 |
1 files changed, 14 insertions, 12 deletions
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index da0f9a9eaf94..e86e9f25ba0f 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -427,14 +427,13 @@ struct btrfs_io_context { /* * The following two members are for dev-replace case only. * - * @num_tgtdevs: Number of duplicated stripes which need to be + * @replace_nr_stripes: Number of duplicated stripes which need to be * written to replace target. * Should be <= 2 (2 for DUP, otherwise <= 1). - * @tgtdev_map: The array indicates where the duplicated stripes - * are from. The size is the number of original - * stripes (num_stripes - num_tgtdevs). + * @replace_stripe_src: The array indicates where the duplicated stripes + * are from. * - * The @tgtdev_map[] array is mostly for RAID56 cases. + * The @replace_stripe_src[] array is mostly for RAID56 cases. * As non-RAID56 stripes share the same contents of the mapped range, * thus no need to bother where the duplicated ones are from. * @@ -449,14 +448,17 @@ struct btrfs_io_context { * stripes[2]: dev = devid 3, physical = Z * stripes[3]: dev = devid 0, physical = Y * - * num_tgtdevs = 1 - * tgtdev_map[0] = 0 <- Means stripes[0] is not involved in replace. - * tgtdev_map[1] = 3 <- Means stripes[1] is involved in replace, - * and it's duplicated to stripes[3]. - * tgtdev_map[2] = 0 <- Means stripes[2] is not involved in replace. + * replace_nr_stripes = 1 + * replace_stripe_src = 1 <- Means stripes[1] is involved in replace. + * The duplicated stripe index would be + * (@num_stripes - 1). + * + * Note, that we can still have cases replace_nr_stripes = 2 for DUP. + * In that case, all stripes share the same content, thus we don't + * need to bother @replace_stripe_src value at all. */ - u16 num_tgtdevs; - u16 *tgtdev_map; + u16 replace_nr_stripes; + s16 replace_stripe_src; /* * logical block numbers for the start of each stripe * The last one or two are p/q. These are sorted, |