diff options
author | Dennis Zhou <dennis@kernel.org> | 2020-01-02 16:26:42 -0500 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2020-01-20 16:41:00 +0100 |
commit | dbc2a8c92756507e8183a4c23a02fa2a994eb640 (patch) | |
tree | cbb4fb707073bd9b669f518e3a370b586f7ec7ab /fs/btrfs/discard.c | |
parent | 9ddf648f9c2a492cef4e41e31c50515a817d0562 (diff) | |
download | lwn-dbc2a8c92756507e8183a4c23a02fa2a994eb640.tar.gz lwn-dbc2a8c92756507e8183a4c23a02fa2a994eb640.zip |
btrfs: add async discard implementation overview
Give a brief overview for how async discard is implemented.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs/btrfs/discard.c')
-rw-r--r-- | fs/btrfs/discard.c | 39 |
1 files changed, 39 insertions, 0 deletions
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c index 40dcb5dcdc95..6f48ae1589d9 100644 --- a/fs/btrfs/discard.c +++ b/fs/btrfs/discard.c @@ -12,6 +12,45 @@ #include "discard.h" #include "free-space-cache.h" +/* + * This contains the logic to handle async discard. + * + * Async discard manages trimming of free space outside of transaction commit. + * Discarding is done by managing the block_groups on a LRU list based on free + * space recency. Two passes are used to first prioritize discarding extents + * and then allow for trimming in the bitmap the best opportunity to coalesce. + * The block_groups are maintained on multiple lists to allow for multiple + * passes with different discard filter requirements. A delayed work item is + * used to manage discarding with timeout determined by a max of the delay + * incurred by the iops rate limit, the byte rate limit, and the max delay of + * BTRFS_DISCARD_MAX_DELAY. + * + * Note, this only keeps track of block_groups that are explicitly for data. + * Mixed block_groups are not supported. + * + * The first list is special to manage discarding of fully free block groups. + * This is necessary because we issue a final trim for a full free block group + * after forgetting it. When a block group becomes unused, instead of directly + * being added to the unused_bgs list, we add it to this first list. Then + * from there, if it becomes fully discarded, we place it onto the unused_bgs + * list. + * + * The in-memory free space cache serves as the backing state for discard. + * Consequently this means there is no persistence. We opt to load all the + * block groups in as not discarded, so the mount case degenerates to the + * crashing case. + * + * As the free space cache uses bitmaps, there exists a tradeoff between + * ease/efficiency for find_free_extent() and the accuracy of discard state. + * Here we opt to let untrimmed regions merge with everything while only letting + * trimmed regions merge with other trimmed regions. This can cause + * overtrimming, but the coalescing benefit seems to be worth it. Additionally, + * bitmap state is tracked as a whole. If we're able to fully trim a bitmap, + * the trimmed flag is set on the bitmap. Otherwise, if an allocation comes in, + * this resets the state and we will retry trimming the whole bitmap. This is a + * tradeoff between discard state accuracy and the cost of accounting. + */ + /* This is an initial delay to give some chance for block reuse */ #define BTRFS_DISCARD_DELAY (120ULL * NSEC_PER_SEC) #define BTRFS_DISCARD_UNUSED_DELAY (10ULL * NSEC_PER_SEC) |