writeback: plug writeback at a high level

Doing writeback on lots of little files causes terrible IOPS storms
because of the per-mapping writeback plugging we do. This
essentially causes immediate dispatch of IO for each mapping,
regardless of the context in which writeback is occurring.

IOWs, running a concurrent write-lots-of-small-4k-files workload
using fsmark on XFS results in a huge number of IOPS being issued
for data writes. Metadata writes are sorted and plugged at a high
level by XFS, so they aggregate nicely into large IOs. However, data
writeback IOs are dispatched as individual 4k IOs, even when the
blocks of two consecutively written files are adjacent.

Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem,
metadata CRCs enabled.

Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches)

Test:

$ ./fs_mark -D 10000 -S0 -n 10000 -s 4096 -L 120 \
	-d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 \
	-d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 \
	-d /mnt/scratch/6 -d /mnt/scratch/7

Result:

             wall     sys      create rate     Physical write IO
             time     CPU      (avg files/s)   IOPS     Bandwidth
             -----    -----    -------------   ------   ---------
unpatched    6m56s    15m47s   24,000+/-500    26,000   130MB/s
patched      5m06s    13m28s   32,800+/-600     1,500   180MB/s
improvement  -26.44%  -14.68%  +36.67%         -94.23%  +38.46%

If I use zero length files, this workload runs at about 500 IOPS, so
plugging drops the data IOs from roughly 25,500/s to 1000/s.
3 lines of code, 35% better throughput for 15% less CPU.

The benefits of plugging at this layer are likely to be higher for
spinning media, as the IO patterns for this workload are going to
make a much bigger difference on high IO latency devices.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
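
The "improvement" row in the results is a plain relative change between the unpatched and patched runs. The helper below is a small sketch for re-deriving it; `pct_change` is a hypothetical name, not something from the patch, and the wall and sys times are assumed to be converted to seconds first (6m56s = 416s, 5m06s = 306s, 15m47s = 947s, 13m28s = 808s).

```c
/*
 * Relative change from 'before' to 'after', in percent.
 * Negative means the patched value is smaller than the unpatched one.
 * Hypothetical helper for checking the results table, not kernel code.
 */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For example, `pct_change(26000.0, 1500.0)` gives the change in physical write IOPS, and `pct_change(24000.0, 32800.0)` gives the change in create rate.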
author: Dave Chinner <dchinner@redhat.com> 2015-03-04 11:16:36 -0500
committer: Josef Bacik <jbacik@fb.com> 2015-08-17 18:39:45 -0400
commit: d353d7587d02116b9732d5c06615aed75a4d3a47 (patch)
tree: 6ca0f5c0d5996cad2dce712c9da5477f84d940ca /fs/fs-writeback.c
parent: edf15b4d4b01b565cb5f4fd2e2d08940b9f92e2f (diff)
download: lwn-d353d7587d02116b9732d5c06615aed75a4d3a47.tar.gz
lwn-d353d7587d02116b9732d5c06615aed75a4d3a47.zip
1 files changed, 3 insertions, 0 deletions
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 518c6294bf6c..d98e37bbf417 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1439,7 +1439,9 @@ static long writeback_sb_inodes(struct super_block *sb,
 	unsigned long start_time = jiffies;
 	long write_chunk;
 	long wrote = 0;  /* count both pages and inodes */
+	struct blk_plug plug;
 
+	blk_start_plug(&plug);
 	while (!list_empty(&wb->b_io)) {
 		struct inode *inode = wb_inode(wb->b_io.prev);
 
@@ -1537,6 +1539,7 @@ static long writeback_sb_inodes(struct super_block *sb,
 				break;
 		}
 	}
+	blk_finish_plug(&plug);
 	return wrote;
 }
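
To illustrate why holding a plug across the whole inode loop helps, here is a userspace sketch of what a plug makes possible: requests queued while the plug is held can be sorted and merged when they are adjacent on disk, so many 4k writes leave as a few large IOs. This is a simplified model under stated assumptions, not the block layer's implementation: `struct io_req`, `cmp_req` and `unplug_merge` are hypothetical names, and real limits such as maximum request size are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One queued write: a run of contiguous 4k blocks (hypothetical type). */
struct io_req {
	uint64_t start;	/* first block number */
	uint64_t len;	/* number of blocks */
};

/* Order requests by starting block for qsort(). */
static int cmp_req(const void *a, const void *b)
{
	const struct io_req *x = a, *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

/*
 * Model of flushing a plug: sort the queued requests, then merge each
 * request into its predecessor when it begins exactly where the
 * predecessor ends. The merged requests are compacted to the front of
 * 'reqs'; the return value is how many IOs would be dispatched.
 */
static size_t unplug_merge(struct io_req *reqs, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;

	qsort(reqs, n, sizeof(*reqs), cmp_req);
	for (size_t i = 1; i < n; i++) {
		if (reqs[out].start + reqs[out].len == reqs[i].start)
			reqs[out].len += reqs[i].len;	/* adjacent: merge */
		else
			reqs[++out] = reqs[i];		/* gap: new IO */
	}
	return out + 1;
}
```

In this model, per-mapping plugging corresponds to calling `unplug_merge()` once per file, which for single-block files can never merge anything. The plug added in `writeback_sb_inodes()` corresponds to one call over the requests of every inode written in the loop, which is where adjacent blocks of consecutively written files get a chance to combine.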