writeback: per task dirty rate limit

Add two fields to task_struct.

1) account dirtied pages in the individual tasks, for accuracy
2) per-task balance_dirty_pages() call intervals, for flexibility

The balance_dirty_pages() call interval (ie. nr_dirtied_pause) will
scale near-sqrt to the safety gap between dirty pages and threshold.

The main problem of per-task nr_dirtied is, if 1k+ tasks start dirtying
pages at exactly the same time, each task will be assigned a large
initial nr_dirtied_pause, so that the dirty threshold will be exceeded
long before each task reached its nr_dirtied_pause and hence call
balance_dirty_pages().

The solution is to watch for the number of pages dirtied on each CPU in
between the calls into balance_dirty_pages(). If it exceeds ratelimit_pages
(3% dirty threshold), force call balance_dirty_pages() for a chance to
set bdi->dirty_exceeded. In normal situations, this safeguarding
condition is not expected to trigger at all.

On the sqrt in dirty_poll_interval():

It will serve as an initial guess when dirty pages are still in the
freerun area.

When dirty pages are floating inside the dirty control scope [freerun,
limit], a followup patch will use some refined dirty poll interval to
get the desired pause time.

   thresh-dirty (MB)    sqrt
                   1      16
                   2      22
                   4      32
                   8      45
                  16      64
                  32      90
                  64     128
                 128     181
                 256     256
                 512     362
                1024     512

The above table means, given 1MB (or 1GB) gap and the dd tasks polling
balance_dirty_pages() on every 16 (or 512) pages, the dirty limit won't
be exceeded as long as there are less than 16 (or 512) concurrent dd's.

So sqrt naturally leads to less overheads and more safe concurrent tasks
for large memory servers, which have large (thresh-freerun) gaps.

peter: keep the per-CPU ratelimit for safeguarding the 1k+ tasks case

CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Andrea Righi <andrea@betterlinux.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
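The sqrt column in the commit message can be reproduced with a small userspace sketch. It assumes 4 KiB pages (256 pages per MB), which is what the listed values imply, and it uses an exact integer square root. The names isqrt_ul() and poll_interval_for_gap_mb() are hypothetical and are not taken from the patch; the kernel's dirty_poll_interval() is described only as "near-sqrt" and may round differently.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, so 1 MB of gap is 256 pages. */
#define PAGES_PER_MB 256UL

/* Integer square root: the largest r such that r * r <= n. */
unsigned long isqrt_ul(unsigned long n)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/*
 * Hypothetical helper: poll interval in pages for a (thresh - dirty)
 * gap given in MB. This mirrors the table, not the kernel function.
 */
unsigned long poll_interval_for_gap_mb(unsigned long gap_mb)
{
	assert(gap_mb > 0);
	return isqrt_ul(gap_mb * PAGES_PER_MB);
}
```

With a gap of G pages and each task polling every sqrt(G) pages, N tasks can dirty at most N * sqrt(G) pages between polls, which stays below G while N is smaller than sqrt(G). That is the reasoning behind the "less than 16 (or 512) concurrent dd's" statement.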
author: Wu Fengguang <fengguang.wu@intel.com> 2011-06-11 18:10:12 -0600
committer: Wu Fengguang <fengguang.wu@intel.com> 2011-10-03 21:08:57 +0800
commit: 9d823e8f6b1b7b39f952d7d1795f29162143a433 (patch)
tree: 2ef4c0d29353452dd2f894e7dbd240a31bdd0a02 /kernel/fork.c
parent: 7381131cbcf7e15d201a0ffd782a4698efe4e740 (diff)
download: lwn-9d823e8f6b1b7b39f952d7d1795f29162143a433.tar.gz lwn-9d823e8f6b1b7b39f952d7d1795f29162143a433.zip
1 files changed, 3 insertions, 0 deletions
diff --git a/kernel/fork.c b/kernel/fork.c
index 8e6b6f4fb272..cc0815df99f2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1302,6 +1302,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->pdeath_signal = 0;
 	p->exit_state = 0;
 
+	p->nr_dirtied = 0;
+	p->nr_dirtied_pause = 128 >> (PAGE_SHIFT - 10);
+
 	/*
 	 * Ok, make it visible to the rest of the system.
 	 * We dont wake it up yet.
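The initial value in the hunk above, 128 >> (PAGE_SHIFT - 10), is 128 KB expressed in pages, so a new task starts with a 32-page pause on 4 KiB pages. The sketch below is a simplified userspace model of how the two fields and the per-CPU safeguard from the commit message might interact. struct task_model, should_balance() and the cpu_dirtied counter are hypothetical names, and resetting nr_dirtied inside the check is a simplification; this is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Initial per-task pause in pages: 128 KB divided by the page size. */
unsigned long initial_nr_dirtied_pause(unsigned int page_shift)
{
	assert(page_shift >= 10 && page_shift <= 17);
	return 128UL >> (page_shift - 10);
}

/* Hypothetical stand-in for the two new task_struct fields. */
struct task_model {
	unsigned long nr_dirtied;       /* pages dirtied since last balance */
	unsigned long nr_dirtied_pause; /* pages allowed before balancing */
};

/*
 * Model of the check: returns true when the caller should enter the
 * throttling path. Either the task reached its own pause value, or the
 * pages dirtied on this CPU since the last call reached ratelimit_pages.
 */
bool should_balance(struct task_model *t, unsigned long *cpu_dirtied,
		    unsigned long ratelimit_pages, unsigned long nr_pages)
{
	t->nr_dirtied += nr_pages;
	if (t->nr_dirtied >= t->nr_dirtied_pause) {
		*cpu_dirtied = 0;
		t->nr_dirtied = 0;
		return true;
	}

	/* Safeguard for many tasks that each stay below their own pause. */
	*cpu_dirtied += nr_pages;
	if (*cpu_dirtied >= ratelimit_pages) {
		*cpu_dirtied = 0;
		t->nr_dirtied = 0;
		return true;
	}
	return false;
}
```

In this model, 100 tasks that each dirty a single page on one CPU trigger the safeguard on the 100th page when ratelimit_pages is 100, even though no task is anywhere near a pause value of 512.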