<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/mm, branch v3.14.21</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.14.21</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.14.21'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2014-10-09T19:21:29+00:00</updated>
<entry>
<title>mm: don't pointlessly use BUG_ON() for sanity check</title>
<updated>2014-10-09T19:21:29+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-04-28T21:24:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c56af023a5fe80252b4dd555926eeaa294d9112e'/>
<id>urn:sha1:c56af023a5fe80252b4dd555926eeaa294d9112e</id>
<content type='text'>
commit 50f5aa8a9b248fa4262cf379863ec9a531b49737 upstream.

BUG_ON() is a big hammer, and should be used _only_ if there is some
major corruption that you cannot possibly recover from, making it
imperative that the current process (and possibly the whole machine) be
terminated with extreme prejudice.

The trivial sanity check in the vmacache code is *not* such a fatal
error.  Recovering from it is absolutely trivial, and using BUG_ON()
just makes it harder to debug for no actual advantage.

To make matters worse, the placement of the BUG_ON() (only if the range
check matched) actually makes it harder to hit the sanity check to begin
with, so _if_ there is a bug (and we just got a report from Srivatsa
Bhat that this can indeed trigger), it is harder to debug not just
because the machine is possibly dead, but because we don't have better
coverage.

BUG_ON() must *die*.  Maybe we should add a checkpatch warning for it,
because it is simply just about the worst thing you can ever do if you
hit some "this cannot happen" situation.

Reported-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: per-thread vma caching</title>
<updated>2014-10-09T19:21:29+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2014-04-07T22:37:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=efb5fea23009a0223996e699b54cc9533e2070e9'/>
<id>urn:sha1:efb5fea23009a0223996e699b54cc9533e2070e9</id>
<content type='text'>
commit 615d6e8756c87149f2d4c1b93d471bca002bd849 upstream.

This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state()</title>
<updated>2014-10-09T19:21:29+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>cl@linux.com</email>
</author>
<published>2014-04-18T22:07:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=264a8ae739402ab7e14d1794fd4bbce0e339d415'/>
<id>urn:sha1:264a8ae739402ab7e14d1794fd4bbce0e339d415</id>
<content type='text'>
commit 83da7510058736c09a14b9c17ec7d851940a4332 upstream.

Seems to be called with preemption enabled.  Therefore it must use
mod_zone_page_state instead.

Signed-off-by: Christoph Lameter &lt;cl@linux.com&gt;
Reported-by: Grygorii Strashko &lt;grygorii.strashko@ti.com&gt;
Tested-by: Grygorii Strashko &lt;grygorii.strashko@ti.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Santosh Shilimkar &lt;santosh.shilimkar@ti.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: vmscan: shrink_slab: rename max_pass -&gt; freeable</title>
<updated>2014-10-09T19:21:29+00:00</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2014-04-03T21:47:32+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8e524793fdfb061fc31936d9adaa026d7bdc916c'/>
<id>urn:sha1:8e524793fdfb061fc31936d9adaa026d7bdc916c</id>
<content type='text'>
commit d5bc5fd3fcb7b8dfb431694a8c8052466504c10c upstream.

The name `max_pass' is misleading, because this variable actually keeps
the estimate number of freeable objects, not the maximal number of
objects we can scan in this pass, which can be twice that.  Rename it to
reflect its actual meaning.

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: vmscan: respect NUMA policy mask when shrinking slab on direct reclaim</title>
<updated>2014-10-09T19:21:29+00:00</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2014-04-03T21:47:19+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=12f2f0bab442ecdff45d67ff1c3d0b21f46794bd'/>
<id>urn:sha1:12f2f0bab442ecdff45d67ff1c3d0b21f46794bd</id>
<content type='text'>
commit 99120b772b52853f9a2b829a21dd44d9b20558f1 upstream.

When direct reclaim is executed by a process bound to a set of NUMA
nodes, we should scan only those nodes when possible, but currently we
will scan kmem from all online nodes even if the kmem shrinker is NUMA
aware.  That said, binding a process to a particular NUMA node won't
prevent it from shrinking inode/dentry caches from other nodes, which is
not good.  Fix this.

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Glauber Costa &lt;glommer@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/filemap.c: avoid always dirtying mapping-&gt;flags on O_DIRECT</title>
<updated>2014-10-09T19:21:29+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2014-05-22T18:54:16+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6591981a30a8c752b1c537211f7fe4b9530fea41'/>
<id>urn:sha1:6591981a30a8c752b1c537211f7fe4b9530fea41</id>
<content type='text'>
commit 7fcbbaf18392f0b17c95e2f033c8ccf87eecde1d upstream.

In some testing I ran today (some fio jobs that spread over two nodes),
we end up spending 40% of the time in filemap_check_errors().  That
smells fishy.  Looking further, this is basically what happens:

blkdev_aio_read()
    generic_file_aio_read()
        filemap_write_and_wait_range()
            if (!mapping-&gt;nr_pages)
                filemap_check_errors()

and filemap_check_errors() always attempts two test_and_clear_bit() on
the mapping flags, thus dirtying it for every single invocation.  The
patch below tests each of these bits before clearing them, avoiding this
issue.  In my test case (4-socket box), performance went from 1.7M IOPS
to 4.0M IOPS.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: optimize put_mems_allowed() usage</title>
<updated>2014-10-09T19:21:28+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2014-04-03T21:47:24+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=29c2a88157819d1e68ffea8b7d80117b332c8efe'/>
<id>urn:sha1:29c2a88157819d1e68ffea8b7d80117b332c8efe</id>
<content type='text'>
commit d26914d11751b23ca2e8747725f2cae10c2f2c1b upstream.

Since put_mems_allowed() is strictly optional, its a seqcount retry, we
don't need to evaluate the function if the allocation was in fact
successful, saving a smp_rmb some loads and comparisons on some relative
fast-paths.

Since the naming, get/put_mems_allowed() does suggest a mandatory
pairing, rename the interface, as suggested by Mel, to resemble the
seqcount interface.

This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
where it is important to note that the return value of the latter call
is inverted from its previous incarnation.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages</title>
<updated>2014-10-09T19:21:28+00:00</updated>
<author>
<name>Raghavendra K T</name>
<email>raghavendra.kt@linux.vnet.ibm.com</email>
</author>
<published>2014-04-03T21:48:23+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=d4995db1ea96e5f357b92469c9b6c3ecc6bdfbaa'/>
<id>urn:sha1:d4995db1ea96e5f357b92469c9b6c3ecc6bdfbaa</id>
<content type='text'>
commit 6d2be915e589b58cb11418cbe1f22ff90732b6ac upstream.

Currently max_sane_readahead() returns zero on the cpu whose NUMA node
has no local memory which leads to readahead failure.  Fix this
readahead failure by returning minimum of (requested pages, 512).  Users
running applications on a memory-less cpu which needs readahead such as
streaming application see considerable boost in the performance.

Result:

fadvise experiment with FADV_WILLNEED on a PPC machine having memoryless
CPU with 1GB testfile (12 iterations) yielded around 46.66% improvement.

fadvise experiment with FADV_WILLNEED on a x240 machine with 1GB
testfile 32GB* 4G RAM numa machine (12 iterations) showed no impact on
the normal NUMA cases w/ patch.

  Kernel       Avg  Stddev
  base      7.4975   3.92%
  patched   7.4174   3.26%

[Andrew: making return value PAGE_SIZE independent]
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm, compaction: ignore pageblock skip when manually invoking compaction</title>
<updated>2014-10-09T19:21:28+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2014-04-03T21:47:23+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=67c58ce60024b9b3d9cf203da994bf265aa7e369'/>
<id>urn:sha1:67c58ce60024b9b3d9cf203da994bf265aa7e369</id>
<content type='text'>
commit 91ca9186484809c57303b33778d841cc28f696ed upstream.

The cached pageblock hint should be ignored when triggering compaction
through /proc/sys/vm/compact_memory so all eligible memory is isolated.
Manually invoking compaction is known to be expensive, there's no need
to skip pageblocks based on heuristics (mainly for debugging).

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm, compaction: determine isolation mode only once</title>
<updated>2014-10-09T19:21:28+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2014-04-07T22:37:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e8cd5b562a4bfa351d07a5e4c5758b1afe6a4c0b'/>
<id>urn:sha1:e8cd5b562a4bfa351d07a5e4c5758b1afe6a4c0b</id>
<content type='text'>
commit da1c67a76f7cf2b3404823d24f9f10fa91aa5dc5 upstream.

The conditions that control the isolation mode in
isolate_migratepages_range() do not change during the iteration, so
extract them out and only define the value once.

This actually does have an effect, gcc doesn't optimize it itself because
of cc-&gt;sync.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
