<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/mm/memory.c, branch v3.18.43</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.18.43</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v3.18.43'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2016-03-11T14:45:21+00:00</updated>
<entry>
<title>mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED</title>
<updated>2016-03-11T14:45:21+00:00</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2016-02-26T23:19:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=244976ce21da748cd2d3c6bab3eed6e9bb4895ab'/>
<id>urn:sha1:244976ce21da748cd2d3c6bab3eed6e9bb4895ab</id>
<content type='text'>
[ Upstream commit ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433 ]

pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).

While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable().  The old pmd_trans_huge() left a
tiny window for a race.

Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.

[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reported-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>mmu_gather: move minimal range calculations into generic code</title>
<updated>2015-03-28T13:22:01+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2014-10-29T10:03:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=bf91097cd544111aa767482cc88bd46b67700046'/>
<id>urn:sha1:bf91097cd544111aa767482cc88bd46b67700046</id>
<content type='text'>
[ Upstream commit fb7332a9fedfd62b1ba6530c86f39f0fa38afd49 ]

On architectures with hardware broadcasting of TLB invalidation messages
, it makes sense to reduce the range of the mmu_gather structure when
unmapping page ranges based on the dirty address information passed to
tlb_remove_tlb_entry.

arm64 already does this by directly manipulating the start/end fields
of the gather structure, but this confuses the generic code which
does not expect these fields to change and can end up calculating
invalid, negative ranges when forcing a flush in zap_pte_range.

This patch moves the minimal range calculation out of the arm64 code
and into the generic implementation, simplifying zap_pte_range in the
process (which no longer needs to care about start/end, since they will
point to the appropriate ranges already). With the range being tracked
by core code, the need_flush flag is dropped in favour of checking that
the end of the range has actually been set.

Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Russell King - ARM Linux &lt;linux@arm.linux.org.uk&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>mm/memory.c: actually remap enough memory</title>
<updated>2015-03-14T19:37:16+00:00</updated>
<author>
<name>Grazvydas Ignotas</name>
<email>notasas@gmail.com</email>
</author>
<published>2015-02-12T23:00:19+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=463fb5a2cc7013f50763ee2322fa724553150842'/>
<id>urn:sha1:463fb5a2cc7013f50763ee2322fa724553150842</id>
<content type='text'>
commit 9cb12d7b4ccaa976f97ce0c5fd0f1b6a83bc2a75 upstream.

For whatever reason, generic_access_phys() only remaps one page, but
actually allows to access arbitrary size.  It's quite easy to trigger
large reads, like printing out large structure with gdb, which leads to a
crash.  Fix it by remapping correct size.

Fixes: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
Signed-off-by: Grazvydas Ignotas &lt;notasas@gmail.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>vm: make stack guard page errors return VM_FAULT_SIGSEGV rather than SIGBUS</title>
<updated>2015-02-06T06:36:01+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-01-29T19:15:17+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=338dcaf53e1997a07139016eb62cd42a4dbc15e3'/>
<id>urn:sha1:338dcaf53e1997a07139016eb62cd42a4dbc15e3</id>
<content type='text'>
commit 9c145c56d0c8a0b62e48c8d71e055ad0fb2012ba upstream.

The stack guard page error case has long incorrectly caused a SIGBUS
rather than a SIGSEGV, but nobody actually noticed until commit
fee7e49d4514 ("mm: propagate error from stack expansion even for guard
page") because that error case was never actually triggered in any
normal situations.

Now that we actually report the error, people noticed the wrong signal
that resulted.  So far, only the test suite of libsigsegv seems to have
actually cared, but there are real applications that use libsigsegv, so
let's not wait for any of those to break.

Reported-and-tested-by: Takashi Iwai &lt;tiwai@suse.de&gt;
Tested-by: Jan Engelhardt &lt;jengelh@inai.de&gt;
Acked-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt; # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: propagate error from stack expansion even for guard page</title>
<updated>2015-01-16T14:59:57+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-01-06T21:00:05+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c03aed64c4fc9e39e8727920971d889dac14b002'/>
<id>urn:sha1:c03aed64c4fc9e39e8727920971d889dac14b002</id>
<content type='text'>
commit fee7e49d45149fba60156f5b59014f764d3e3728 upstream.

Jay Foad reports that the address sanitizer test (asan) sometimes gets
confused by a stack pointer that ends up being outside the stack vma
that is reported by /proc/maps.

This happens due to an interaction between RLIMIT_STACK and the guard
page: when we do the guard page check, we ignore the potential error
from the stack expansion, which effectively results in a missing guard
page, since the expected stack expansion won't have been done.

And since /proc/maps explicitly ignores the guard page (commit
d7824370e263: "mm: fix up some user-visible effects of the stack guard
page"), the stack pointer ends up being outside the reported stack area.

This is the minimal patch: it just propagates the error.  It also
effectively makes the guard page part of the stack limit, which in turn
measn that the actual real stack is one page less than the stack limit.

Let's see if anybody notices.  We could teach acct_stack_growth() to
allow an extra page for a grow-up/grow-down stack in the rlimit test,
but I don't want to add more complexity if it isn't needed.

Reported-and-tested-by: Jay Foad &lt;jay.foad@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: protect set_page_dirty() from ongoing truncation</title>
<updated>2015-01-16T14:59:57+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2015-01-08T22:32:18+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a78e877e9a689d8195b2e62d68030c36fd8e7f34'/>
<id>urn:sha1:a78e877e9a689d8195b2e62d68030c36fd8e7f34</id>
<content type='text'>
commit 2d6d7f98284648c5ed113fe22a132148950b140f upstream.

Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:

__set_page_dirty_nobuffers()       __delete_from_page_cache()
  if (TestSetPageDirty(page))
                                     page-&gt;mapping = NULL
				     if (PageDirty())
				       dec_zone_page_state(page, NR_FILE_DIRTY);
				       dec_bdi_stat(mapping-&gt;backing_dev_info, BDI_RECLAIMABLE);
    if (page-&gt;mapping)
      account_page_dirtied(page)
        __inc_zone_page_state(page, NR_FILE_DIRTY);
	__inc_bdi_stat(mapping-&gt;backing_dev_info, BDI_RECLAIMABLE);

which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.

Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held.  The notable exception to this rule, though,
is do_wp_page(), for which this race exists.  However, do_wp_page()
already waits for a locked page to unlock before setting the dirty bit,
in order to prevent a race where clear_page_dirty() misses the page bit
in the presence of dirty ptes.  Upgrade that wait to a fully locked
set_page_dirty() to also cover the situation explained above.

Afterwards, the code in set_page_dirty() dealing with a truncation race
is no longer needed.  Remove it.

Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: fix swapoff hang after page migration and fork</title>
<updated>2014-12-03T17:36:03+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2014-12-02T23:59:39+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2022b4d18a491a578218ce7a4eca8666db895a73'/>
<id>urn:sha1:2022b4d18a491a578218ce7a4eca8666db895a73</id>
<content type='text'>
I've been seeing swapoff hangs in recent testing: it's cycling around
trying unsuccessfully to find an mm for some remaining pages of swap.

I have been exercising swap and page migration more heavily recently,
and now notice a long-standing error in copy_one_pte(): it's trying to
add dst_mm to swapoff's mmlist when it finds a swap entry, but is doing
so even when it's a migration entry or an hwpoison entry.

Which wouldn't matter much, except it adds dst_mm next to src_mm,
assuming src_mm is already on the mmlist: which may not be so.  Then if
pages are later swapped out from dst_mm, swapoff won't be able to find
where to replace them.

There's already a !non_swap_entry() test for stats: move that up before
the swap_duplicate() and the addition to mmlist.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Kelley Nielsen &lt;kelleynnn@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[2.6.18+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>zap_pte_range: update addr when forcing flush after TLB batching faiure</title>
<updated>2014-10-28T20:16:28+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2014-10-28T20:16:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=ce9ec37bddb633404a0c23e1acb181a264e7f7f2'/>
<id>urn:sha1:ce9ec37bddb633404a0c23e1acb181a264e7f7f2</id>
<content type='text'>
When unmapping a range of pages in zap_pte_range, the page being
unmapped is added to an mmu_gather_batch structure for asynchronous
freeing. If we run out of space in the batch structure before the range
has been completely unmapped, then we break out of the loop, force a
TLB flush and free the pages that we have batched so far. If there are
further pages to unmap, then we resume the loop where we left off.

Unfortunately, we forget to update addr when we break out of the loop,
which causes us to truncate the range being invalidated as the end
address is exclusive. When we re-enter the loop at the same address, the
page has already been freed and the pte_present test will fail, meaning
that we do not reconsider the address for invalidation.

This patch fixes the problem by incrementing addr by the PAGE_SIZE
before breaking out of the loop on batch failure.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared</title>
<updated>2014-10-14T00:18:28+00:00</updated>
<author>
<name>Peter Feiner</name>
<email>pfeiner@google.com</email>
</author>
<published>2014-10-13T22:55:46+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=64e455079e1bd7787cc47be30b7f601ce682a5f6'/>
<id>urn:sha1:64e455079e1bd7787cc47be30b7f601ce682a5f6</id>
<content type='text'>
For VMAs that don't want write notifications, PTEs created for read faults
have their write bit set.  If the read fault happens after VM_SOFTDIRTY is
cleared, then the PTE's softdirty bit will remain clear after subsequent
writes.

Here's a simple code snippet to demonstrate the bug:

  char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                 MAP_ANONYMOUS | MAP_SHARED, -1, 0);
  system("echo 4 &gt; /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
  assert(*m == '\0');     /* new PTE allows write access */
  assert(!soft_dirty(x));
  *m = 'x';               /* should dirty the page */
  assert(soft_dirty(x));  /* fails */

With this patch, write notifications are enabled when VM_SOFTDIRTY is
cleared.  Furthermore, to avoid unnecessary faults, write notifications
are disabled when VM_SOFTDIRTY is set.

As a side effect of enabling and disabling write notifications with
care, this patch fixes a bug in mprotect where vm_page_prot bits set by
drivers were zapped on mprotect.  An analogous bug was fixed in mmap by
commit c9d0bf241451 ("mm: uncached vma support with writenotify").

Signed-off-by: Peter Feiner &lt;pfeiner@google.com&gt;
Reported-by: Peter Feiner &lt;pfeiner@google.com&gt;
Suggested-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Jamie Liu &lt;jamieliu@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: softdirty: keep bit when zapping file pte</title>
<updated>2014-09-26T15:10:35+00:00</updated>
<author>
<name>Peter Feiner</name>
<email>pfeiner@google.com</email>
</author>
<published>2014-09-25T23:05:29+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=dbab31aa2ceec2d201966fa0b552f151310ba5f4'/>
<id>urn:sha1:dbab31aa2ceec2d201966fa0b552f151310ba5f4</id>
<content type='text'>
This fixes the same bug as b43790eedd31 ("mm: softdirty: don't forget to
save file map softdiry bit on unmap") and 9aed8614af5a ("mm/memory.c:
don't forget to set softdirty on file mapped fault") where the return
value of pte_*mksoft_dirty was being ignored.

To be sure that no other pte/pmd "mk" function return values were being
ignored, I annotated the functions in arch/x86/include/asm/pgtable.h
with __must_check and rebuilt.

The userspace effect of this bug is that the softdirty mark might be
lost if a file mapped pte get zapped.

Signed-off-by: Peter Feiner &lt;pfeiner@google.com&gt;
Acked-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Jamie Liu &lt;jamieliu@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.12+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
