summaryrefslogtreecommitdiff
path: root/include/asm-generic/pgtable.h
diff options
context:
space:
mode:
authorAndrea Arcangeli <aarcange@redhat.com>2012-06-20 12:52:57 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2012-06-20 14:39:35 -0700
commite4eed03fd06578571c01d4f1478c874bb432c815 (patch)
tree9cfd16247d8208a1fe55ad81b0d4d85c17b80415 /include/asm-generic/pgtable.h
parentabca7c4965845924f65d40e0aa1092bdd895e314 (diff)
downloadlwn-e4eed03fd06578571c01d4f1478c874bb432c815.tar.gz
lwn-e4eed03fd06578571c01d4f1478c874bb432c815.zip
thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under Xen. So instead of dealing only with "consistent" pmdvals in pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals where the low 32bit and high 32bit could be inconsistent (to avoid having to use cmpxchg8b). The only guarantee we get from pmd_read_atomic is that if the low part of the pmd was found null, the high part will be null too (so the pmd will be considered unstable). And if the low part of the pmd is found "stable" later, then it means the whole pmd was read atomically (because after a pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore, and we read the high part after the low part). In the 32bit PAE x86 case, it is enough to read the low part of the pmdval atomically to declare the pmd as "stable" and that's true for THP and no THP, furthermore in the THP case we also have a barrier() that will prevent any inconsistent pmdvals to be cached by a later re-read of the *pmd. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Tested-by: Andrew Jones <drjones@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include/asm-generic/pgtable.h')
-rw-r--r--include/asm-generic/pgtable.h10
1 files changed, 10 insertions, 0 deletions
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 6f2b45a9b6bc..ff4947b7a976 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -484,6 +484,16 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
/*
* The barrier will stabilize the pmdval in a register or on
* the stack so that it will stop changing under the code.
+ *
+ * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
+ * pmd_read_atomic is allowed to return a not atomic pmdval
+ * (for example pointing to an hugepage that has never been
+ * mapped in the pmd). The below checks will only care about
+ * the low part of the pmd with 32bit PAE x86 anyway, with the
+ * exception of pmd_none(). So the important thing is that if
+ * the low part of the pmd is found null, the high part will
+ * be also null or the pmd_none() check below would be
+ * confused.
*/
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
barrier();