diff options
author | Peter Xu <peterx@redhat.com> | 2022-05-12 20:22:55 -0700 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2022-05-13 07:20:11 -0700 |
commit | deb4c93a9871082a7df6a99ca8fa48024665a71a (patch) | |
tree | 393086bbfdc58ffa6cea9afafc6cf55a37aed9e9 /mm/khugepaged.c | |
parent | bc70fbf269fdff410b0b6d75c3770b9f59117b90 (diff) | |
download | lwn-deb4c93a9871082a7df6a99ca8fa48024665a71a.tar.gz lwn-deb4c93a9871082a7df6a99ca8fa48024665a71a.zip |
mm/khugepaged: don't recycle vma pgtable if uffd-wp registered
When we're trying to collapse a 2M huge shmem page, don't retract pgtable
pmd page if it's registered with uffd-wp, because that pgtable could have
pte markers installed. Recycling of that pgtable means we'll lose the pte
markers. That could cause data loss for an uffd-wp enabled application on
shmem.
Instead of disabling khugepaged on these files, simply skip retracting
these special VMAs, then the page cache can still be merged into a huge
thp, and other mm/vma can still map the range of file with a huge thp when
proper.
Note that checking VM_UFFD_WP needs to be done with mmap_sem held for
write, that avoids race like:
khugepaged user thread
========== ===========
check VM_UFFD_WP, not set
UFFDIO_REGISTER with uffd-wp on shmem
wr-protect some pages (install markers)
take mmap_sem write lock
erase pmd and free pmd page
--> pte markers are dropped unnoticed!
Link: https://lkml.kernel.org/r/20220405014921.14994-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm/khugepaged.c')
-rw-r--r-- | mm/khugepaged.c | 14 |
1 files changed, 13 insertions, 1 deletions
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index a2560f970881..2243ed095f02 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1456,6 +1456,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE)) return; + /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */ + if (userfaultfd_wp(vma)) + return; + hpage = find_lock_page(vma->vm_file->f_mapping, linear_page_index(vma, haddr)); if (!hpage) @@ -1591,7 +1595,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * reverse order. Trylock is a way to avoid deadlock. */ if (mmap_write_trylock(mm)) { - if (!khugepaged_test_exit(mm)) + /* + * When a vma is registered with uffd-wp, we can't + * recycle the pmd pgtable because there can be pte + * markers installed. Skip it only, so the rest mm/vma + * can still have the same file mapped hugely, however + * it'll always mapped in small page size for uffd-wp + * registered ranges. + */ + if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma)) collapse_and_free_pmd(mm, vma, addr, pmd); mmap_write_unlock(mm); } else { |