diff options
author | David Hildenbrand <david@redhat.com> | 2024-04-09 21:22:47 +0200 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2024-05-05 17:53:28 -0700 |
commit | 05c5323b2a344c19c51cd1b91a4ab9ae90853794 (patch) | |
tree | 6f7fb3f787ccf35a8d4571416bbb9a88f997cca3 /mm/khugepaged.c | |
parent | 46d62de7ad1286854e0c2944ad26a1c1b1a5f191 (diff) | |
download | lwn-05c5323b2a344c19c51cd1b91a4ab9ae90853794.tar.gz lwn-05c5323b2a344c19c51cd1b91a4ab9ae90853794.zip |
mm: track mapcount of large folios in single value
Let's track the mapcount of large folios in a single value. The mapcount
of a large folio currently corresponds to the sum of the entire mapcount
and all page mapcounts.
This sum is what we actually want to know in folio_mapcount() and it is
also sufficient for implementing folio_mapped().
With PTE-mapped THP becoming more important and more widely used, we want
to avoid looping over all pages of a folio just to obtain the mapcount of
large folios. The comment "In the common case, avoid the loop when no
pages mapped by PTE" in folio_total_mapcount() does no longer hold for
mTHP that are always mapped by PTE.
Further, we are planning on using folio_mapcount() more frequently, and
might even want to remove page mapcounts for large folios in some kernel
configs. Therefore, allow for reading the mapcount of large folios
efficiently and atomically without looping over any pages.
Maintain the mapcount also for hugetlb pages for simplicity. Use the new
mapcount to implement folio_mapcount() and folio_mapped(). Make
page_mapped() simply call folio_mapped(). We can now get rid of
folio_large_is_mapped().
_nr_pages_mapped is now only used in rmap code and for debugging purposes.
Keep folio_nr_pages_mapped() around, but document that its use should be
limited to rmap internals and debugging purposes.
This change implies one additional atomic add/sub whenever
mapping/unmapping (parts of) a large folio.
As we now batch RMAP operations for PTE-mapped THP during fork(), during
unmap/zap, and when PTE-remapping a PMD-mapped THP, and we adjust the
large mapcount for a PTE batch only once, the added overhead in the common
case is small. Only when unmapping individual pages of a large folio
(e.g., during COW), the overhead might be bigger in comparison, but it's
essentially one additional atomic operation.
Note that before the new mapcount would overflow, already our refcount
would overflow: each mapping requires a folio reference. Extend the
focumentation of folio_mapcount().
Link: https://lkml.kernel.org/r/20240409192301.907377-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Richard Chang <richardycc@google.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm/khugepaged.c')
-rw-r--r-- | mm/khugepaged.c | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 89e2624fb3ff..2f73d2aa9ae8 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1358,7 +1358,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, * Check if the page has any GUP (or other external) pins. * * Here the check may be racy: - * it may see total_mapcount > refcount in some cases? + * it may see folio_mapcount() > folio_ref_count(). * But such case is ephemeral we could always retry collapse * later. However it may report false positive if the page * has excessive GUP pins (i.e. 512). Anyway the same check |