From 25ee01a2fca02dfb5a3ce316e77910c468108199 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 5 Nov 2015 18:47:11 -0800 Subject: mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps Currently /proc/PID/smaps provides no usage info for vma(VM_HUGETLB), which is inconvenient when we want to know per-task or per-vma base hugetlb usage. To solve this, this patch adds new fields for hugetlb usage like below: Size: 20480 kB Rss: 0 kB Pss: 0 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 0 kB Anonymous: 0 kB AnonHugePages: 0 kB Shared_Hugetlb: 18432 kB Private_Hugetlb: 2048 kB Swap: 0 kB KernelPageSize: 2048 kB MMUPageSize: 2048 kB Locked: 0 kB VmFlags: rd wr mr mw me de ht [hughd@google.com: fix Private_Hugetlb alignment ] Signed-off-by: Naoya Horiguchi Acked-by: Joern Engel Acked-by: David Rientjes Acked-by: Michal Hocko Cc: Mike Kravetz Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 3a9d65c912e7..a7d6c06f36c4 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -424,6 +424,9 @@ Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 892 kB Anonymous: 0 kB +AnonHugePages: 0 kB +Shared_Hugetlb: 0 kB +Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB KernelPageSize: 4 kB @@ -452,6 +455,11 @@ and a page is modified, the file page is replaced by a private anonymous copy. "Swap" shows how much would-be-anonymous memory is also used, but out on swap. "SwapPss" shows proportional swap share of this mapping. +"AnonHugePages" shows the ammount of memory backed by transparent hugepage. +"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by +hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical +reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field. + "VmFlags" field deserves a separate description. This member represents the kernel flags associated with the particular virtual memory area in two letter encoded manner. The codes are the following: -- cgit v1.2.3 From 5d317b2b6536592a9b51fe65faed43d65ca9158e Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 5 Nov 2015 18:47:14 -0800 Subject: mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status Currently there's no easy way to get per-process usage of hugetlb pages, which is inconvenient because userspace applications which use hugetlb typically want to control their processes on the basis of how much memory (including hugetlb) they use. So this patch simply provides easy access to the info via /proc/PID/status. Signed-off-by: Naoya Horiguchi Acked-by: Joern Engel Acked-by: David Rientjes Acked-by: Michal Hocko Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 2 ++ fs/proc/task_mmu.c | 1 + include/linux/hugetlb.h | 19 +++++++++++++++++++ include/linux/mm_types.h | 3 +++ mm/hugetlb.c | 9 +++++++++ mm/rmap.c | 4 +++- 6 files changed, 37 insertions(+), 1 deletion(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a7d6c06f36c4..12ac0e4455d4 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -175,6 +175,7 @@ read the file /proc/PID/status: VmLib: 1412 kB VmPTE: 20 kb VmSwap: 0 kB + HugetlbPages: 0 kB Threads: 1 SigQ: 0/28578 SigPnd: 0000000000000000 @@ -238,6 +239,7 @@ Table 1-2: Contents of the status files (as of 4.1) VmPTE size of page table entries VmPMD size of second level page tables VmSwap size of swap usage (the number of referred swapents) + HugetlbPages size of hugetlb memory portions Threads number of threads SigQ number of signals queued/max. number for queue SigPnd bitmap of pending signals for the thread diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 5988b83836fb..288185e24762 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -70,6 +70,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) ptes >> 10, pmds >> 10, swap << (PAGE_SHIFT-10)); + hugetlb_report_usage(m, mm); } unsigned long task_vsize(struct mm_struct *mm) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5e35379f58a5..685c262e0be8 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -483,6 +483,17 @@ static inline spinlock_t *huge_pte_lockptr(struct hstate *h, #define hugepages_supported() (HPAGE_SHIFT != 0) #endif +void hugetlb_report_usage(struct seq_file *m, struct mm_struct *mm); + +static inline void hugetlb_count_add(long l, struct mm_struct *mm) +{ + atomic_long_add(l, &mm->hugetlb_usage); +} + +static inline void hugetlb_count_sub(long l, struct mm_struct *mm) +{ + atomic_long_sub(l, &mm->hugetlb_usage); +} #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; #define alloc_huge_page(v, a, r) NULL @@ -519,6 +530,14 @@ static inline spinlock_t *huge_pte_lockptr(struct hstate *h, { return &mm->page_table_lock; } + +static inline void hugetlb_report_usage(struct seq_file *f, struct mm_struct *m) +{ +} + +static inline void hugetlb_count_sub(long l, struct mm_struct *mm) +{ +} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 3d6baa7d4534..0a85da25a822 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -486,6 +486,9 @@ struct mm_struct { /* address of the bounds directory */ void __user *bd_addr; #endif +#ifdef CONFIG_HUGETLB_PAGE + atomic_long_t hugetlb_usage; +#endif }; static inline void mm_init_cpumask(struct mm_struct *mm) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9cc773483624..abfbe8ca3323 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2790,6 +2790,12 @@ void hugetlb_show_meminfo(void) 1UL << (huge_page_order(h) + PAGE_SHIFT - 10)); } +void hugetlb_report_usage(struct seq_file *m, struct mm_struct *mm) +{ + seq_printf(m, "HugetlbPages:\t%8lu kB\n", + atomic_long_read(&mm->hugetlb_usage) << (PAGE_SHIFT - 10)); +} + /* Return the number pages of memory we physically have, in PAGE_SIZE units. */ unsigned long hugetlb_total_pages(void) { @@ -3025,6 +3031,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, get_page(ptepage); page_dup_rmap(ptepage); set_huge_pte_at(dst, addr, dst_pte, entry); + hugetlb_count_add(pages_per_huge_page(h), dst); } spin_unlock(src_ptl); spin_unlock(dst_ptl); @@ -3105,6 +3112,7 @@ again: if (huge_pte_dirty(pte)) set_page_dirty(page); + hugetlb_count_sub(pages_per_huge_page(h), mm); page_remove_rmap(page); force_flush = !__tlb_remove_page(tlb, page); if (force_flush) { @@ -3509,6 +3517,7 @@ retry: && (vma->vm_flags & VM_SHARED))); set_huge_pte_at(mm, address, ptep, new_pte); + hugetlb_count_add(pages_per_huge_page(h), mm); if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) { /* Optimization, do the COW without a second fault */ ret = hugetlb_cow(mm, vma, address, ptep, new_pte, page, ptl); diff --git a/mm/rmap.c b/mm/rmap.c index f5b5c1f3dcd7..d40e7aefb888 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1352,7 +1352,9 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, update_hiwater_rss(mm); if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) { - if (!PageHuge(page)) { + if (PageHuge(page)) { + hugetlb_count_sub(1 << compound_order(page), mm); + } else { if (PageAnon(page)) dec_mm_counter(mm, MM_ANONPAGES); else -- cgit v1.2.3 From 7a14239a8fff45a241b6943a3ac444d5b67fcbed Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 5 Nov 2015 18:49:30 -0800 Subject: mm Documentation: undoc non-linear vmas While updating some mm Documentation, I came across a few straggling references to the non-linear vmas which were happily removed in v4.0. Delete them. Signed-off-by: Hugh Dickins Cc: Christoph Lameter Cc: "Kirill A. Shutemov" Cc: Rik van Riel Acked-by: Vlastimil Babka Cc: Davidlohr Bueso Cc: Oleg Nesterov Cc: Sasha Levin Cc: Dmitry Vyukov Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 1 - Documentation/vm/page_migration | 10 +++--- Documentation/vm/unevictable-lru.txt | 63 +++--------------------------------- 3 files changed, 9 insertions(+), 65 deletions(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 12ac0e4455d4..9781b977edcb 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -485,7 +485,6 @@ manner. The codes are the following: ac - area is accountable nr - swap space is not reserved for the area ht - area uses huge tlb pages - nl - non-linear mapping ar - architecture specific flag dd - do not include area into core dump sd - soft-dirty flag diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration index 6513fe2d90b8..479c3d0329bd 100644 --- a/Documentation/vm/page_migration +++ b/Documentation/vm/page_migration @@ -99,12 +99,10 @@ Steps: 4. The new page is prepped with some settings from the old page so that accesses to the new page will discover a page with the correct settings. -5. All the page table references to the page are converted - to migration entries or dropped (nonlinear vmas). - This decrease the mapcount of a page. If the resulting - mapcount is not zero then we do not migrate the page. - All user space processes that attempt to access the page - will now wait on the page lock. +5. All the page table references to the page are converted to migration + entries. This decreases the mapcount of a page. If the resulting + mapcount is not zero then we do not migrate the page. All user space + processes that attempt to access the page will now wait on the page lock. 6. The radix tree lock is taken. This will cause all processes trying to access the page via the mapping to block on the radix tree spinlock. diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt index 32ee3a67dba2..e983ee4e2758 100644 --- a/Documentation/vm/unevictable-lru.txt +++ b/Documentation/vm/unevictable-lru.txt @@ -552,63 +552,17 @@ different reverse map mechanisms. is really unevictable or not. In this case, try_to_unmap_anon() will return SWAP_AGAIN. - (*) try_to_unmap_file() - linear mappings + (*) try_to_unmap_file() Unmapping of a mapped file page works the same as for anonymous mappings, except that the scan visits all VMAs that map the page's index/page offset - in the page's mapping's reverse map priority search tree. It also visits - each VMA in the page's mapping's non-linear list, if the list is - non-empty. + in the page's mapping's reverse map interval search tree. As for anonymous pages, on encountering a VM_LOCKED VMA for a mapped file page, try_to_unmap_file() will attempt to acquire the associated mm_struct's mmap semaphore to mlock the page, returning SWAP_MLOCK if this is successful, and SWAP_AGAIN, if not. - (*) try_to_unmap_file() - non-linear mappings - - If a page's mapping contains a non-empty non-linear mapping VMA list, then - try_to_un{map|lock}() must also visit each VMA in that list to determine - whether the page is mapped in a VM_LOCKED VMA. Again, the scan must visit - all VMAs in the non-linear list to ensure that the pages is not/should not - be mlocked. - - If a VM_LOCKED VMA is found in the list, the scan could terminate. - However, there is no easy way to determine whether the page is actually - mapped in a given VMA - either for unmapping or testing whether the - VM_LOCKED VMA actually pins the page. - - try_to_unmap_file() handles non-linear mappings by scanning a certain - number of pages - a "cluster" - in each non-linear VMA associated with the - page's mapping, for each file mapped page that vmscan tries to unmap. If - this happens to unmap the page we're trying to unmap, try_to_unmap() will - notice this on return (page_mapcount(page) will be 0) and return - SWAP_SUCCESS. Otherwise, it will return SWAP_AGAIN, causing vmscan to - recirculate this page. We take advantage of the cluster scan in - try_to_unmap_cluster() as follows: - - For each non-linear VMA, try_to_unmap_cluster() attempts to acquire the - mmap semaphore of the associated mm_struct for read without blocking. - - If this attempt is successful and the VMA is VM_LOCKED, - try_to_unmap_cluster() will retain the mmap semaphore for the scan; - otherwise it drops it here. - - Then, for each page in the cluster, if we're holding the mmap semaphore - for a locked VMA, try_to_unmap_cluster() calls mlock_vma_page() to - mlock the page. This call is a no-op if the page is already locked, - but will mlock any pages in the non-linear mapping that happen to be - unlocked. - - If one of the pages so mlocked is the page passed in to try_to_unmap(), - try_to_unmap_cluster() will return SWAP_MLOCK, rather than the default - SWAP_AGAIN. This will allow vmscan to cull the page, rather than - recirculating it on the inactive list. - - Again, if try_to_unmap_cluster() cannot acquire the VMA's mmap sem, it - returns SWAP_AGAIN, indicating that the page is mapped by a VM_LOCKED - VMA, but couldn't be mlocked. - try_to_munlock() REVERSE MAP SCAN --------------------------------- @@ -625,10 +579,9 @@ introduced a variant of try_to_unmap() called try_to_munlock(). try_to_munlock() calls the same functions as try_to_unmap() for anonymous and mapped file pages with an additional argument specifying unlock versus unmap processing. Again, these functions walk the respective reverse maps looking -for VM_LOCKED VMAs. When such a VMA is found for anonymous pages and file -pages mapped in linear VMAs, as in the try_to_unmap() case, the functions -attempt to acquire the associated mmap semaphore, mlock the page via -mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the +for VM_LOCKED VMAs. When such a VMA is found, as in the try_to_unmap() case, +the functions attempt to acquire the associated mmap semaphore, mlock the page +via mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page. If try_to_unmap() is unable to acquire a VM_LOCKED VMA's associated mmap @@ -636,12 +589,6 @@ semaphore, it will return SWAP_AGAIN. This will allow shrink_page_list() to recycle the page on the inactive list and hope that it has better luck with the page next time. -For file pages mapped into non-linear VMAs, the try_to_munlock() logic works -slightly differently. On encountering a VM_LOCKED non-linear VMA that might -map the page, try_to_munlock() returns SWAP_AGAIN without actually mlocking the -page. munlock_vma_page() will just leave the page unlocked and let vmscan deal -with it - the usual fallback position. - Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. However, the scan can terminate when it encounters a VM_LOCKED VMA and can -- cgit v1.2.3 From a5be35632cf3048d3922e2e2fb05a4c6f0889ff0 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 5 Nov 2015 18:50:37 -0800 Subject: Documentation/filesystems/proc.txt: a little tidying There's an odd line about "Locked" at the head of the description of /proc/meminfo: it seems to have strayed from /proc/PID/smaps, so lead it back there. Move "Swap" and "SwapPss" descriptions down above it, to match the order in the file (though "PageSize"s still undescribed). The example of "Locked: 374 kB" (the same as Pss, neither Rss nor Size) is so unlikely as to be misleading: just make it 0, this is /bin/bash text; which would be "dw" (disabled write) not "de" (do not expand). Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 9781b977edcb..1e4a6cc1b6ea 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -433,8 +433,8 @@ Swap: 0 kB SwapPss: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB -Locked: 374 kB -VmFlags: rd ex mr mw me de +Locked: 0 kB +VmFlags: rd ex mr mw me dw the first of these lines shows the same information as is displayed for the mapping in /proc/PID/maps. The remaining lines show the size of the mapping @@ -454,13 +454,13 @@ accessed. "Anonymous" shows the amount of memory that does not belong to any file. Even a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE and a page is modified, the file page is replaced by a private anonymous copy. -"Swap" shows how much would-be-anonymous memory is also used, but out on -swap. -"SwapPss" shows proportional swap share of this mapping. "AnonHugePages" shows the ammount of memory backed by transparent hugepage. "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field. +"Swap" shows how much would-be-anonymous memory is also used, but out on swap. +"SwapPss" shows proportional swap share of this mapping. +"Locked" indicates whether the mapping is locked in memory or not. "VmFlags" field deserves a separate description. This member represents the kernel flags associated with the particular virtual memory area in two letter encoded @@ -824,9 +824,6 @@ varies by architecture and compile options. The following is from a > cat /proc/meminfo -The "Locked" indicates whether the mapping is locked in memory or not. - - MemTotal: 16344972 kB MemFree: 13634064 kB MemAvailable: 14836172 kB -- cgit v1.2.3