path: root/include
2022-10-03  mm: hugetlb: eliminate memory-less nodes handling  [Muchun Song]
The memory-notify-based approach aims to handle memory-less nodes; however, it just adds code complexity, as pointed out by David in thread [1]. The handling of memory-less nodes was introduced by commit 4faf8d950ec4 ("hugetlb: handle memory hot-plug events"). From its commit message, we cannot find any necessity for handling this case. So, we can simply register/unregister sysfs entries in register_node/unregister_node to simplify the code. BTW, the hotplug callback was added because hugetlb_register_all_nodes() registers sysfs entries only for N_MEMORY nodes; see commit 9b5e5d0fdc91, which said it was a preparation for handling memory-less nodes via memory hotplug. Since we want to remove the memory hotplug handling, make sure we only register per-node sysfs for online (N_ONLINE) nodes in hugetlb_register_all_nodes(). https://lore.kernel.org/linux-mm/60933ffc-b850-976c-78a0-0ee6e0ea9ef0@redhat.com/ [1] Link: https://lkml.kernel.org/r/20220914072603.60293-3-songmuchun@bytedance.com Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm: hugetlb: simplify per-node sysfs creation and removal  [Muchun Song]
Patch series "simplify handling of per-node sysfs creation and removal", v4. This patch (of 2): The following commit offload per-node sysfs creation and removal to a kworker and did not say why it is needed. And it also said "I don't know that this is absolutely required". It seems like the author was not sure as well. Since it only complicates the code, this patch will revert the changes to simplify the code. 39da08cb074c ("hugetlb: offload per node attribute registrations") We could use memory hotplug notifier to do per-node sysfs creation and removal instead of inserting those operations to node registration and unregistration. Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can simplify the code. Link: https://lkml.kernel.org/r/20220914072603.60293-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220914072603.60293-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm/damon/core: reduce parameters for damon_set_attrs()  [SeongJae Park]
'damon_set_attrs()' takes six parameters. As that can be confusing and verbose, this commit reduces the number by receiving a single pointer to a 'struct damon_attrs' instead. Link: https://lkml.kernel.org/r/20220913174449.50645-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
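For illustration, a sketch of the call shape this change implies; the field names and example values below are best-effort assumptions, not a quote of the patch:

    /* before: six positional parameters */
    err = damon_set_attrs(ctx, sample_interval, aggr_interval,
                          ops_update_interval, min_nr_regions, max_nr_regions);

    /* after: a single attributes struct */
    struct damon_attrs attrs = {
            .sample_interval = 5000,            /* microseconds */
            .aggr_interval = 100000,
            .ops_update_interval = 1000000,
            .min_nr_regions = 10,
            .max_nr_regions = 1000,
    };
    err = damon_set_attrs(ctx, &attrs);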
2022-10-03  mm/damon/core: use a dedicated struct for monitoring attributes  [SeongJae Park]
DAMON monitoring attributes are directly defined as fields of 'struct damon_ctx'. This makes 'struct damon_ctx' a little long and complicated. This commit defines and uses a struct, 'struct damon_attrs', which is dedicated for only the monitoring attributes to make the purpose of the five values clearer and simplify 'struct damon_ctx'. Link: https://lkml.kernel.org/r/20220913174449.50645-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
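A sketch of what such a dedicated struct could look like; the five field names are assumptions based on DAMON's documented monitoring attributes:

    struct damon_attrs {
            unsigned long sample_interval;        /* sampling interval (us) */
            unsigned long aggr_interval;          /* aggregation interval (us) */
            unsigned long ops_update_interval;    /* ops update interval (us) */
            unsigned long min_nr_regions;
            unsigned long max_nr_regions;
    };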
2022-10-03  writeback: remove unused macro DIRTY_FULL_SCOPE  [Miaohe Lin]
It's introduced but never used. Remove it. Link: https://lkml.kernel.org/r/20220909025711.32012-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: zhanglianjie <zhanglianjie@uniontech.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm/damon: remove duplicate get_monitoring_region() definitions  [Xin Hao]
Both lru_sort.c and reclaim.c define their own get_monitoring_region() function; there is no need to define it separately in each file. As 'get_monitoring_region()' is no longer a 'static' function, use a prefix to distinguish it from other functions, so rename it to 'damon_find_biggest_system_ram'. Link: https://lkml.kernel.org/r/20220909213606.136221-1-sj@kernel.org Signed-off-by: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: SeongJae Park <sj@kernel.org> Suggested-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm/damon: introduce struct damos_access_pattern  [Yajun Deng]
damon_new_scheme() has too many parameters, so introduce struct damos_access_pattern to simplify it. In addition, a BPF trace kprobe cannot be used on a function that has more than 5 parameters. Link: https://lkml.kernel.org/r/20220908191443.129534-1-sj@kernel.org Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
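A hedged sketch of the grouped parameters; the field names and the resulting damon_new_scheme() signature are assumptions mirroring DAMON's scheme access-pattern knobs:

    struct damos_access_pattern {
            unsigned long min_sz_region;    /* region size bounds, in bytes */
            unsigned long max_sz_region;
            unsigned int min_nr_accesses;   /* access frequency bounds */
            unsigned int max_nr_accesses;
            unsigned int min_age_region;    /* region age bounds, in aggregation intervals */
            unsigned int max_age_region;
    };

    /* damon_new_scheme() can then take one pointer instead of six values */
    scheme = damon_new_scheme(&pattern, action, &quota, &wmarks);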
2022-10-03  memcg: extract memcg_vmstats from struct mem_cgroup  [Shakeel Butt]
Patch series "memcg: reduce memory overhead of memory cgroups". Currently a lot of memory is wasted to maintain the vmevents for memory cgroups as we have multiple arrays of size NR_VM_EVENT_ITEMS which can be as large as 110. However memcg code uses small portion of those entries. This patch series eliminate this overhead by removing the unneeded vmevent entries from memory cgroup data structures. This patch (of 3): This is a preparatory patch to reduce the memory overhead of memory cgroup. The struct memcg_vmstats is the largest object embedded into the struct mem_cgroup. This patch extracts struct memcg_vmstats from struct mem_cgroup to ease the following patches in reducing the size of struct memcg_vmstats. Link: https://lkml.kernel.org/r/20220907043537.3457014-1-shakeelb@google.com Link: https://lkml.kernel.org/r/20220907043537.3457014-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm: add pageblock_aligned() macro  [Kefeng Wang]
Add pageblock_aligned() and use it to simplify code. Link: https://lkml.kernel.org/r/20220907060844.126891-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
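The helper is presumably a thin wrapper along these lines (sketch, not the exact patch):

    #define pageblock_aligned(pfn)  IS_ALIGNED((pfn), pageblock_nr_pages)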
2022-10-03  mm: add pageblock_align() macro  [Kefeng Wang]
Add pageblock_align() macro and use it to simplify code. Link: https://lkml.kernel.org/r/20220907060844.126891-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
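Likely shape of the macro (sketch, not the exact patch):

    #define pageblock_align(pfn)    ALIGN((pfn), pageblock_nr_pages)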
2022-10-03  mm: reuse pageblock_start/end_pfn() macro  [Kefeng Wang]
Move pageblock_start_pfn()/pageblock_end_pfn() into pageblock-flags.h so they can be used elsewhere, not only in compaction. Also use ALIGN_DOWN() instead of round_down() to pair with ALIGN(); the two are equivalent for pageblock usage. Link: https://lkml.kernel.org/r/20220907060844.126891-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
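The moved helpers presumably look like this (sketch; note ALIGN_DOWN() replacing round_down()):

    #define pageblock_start_pfn(pfn)  ALIGN_DOWN((pfn), pageblock_nr_pages)
    #define pageblock_end_pfn(pfn)    ALIGN((pfn) + 1, pageblock_nr_pages)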
2022-10-03  mm/damon/core: iterate the regions list from current point in damon_set_regions()  [Kaixu Xia]
We iterate the whole regions list every time to get the first/last regions intersecting with the specific range in damon_set_regions(), in order to add a new region or resize existing regions to fit in the specific range. Actually, it is unnecessary to iterate the newly added regions and the front regions that have already been checked. Just iterate the regions list from the current point using list_for_each_entry_from() every time to improve performance. The kunit tests passed: [PASSED] damon_test_apply_three_regions1 [PASSED] damon_test_apply_three_regions2 [PASSED] damon_test_apply_three_regions3 [PASSED] damon_test_apply_three_regions4 Link: https://lkml.kernel.org/r/1662477527-13003-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
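A minimal sketch of the iteration pattern described above, continuing a list walk from the current node rather than restarting at the head; the structure and field names are assumptions based on DAMON's region list layout:

    static void adjust_regions_from(struct damon_target *t, struct damon_region *r,
                                    unsigned long start, unsigned long end)
    {
            /* r is the first region of interest; keep walking from it */
            list_for_each_entry_from(r, &t->regions_list, list) {
                    if (r->ar.start >= end)         /* past the requested range */
                            break;
                    /* ... resize or split r so it fits in [start, end) ... */
            }
    }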
2022-10-03  kasan: only define kasan_cache_create for Generic mode  [Andrey Konovalov]
Right now, kasan_cache_create() assigns SLAB_KASAN for all KASAN modes and then sets up metadata-related cache parameters for the Generic mode. SLAB_KASAN is used in two places: 1. In slab_ksize() to account for per-object metadata when calculating the size of the accessible memory within the object. 2. In slab_common.c via kasan_never_merge() to prevent merging of caches with per-object metadata. Both cases are only relevant when per-object metadata is present, which is only the case with the Generic mode. Thus, assign SLAB_KASAN and define kasan_cache_create() only for the Generic mode. Also update the SLAB_KASAN-related comment. Link: https://lkml.kernel.org/r/61faa2aa1906e2d02c97d00ddf99ce8911dda095.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  kasan: only define metadata offsets for Generic mode  [Andrey Konovalov]
Hide the definitions of alloc_meta_offset and free_meta_offset under an ifdef CONFIG_KASAN_GENERIC check, as these fields are now only used when the Generic mode is enabled. Link: https://lkml.kernel.org/r/d4bafa0534facafd1a23c465a94261e64f366493.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  kasan: only define kasan_never_merge for Generic mode  [Andrey Konovalov]
KASAN prevents merging of slab caches whose objects have per-object metadata stored in redzones. As now only the Generic mode uses per-object metadata, define kasan_never_merge() only for this mode. Link: https://lkml.kernel.org/r/81ed01f29ff3443580b7e2fe362a8b47b1e8006d.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  kasan: only define kasan_metadata_size for Generic mode  [Andrey Konovalov]
KASAN provides a helper for calculating the size of per-object metadata stored in the redzone. As now only the Generic mode uses per-object metadata, only define kasan_metadata_size() for this mode. Link: https://lkml.kernel.org/r/8f81d4938b80446bc72538a08217009f328a3e23.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  hugetlb_encode.h: fix undefined behaviour (34 << 26)  [Matthias Goergens]
Left-shifting past the size of your datatype is undefined behaviour in C. The literal 34 gets the type `int`, and that is not big enough to be left-shifted by 26 bits. An `unsigned` is wide enough (on any machine whose ints are at least 32 bits). For uniformity, we mark all the literals as unsigned, but it's only really needed for HUGETLB_FLAG_ENCODE_16GB. Thanks to Randy Dunlap for an initial review and suggestion. Link: https://lkml.kernel.org/r/20220905031904.150925-1-matthias.goergens@gmail.com Signed-off-by: Matthias Goergens <matthias.goergens@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
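The fix presumably boils down to making the shifted literals unsigned, e.g. a before/after sketch for the 16GB case:

    /* before: 34 << 26 does not fit in a signed int -- undefined behaviour */
    #define HUGETLB_FLAG_ENCODE_16GB  (34 << HUGETLB_FLAG_ENCODE_SHIFT)

    /* after: unsigned literal, applied to all sizes for uniformity */
    #define HUGETLB_FLAG_ENCODE_16GB  (34U << HUGETLB_FLAG_ENCODE_SHIFT)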
2022-10-03  mm: convert lock_page_or_retry() to folio_lock_or_retry()  [Matthew Wilcox (Oracle)]
Remove a call to compound_head() in each of the two callers. Link: https://lkml.kernel.org/r/20220902194653.1739778-58-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  rmap: remove page_unlock_anon_vma_read()  [Matthew Wilcox (Oracle)]
This was simply an alias for anon_vma_unlock_read() since 2011. Link: https://lkml.kernel.org/r/20220902194653.1739778-56-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm: convert page_get_anon_vma() to folio_get_anon_vma()  [Matthew Wilcox (Oracle)]
With all callers now passing in a folio, rename the function and convert all callers. Removes a couple of calls to compound_head() and a reference to page->mapping. Link: https://lkml.kernel.org/r/20220902194653.1739778-55-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm: remove try_to_free_swap()  [Matthew Wilcox (Oracle)]
All callers have now been converted to folio_free_swap() and we can remove this wrapper. Link: https://lkml.kernel.org/r/20220902194653.1739778-49-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  memcg: convert mem_cgroup_swap_full() to take a folio  [Matthew Wilcox (Oracle)]
All callers now have a folio, so convert the function to take a folio. Saves a couple of calls to compound_head(). Link: https://lkml.kernel.org/r/20220902194653.1739778-48-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  shmem: remove shmem_getpage()  [Matthew Wilcox (Oracle)]
With all callers removed, remove this wrapper function. The flags are now mysteriously called SGP, but I think we can live with that. Link: https://lkml.kernel.org/r/20220902194653.1739778-34-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  shmem: add shmem_get_folio()  [Matthew Wilcox (Oracle)]
With no remaining callers of shmem_getpage_gfp(), add shmem_get_folio() and reimplement shmem_getpage() as a call to shmem_get_folio(). Link: https://lkml.kernel.org/r/20220902194653.1739778-25-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  memcg: convert mem_cgroup_swapin_charge_page() to mem_cgroup_swapin_charge_folio()  [Matthew Wilcox (Oracle)]
All callers now have a folio, so pass it in here and remove an unnecessary call to page_folio(). Link: https://lkml.kernel.org/r/20220902194653.1739778-17-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm/swap: convert put_swap_page() to put_swap_folio()  [Matthew Wilcox (Oracle)]
With all callers now using a folio, we can convert this function. Link: https://lkml.kernel.org/r/20220902194653.1739778-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm/swapfile: convert try_to_free_swap() to folio_free_swap()  [Matthew Wilcox (Oracle)]
Add kernel-doc for folio_free_swap() and make it return bool. Add a try_to_free_swap() compatibility wrapper. Link: https://lkml.kernel.org/r/20220902194653.1739778-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  shmem: convert shmem_replace_page() to use folios throughout  [Matthew Wilcox (Oracle)]
Introduce folio_set_swap_entry() to abstract how both folio->private and swp_entry_t work. Use swap_address_space() directly instead of indirecting through folio_mapping(). Include an assertion that the old folio is not large as we only allocate a single-page folio to replace it. Use folio_put_refs() instead of calling folio_put() twice. Link: https://lkml.kernel.org/r/20220902194653.1739778-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm: add folio_add_lru_vma()  [Matthew Wilcox (Oracle)]
Convert lru_cache_add_inactive_or_unevictable() to folio_add_lru_vma() and add a compatibility wrapper. Link: https://lkml.kernel.org/r/20220902194653.1739778-6-willy@infradead.org Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm: add split_folio()  [Matthew Wilcox (Oracle)]
This wrapper removes a need to use split_huge_page(&folio->page). Convert two callers. Link: https://lkml.kernel.org/r/20220902194653.1739778-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
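One plausible shape for such a wrapper (sketch, not necessarily the exact definition added by the patch):

    static inline int split_folio(struct folio *folio)
    {
            return split_huge_page(&folio->page);
    }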
2022-10-03  mm: reimplement folio_order() and folio_nr_pages()  [Matthew Wilcox (Oracle)]
Instead of calling compound_order() and compound_nr_pages(), use the folio directly. Saves 1905 bytes from mm/filemap.o due to folio_test_large() now being a cheaper check than PageHead(). Link: https://lkml.kernel.org/r/20220902194653.1739778-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  mm: add the first tail page to struct folio  [Matthew Wilcox (Oracle)]
Some of the static checkers get confused by extracting the page from the folio and referring to fields in the first tail page. Adding these fields to struct folio lets us avoid doing that. It has the risk that people will refer to those fields without checking that the folio is actually a large folio, so prefix them with underscores and document the preferred function to use instead. Link: https://lkml.kernel.org/r/20220902194653.1739778-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03  ksm: remove redundant declarations in ksm.h  [Qi Zheng]
Currently, the struct stable_node declaration is not used, either in include/linux/ksm.h or in the file that includes it. The struct mem_cgroup declaration is likewise not used in ksm.h. So both declarations are redundant; just remove them. Link: https://lkml.kernel.org/r/20220831031951.43152-4-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  ksm: count allocated ksm rmap_items for each process  [xu xin]
Patch series "ksm: count allocated rmap_items and update documentation", v5. KSM can save memory by merging identical pages, but also can consume additional memory, because it needs to generate rmap_items to save each scanned page's brief rmap information. To determine how beneficial the ksm-policy (like madvise), they are using brings, so we add a new interface /proc/<pid>/ksm_stat for each process The value "ksm_rmap_items" in it indicates the total allocated ksm rmap_items of this process. The detailed description can be seen in the following patches' commit message. This patch (of 2): KSM can save memory by merging identical pages, but also can consume additional memory, because it needs to generate rmap_items to save each scanned page's brief rmap information. Some of these pages may be merged, but some may not be abled to be merged after being checked several times, which are unprofitable memory consumed. The information about whether KSM save memory or consume memory in system-wide range can be determined by the comprehensive calculation of pages_sharing, pages_shared, pages_unshared and pages_volatile. A simple approximate calculation: profit =~ pages_sharing * sizeof(page) - (all_rmap_items) * sizeof(rmap_item); where all_rmap_items equals to the sum of pages_sharing, pages_shared, pages_unshared and pages_volatile. But we cannot calculate this kind of ksm profit inner single-process wide because the information of ksm rmap_item's number of a process is lacked. For user applications, if this kind of information could be obtained, it helps upper users know how beneficial the ksm-policy (like madvise) they are using brings, and then optimize their app code. For example, one application madvise 1000 pages as MERGEABLE, while only a few pages are really merged, then it's not cost-efficient. So we add a new interface /proc/<pid>/ksm_stat for each process in which the value of ksm_rmap_itmes is only shown now and so more values can be added in future. So similarly, we can calculate the ksm profit approximately for a single process by: profit =~ ksm_merging_pages * sizeof(page) - ksm_rmap_items * sizeof(rmap_item); where ksm_merging_pages is shown at /proc/<pid>/ksm_merging_pages, and ksm_rmap_items is shown in /proc/<pid>/ksm_stat. Link: https://lkml.kernel.org/r/20220830143731.299702-1-xu.xin16@zte.com.cn Link: https://lkml.kernel.org/r/20220830143838.299758-1-xu.xin16@zte.com.cn Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: deduplicate cacheline padding code  [Shakeel Butt]
There are three users (mmzone.h, memcontrol.h, page_counter.h) using similar code for forcing cacheline padding between fields of different structures. Dedup that code. Link: https://lkml.kernel.org/r/20220826230642.566725-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Suggested-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Feng Tang <feng.tang@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
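The deduplicated helper presumably looks something like this (sketch; the macro and struct names are assumptions):

    struct cacheline_padding {
            char x[0];
    } ____cacheline_internodealigned_in_smp;

    #define CACHELINE_PADDING(name)   struct cacheline_padding name

    /* users then write, e.g.: */
    struct some_hot_struct {
            /* fields written by one CPU ... */
            CACHELINE_PADDING(_pad1_);
            /* ... fields written by other CPUs */
    };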
2022-09-26  mm: reduce noise in show_mem for lowmem allocations  [Michal Hocko]
While discussing early DMA pool pre-allocation failure with Christoph [1], I realized that the allocation failure warning is rather noisy for constrained allocations like GFP_DMA{32}. Those zones are often not populated on all nodes, as their memory ranges are constrained. This is an attempt to reduce the ballast that doesn't provide any relevant information for the investigation of those allocation failures. Please note that I have only compile tested it (in my default config setup) and I am throwing it out mostly to see what people think about it. [1] http://lkml.kernel.org/r/20220817060647.1032426-1-hch@lst.de [mhocko@suse.com: update] Link: https://lkml.kernel.org/r/Yw29bmJTIkKogTiW@dhcp22.suse.cz [mhocko@suse.com: fix build] [akpm@linux-foundation.org: fix it for mapletree] [akpm@linux-foundation.org: update it for Michal's update] [mhocko@suse.com: fix arch/powerpc/xmon/xmon.c] Link: https://lkml.kernel.org/r/Ywh3C4dKB9B93jIy@dhcp22.suse.cz [akpm@linux-foundation.org: fix arch/sparc/kernel/setup_32.c] Link: https://lkml.kernel.org/r/YwScVmVofIZkopkF@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: fixup documentation regarding pte_numa() and PROT_NUMA  [David Hildenbrand]
pte_numa() no longer exists -- replaced by pte_protnone() -- and PROT_NUMA probably never existed: MM_CP_PROT_NUMA also ends up using PROT_NONE. Let's fixup the doc. Link: https://lkml.kernel.org/r/20220825164659.89824-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()  [David Hildenbrand]
Patch series "mm: minor cleanups around NUMA hinting". Working on some GUP cleanups (e.g., getting rid of some FOLL_ flags) and preparing for other GUP changes (getting rid of FOLL_FORCE|FOLL_WRITE for for taking a R/O longterm pin), this is something I can easily send out independently. Get rid of FOLL_NUMA, allow FOLL_FORCE access to PROT_NONE mapped pages in GUP-fast, and fixup some documentation around NUMA hinting. This patch (of 3): No need for a special flag that is not even properly documented to be internal-only. Let's just factor this check out and get rid of this flag. The separate function has the nice benefit that we can centralize comments. Link: https://lkml.kernel.org/r/20220825164659.89824-2-david@redhat.com Link: https://lkml.kernel.org/r/20220825164659.89824-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: delete unused MMF_OOM_VICTIM flag  [Suren Baghdasaryan]
With the last usage of MMF_OOM_VICTIM in exit_mmap gone, this flag is now unused and can be removed. [akpm@linux-foundation.org: remove comment about now-removed mm_is_oom_victim()] Link: https://lkml.kernel.org/r/20220531223100.510392-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Minchan Kim <minchan@kernel.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: drop oom code from exit_mmap  [Suren Baghdasaryan]
The primary reason to invoke the oom reaper from the exit_mmap path used to be a prevention of an excessive oom killing if the oom victim exit races with the oom reaper (see [1] for more details). The invocation has moved around since then because of the interaction with the munlock logic but the underlying reason has remained the same (see [2]). Munlock code is no longer a problem since [3] and there shouldn't be any blocking operation before the memory is unmapped by exit_mmap so the oom reaper invocation can be dropped. The unmapping part can be done with the non-exclusive mmap_sem and the exclusive one is only required when page tables are freed. Remove the oom_reaper from exit_mmap which will make the code easier to read. This is really unlikely to make any observable difference although some microbenchmarks could benefit from one less branch that needs to be evaluated even though it almost never is true. [1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") [2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3") [3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap") [akpm@linux-foundation.org: restore Suren's mmap_read_lock() optimization] Link: https://lkml.kernel.org/r/20220531223100.510392-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: remove the vma linked list  [Liam R. Howlett]
Replace any vm_next use with vma_find(). Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the maple tree. Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap(). At the same time, alter the loop to be more compact. Now that free_pgtables() and unmap_vmas() take a maple tree as an argument, rearrange do_mas_align_munmap() to use the new tree to hold the vmas to remove. Remove __vma_link_list() and __vma_unlink_list() as they are exclusively used to update the linked list. Drop the linked list update from __insert_vm_struct(). Rework validation of the tree, as it was depending on the linked list. [yang.lee@linux.alibaba.com: fix one kernel-doc comment] Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1949 Link: https://lkml.kernel.org/r/20220824021918.94116-1-yang.lee@linux.alibaba.com Link: https://lkml.kernel.org/r/20220906194824.2110408-69-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  userfaultfd: use maple tree iterator to iterate VMAs  [Liam R. Howlett]
Don't use the mm_struct linked list or the vma->vm_next in prep for removal. Link: https://lkml.kernel.org/r/20220906194824.2110408-45-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm/mmap: reorganize munmap to use maple states  [Liam R. Howlett]
Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and do_mas_align_munmap(). do_munmap() is a wrapper to create a maple state for any callers that have not been converted to the maple tree. do_mas_munmap() takes a maple state to munmap a range. This is just a small function which checks for error conditions and aligns the end of the range. do_mas_align_munmap() uses the aligned range to munmap a range. do_mas_align_munmap() starts with the first VMA in the range, then finds the last VMA in the range. Both start and end are split if necessary. Then the VMAs are removed from the linked list and the mm mlock count is updated at the same time. This is followed by a single tree operation that overwrites the area with NULL. Finally, the detached list is unmapped and freed. By reorganizing the munmap calls as outlined, it is now possible to avoid the extra work of aligning ranges for pre-aligned callers which are known to be safe, and to avoid extra VMA lookups or tree walks for modifications. detach_vmas_to_be_unmapped() is no longer used, so drop this code. vm_brk_flags() can just call do_mas_munmap() as it checks for intersecting VMAs directly. Link: https://lkml.kernel.org/r/20220906194824.2110408-29-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: convert vma_lookup() to use mtree_load()  [Liam R. Howlett]
Unlike the rbtree, the Maple Tree will return a NULL if there's nothing at a particular address. Since the previous commit dropped the vmacache, it is now possible to consult the tree directly. Link: https://lkml.kernel.org/r/20220906194824.2110408-27-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
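The converted helper is presumably a direct maple tree load (sketch, not a quote of the patch):

    static inline struct vm_area_struct *vma_lookup(struct mm_struct *mm,
                                                    unsigned long addr)
    {
            /* the maple tree returns NULL when nothing is mapped at addr */
            return mtree_load(&mm->mm_mt, addr);
    }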
2022-09-26  mm: remove vmacache  [Liam R. Howlett]
By using the maple tree and the maple tree state, the vmacache is no longer beneficial and is complicating the VMA code. Remove the vmacache to reduce the work in keeping it up to date and code complexity. Link: https://lkml.kernel.org/r/20220906194824.2110408-26-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: use maple tree operations for find_vma_intersection()  [Liam R. Howlett]
Move find_vma_intersection() to mmap.c and change implementation to maple tree. When searching for a vma within a range, it is easier to use the maple tree interface. Exported find_vma_intersection() for kvm module. Link: https://lkml.kernel.org/r/20220906194824.2110408-24-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: optimize find_exact_vma() to use vma_lookup()  [Liam R. Howlett]
Use vma_lookup() to walk the tree to the start value requested. If the vma at the start does not match, then the answer is NULL and there is no need to look at the next vma the way that find_vma() would. Link: https://lkml.kernel.org/r/20220906194824.2110408-21-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
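A sketch of the optimized helper as described above (the details are assumptions based on the commit text):

    static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
                            unsigned long vm_start, unsigned long vm_end)
    {
            struct vm_area_struct *vma = vma_lookup(mm, vm_start);

            /* no need to look at the next VMA the way find_vma() would */
            if (vma && (vma->vm_start != vm_start || vma->vm_end != vm_end))
                    vma = NULL;

            return vma;
    }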
2022-09-26  mm: remove rb tree.  [Liam R. Howlett]
Remove the RB tree and start using the maple tree for vm_area_struct tracking. Drop validate_mm() calls in expand_upwards() and expand_downwards() as the lock is not held. Link: https://lkml.kernel.org/r/20220906194824.2110408-18-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  kernel/fork: use maple tree for dup_mmap() during forking  [Liam R. Howlett]
The maple tree was already tracking VMAs in this function by an earlier commit, but the rbtree iterator was being used to iterate the list. Change the iterator to use a maple tree native iterator and switch to the maple tree advanced API to avoid multiple walks of the tree during insert operations. Unexport the now-unused vma_store() function. For performance reasons we bulk allocate the maple tree nodes. The node calculations are done internally to the tree and use the VMA count and assume the worst-case node requirements. The VM_DONT_COPY flag does not allow for the most efficient copy method of the tree and so a bulk loading algorithm is used. Link: https://lkml.kernel.org/r/20220906194824.2110408-15-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26  mm: add VMA iterator  [Matthew Wilcox (Oracle)]
This thin layer of abstraction over the maple tree state is for iterating over VMAs. You can go forwards, go backwards or ask where the iterator is. Rename the existing vma_next() to __vma_next() -- it will be removed by the end of this series. Link: https://lkml.kernel.org/r/20220906194824.2110408-10-Liam.Howlett@oracle.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Howells <dhowells@redhat.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
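A usage sketch of the iterator described above; the macro and helper names follow this series, but treat the exact definitions as assumptions:

    VMA_ITERATOR(vmi, mm, 0);
    struct vm_area_struct *vma;

    /* walk every VMA in ascending address order */
    while ((vma = vma_next(&vmi)) != NULL) {
            /* ... operate on vma ... */
    }

    /* vma_prev() walks backwards; the iterator also reports where it is */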