From f81fdd0c4ab7ac2c57302283309bf776557d35ff Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 13 Jul 2020 11:37:39 -0700 Subject: mm: document warning in move_normal_pmd() and make it warn only once Naresh Kamboju reported that the LTP tests can cause warnings on i386 going back all the way to v5.0, and bisected it to commit 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions"). The warning in move_normal_pmd() is actually mostly correct, but we have a very unusual special case at process creation time, when we may move the stack down with an overlapping mode (kind of like a "memmove()" except using the page tables). And when you have just the right condition of "move a large initial stack by the right alignment in the end, but with the early part of the move being only page-aligned", we'll be in a situation where we're trying to move a normal PMD entry on top of an already existing - but now empty - PMD entry. The warning is still worth having, in case it ever triggers other cases, and perhaps as a reminder that we could do the stack move case more efficiently (although it's clearly rare enough that it probably doesn't matter). But make it do WARN_ON_ONCE(), so that you can't flood the logs with it. And add a *big* comment above it to explain and remind us what's going on, because it took some figuring out to see how this could trigger. Kudos to Joel Fernandes for debugging this. Reported-by: Naresh Kamboju Debugged-and-acked-by: Joel Fernandes Cc: Arnd Bergmann Cc: Kirill A. Shutemov Signed-off-by: Linus Torvalds --- mm/mremap.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) (limited to 'mm') diff --git a/mm/mremap.c b/mm/mremap.c index 5dd572d57ca9..6b153dc05fe4 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -206,9 +206,28 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, /* * The destination pmd shouldn't be established, free_pgtables() - * should have release it. + * should have released it. + * + * However, there's a case during execve() where we use mremap + * to move the initial stack, and in that case the target area + * may overlap the source area (always moving down). + * + * If everything is PMD-aligned, that works fine, as moving + * each pmd down will clear the source pmd. But if we first + * have a few 4kB-only pages that get moved down, and then + * hit the "now the rest is PMD-aligned, let's do everything + * one pmd at a time", we will still have the old (now empty + * of any 4kB pages, but still there) PMD in the page table + * tree. + * + * Warn on it once - because we really should try to figure + * out how to do this better - but then say "I won't move + * this pmd". + * + * One alternative might be to just unmap the target pmd at + * this point, and verify that it really is empty. We'll see. */ - if (WARN_ON(!pmd_none(*new_pmd))) + if (WARN_ON_ONCE(!pmd_none(*new_pmd))) return false; /* -- cgit v1.2.3 From 246c320a8cfe0b11d81a4af38fa9985ef0cc9a4c Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Thu, 23 Jul 2020 21:15:11 -0700 Subject: mm/mmap.c: close race between munmap() and expand_upwards()/downwards() VMA with VM_GROWSDOWN or VM_GROWSUP flag set can change their size under mmap_read_lock(). 
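That growth happens in the page-fault path: touching an unmapped page just below a VM_GROWSDOWN stack makes the fault handler expand the VMA while holding mmap_lock only for read. A minimal userspace way to provoke it (illustrative only, not from this patch; compile with gcc -O0 so the recursion is not collapsed into a loop):

    #include <stdio.h>
    #include <string.h>

    /* Touch roughly one new stack page per call so each recursion step
     * faults below the current stack VMA and the kernel expands it. */
    static unsigned long grow(int depth)
    {
            volatile char pad[4096];

            memset((void *)pad, 1, sizeof(pad));
            if (depth == 0)
                    return (unsigned long)&pad[0];
            /* the XOR keeps this from being a pure tail call */
            return grow(depth - 1) ^ (unsigned long)&pad[0];
    }

    int main(void)
    {
            /* Grow the stack by ~4MB; watch the [stack] line in
             * /proc/self/maps move while this runs. */
            printf("deepest frame near %#lx\n", grow(1000));
            return 0;
    }

Because that expansion takes only the read lock, it can run concurrently with a munmap() that has already downgraded the lock.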
It can lead to race with __do_munmap(): Thread A Thread B __do_munmap() detach_vmas_to_be_unmapped() mmap_write_downgrade() expand_downwards() vma->vm_start = address; // The VMA now overlaps with // VMAs detached by the Thread A // page fault populates expanded part // of the VMA unmap_region() // Zaps pagetables partly // populated by Thread B Similar race exists for expand_upwards(). The fix is to avoid downgrading mmap_lock in __do_munmap() if detached VMAs are next to VM_GROWSDOWN or VM_GROWSUP VMA. [akpm@linux-foundation.org: s/mmap_sem/mmap_lock/ in comment] Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Reported-by: Jann Horn Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Reviewed-by: Yang Shi Acked-by: Vlastimil Babka Cc: Oleg Nesterov Cc: Matthew Wilcox Cc: [4.20+] Link: http://lkml.kernel.org/r/20200709105309.42495-1-kirill.shutemov@linux.intel.com Signed-off-by: Linus Torvalds --- mm/mmap.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) (limited to 'mm') diff --git a/mm/mmap.c b/mm/mmap.c index 59a4682ebf3f..8c7ca737a19b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2620,7 +2620,7 @@ static void unmap_region(struct mm_struct *mm, * Create a list of vma's touched by the unmap, removing them from the mm's * vma list as we go.. */ -static void +static bool detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma, struct vm_area_struct *prev, unsigned long end) { @@ -2645,6 +2645,17 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma, /* Kill the cache */ vmacache_invalidate(mm); + + /* + * Do not downgrade mmap_lock if we are next to VM_GROWSDOWN or + * VM_GROWSUP VMA. Such VMAs can change their size under + * down_read(mmap_lock) and collide with the VMA we are about to unmap. + */ + if (vma && (vma->vm_flags & VM_GROWSDOWN)) + return false; + if (prev && (prev->vm_flags & VM_GROWSUP)) + return false; + return true; } /* @@ -2825,7 +2836,8 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len, } /* Detach vmas from rbtree */ - detach_vmas_to_be_unmapped(mm, vma, prev, end); + if (!detach_vmas_to_be_unmapped(mm, vma, prev, end)) + downgrade = false; if (downgrade) mmap_write_downgrade(mm); -- cgit v1.2.3 From 3bef735ad7b7d987069181e7b58588043cbd1509 Mon Sep 17 00:00:00 2001 From: Chengguang Xu Date: Thu, 23 Jul 2020 21:15:14 -0700 Subject: vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way After commit fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc"), simple xattr entry is allocated with kvmalloc() instead of kmalloc(), so we should release it with kvfree() instead of kfree(). 
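The rule being restored: kvmalloc() may satisfy a request from vmalloc space when the size is large or kmalloc would fail, and only kvfree() can hand either kind of allocation back. A minimal kernel-style sketch of the pairing (struct demo_xattr and its helpers are made-up stand-ins, not the real simple_xattr code):

    #include <linux/slab.h>         /* kmalloc(), kfree(), GFP_KERNEL */
    #include <linux/mm.h>           /* kvmalloc(), kvfree()           */

    struct demo_xattr {
            char *name;             /* small, kmalloc()ed             */
            size_t size;
            char value[];           /* may be large, so the entry is
                                     * kvmalloc()ed as one block      */
    };

    static struct demo_xattr *demo_xattr_alloc(size_t size)
    {
            struct demo_xattr *x;

            x = kvmalloc(sizeof(*x) + size, GFP_KERNEL);
            if (!x)
                    return NULL;
            x->size = size;
            return x;
    }

    static void demo_xattr_free(struct demo_xattr *x)
    {
            if (!x)
                    return;
            kfree(x->name);         /* kmalloc()ed -> kfree()              */
            kvfree(x);              /* kvmalloc()ed -> kvfree(); a plain
                                     * kfree() here is the bug fixed below */
    }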
Fixes: fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc") Signed-off-by: Chengguang Xu Signed-off-by: Andrew Morton Acked-by: Hugh Dickins Acked-by: Tejun Heo Cc: Daniel Xu Cc: Chris Down Cc: Andreas Dilger Cc: Greg Kroah-Hartman Cc: Al Viro Cc: [5.7] Link: http://lkml.kernel.org/r/20200704051608.15043-1-cgxu519@mykernel.net Signed-off-by: Linus Torvalds --- include/linux/xattr.h | 3 ++- mm/shmem.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) (limited to 'mm') diff --git a/include/linux/xattr.h b/include/linux/xattr.h index 47eaa34f8761..c5afaf8ca7a2 100644 --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -15,6 +15,7 @@ #include #include #include +#include #include struct inode; @@ -94,7 +95,7 @@ static inline void simple_xattrs_free(struct simple_xattrs *xattrs) list_for_each_entry_safe(xattr, node, &xattrs->head, list) { kfree(xattr->name); - kfree(xattr); + kvfree(xattr); } } diff --git a/mm/shmem.c b/mm/shmem.c index a0dbe62f8042..b2abca3f7f33 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3178,7 +3178,7 @@ static int shmem_initxattrs(struct inode *inode, new_xattr->name = kmalloc(XATTR_SECURITY_PREFIX_LEN + len, GFP_KERNEL); if (!new_xattr->name) { - kfree(new_xattr); + kvfree(new_xattr); return -ENOMEM; } -- cgit v1.2.3 From 45779b036d3d2870633443a9f9bd03c177befbf5 Mon Sep 17 00:00:00 2001 From: Tom Rix Date: Thu, 23 Jul 2020 21:15:18 -0700 Subject: mm: initialize return of vm_insert_pages clang static analysis reports a garbage return In file included from mm/memory.c:84: mm/memory.c:1612:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn] return err; ^~~~~~~~~~ The setting of err depends on a loop executing. So initialize err. Signed-off-by: Tom Rix Signed-off-by: Andrew Morton Link: http://lkml.kernel.org/r/20200703155354.29132-1-trix@redhat.com Signed-off-by: Linus Torvalds --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm') diff --git a/mm/memory.c b/mm/memory.c index 87ec87cdc1ff..3ecad55103ad 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1601,7 +1601,7 @@ int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr, return insert_pages(vma, addr, pages, num, vma->vm_page_prot); #else unsigned long idx = 0, pgcount = *num; - int err; + int err = -EINVAL; for (; idx < pgcount; ++idx) { err = vm_insert_page(vma, addr + (PAGE_SIZE * idx), pages[idx]); -- cgit v1.2.3 From 82ff165cd35110d4e380b55927bbd74dcb564998 Mon Sep 17 00:00:00 2001 From: Bhupesh Sharma Date: Thu, 23 Jul 2020 21:15:21 -0700 Subject: mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages() Prabhakar reported an OOPS inside mem_cgroup_get_nr_swap_pages() function in a corner case seen on some arm64 boards when kdump kernel runs with "cgroup_disable=memory" passed to the kdump kernel via bootargs. The root-cause behind the same is that currently mem_cgroup_swap_init() function is implemented as a subsys_initcall() call instead of a core_initcall(), this means 'cgroup_memory_noswap' still remains set to the default value (false) even when memcg is disabled via "cgroup_disable=memory" boot parameter. 
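Built-in initcalls run strictly by level, and every core_initcall() completes before the first subsys_initcall(). A hedged sketch of that ordering with made-up names (not the real memcg code) shows why moving the init one level earlier closes the window:

    #include <linux/init.h>
    #include <linux/types.h>
    #include <linux/errno.h>

    static bool demo_noswap;                /* stands in for cgroup_memory_noswap */

    static int __init demo_swap_setup(void)
    {
            demo_noswap = true;             /* what the real init does when memcg
                                             * is disabled on the command line    */
            return 0;
    }
    core_initcall(demo_swap_setup);         /* was subsys_initcall() before the fix */

    static int __init demo_early_allocator(void)
    {
            /* Later-level boot code that allocates memory and reaches
             * reclaim (like the DMA atomic pool setup in the trace below)
             * now sees the flag already set. */
            return demo_noswap ? 0 : -EINVAL;
    }
    postcore_initcall(demo_early_allocator);

With the original subsys_initcall() ordering, the flag is still at its stale default when that earlier boot code runs.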
This may result in premature OOPS inside mem_cgroup_get_nr_swap_pages() function in corner cases: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 [0000000000000188] user address but active_mm is swapper Internal error: Oops: 96000006 [#1] SMP Modules linked in: <..snip..> Call trace: mem_cgroup_get_nr_swap_pages+0x9c/0xf4 shrink_lruvec+0x404/0x4f8 shrink_node+0x1a8/0x688 do_try_to_free_pages+0xe8/0x448 try_to_free_pages+0x110/0x230 __alloc_pages_slowpath.constprop.106+0x2b8/0xb48 __alloc_pages_nodemask+0x2ac/0x2f8 alloc_page_interleave+0x20/0x90 alloc_pages_current+0xdc/0xf8 atomic_pool_expand+0x60/0x210 __dma_atomic_pool_init+0x50/0xa4 dma_atomic_pool_init+0xac/0x158 do_one_initcall+0x50/0x218 kernel_init_freeable+0x22c/0x2d0 kernel_init+0x18/0x110 ret_from_fork+0x10/0x18 Code: aa1403e3 91106000 97f82a27 14000011 (f940c663) ---[ end trace 9795948475817de4 ]--- Kernel panic - not syncing: Fatal exception Rebooting in 10 seconds.. Fixes: eccb52e78809 ("mm: memcontrol: prepare swap controller setup for integration") Reported-by: Prabhakar Kushwaha Signed-off-by: Bhupesh Sharma Signed-off-by: Andrew Morton Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Vladimir Davydov Cc: James Morse Cc: Mark Rutland Cc: Will Deacon Cc: Catalin Marinas Link: http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsharma@redhat.com Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'mm') diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 19622328e4b5..c75c4face02e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7186,6 +7186,13 @@ static struct cftype memsw_files[] = { { }, /* terminate */ }; +/* + * If mem_cgroup_swap_init() is implemented as a subsys_initcall() + * instead of a core_initcall(), this could mean cgroup_memory_noswap still + * remains set to false even when memcg is disabled via "cgroup_disable=memory" + * boot parameter. This may result in premature OOPS inside + * mem_cgroup_get_nr_swap_pages() function in corner cases. + */ static int __init mem_cgroup_swap_init(void) { /* No memory control -> no swap control */ @@ -7200,6 +7207,6 @@ static int __init mem_cgroup_swap_init(void) return 0; } -subsys_initcall(mem_cgroup_swap_init); +core_initcall(mem_cgroup_swap_init); #endif /* CONFIG_MEMCG_SWAP */ -- cgit v1.2.3 From 8d22a9351035ef2ff12ef163a1091b8b8cf1e49c Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 23 Jul 2020 21:15:24 -0700 Subject: mm/memcg: fix refcount error while moving and swapping It was hard to keep a test running, moving tasks between memcgs with move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s refcount is discovered to be 0 (supposedly impossible), so it is then forced to REFCOUNT_SATURATED, and after thousands of warnings in quick succession, the test is at last put out of misery by being OOM killed. This is because of the way moved_swap accounting was saved up until the task move gets completed in __mem_cgroup_clear_mc(), deferred from when mem_cgroup_move_swap_account() actually exchanged old and new ids. Concurrent activity can free up swap quicker than the task is scanned, bringing id refcount down 0 (which should only be possible when offlining). Just skip that optimization: do that part of the accounting immediately. 
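A toy, single-threaded illustration of the ordering hazard (all names made up, not kernel code): once a swap entry has been re-labelled with the destination id, whoever frees that entry is entitled to drop one reference on it, so the matching get must not be deferred.

    #include <stdio.h>

    static int id_refs = 1;                 /* the cgroup's own reference       */

    static void id_get(int n) { id_refs += n; }

    static void id_put(void)
    {
            if (--id_refs == 0)
                    printf("BUG: id refcount hit 0 on a live cgroup\n");
    }

    int main(void)
    {
            /* Old scheme: re-label the entry now, batch the get for later. */
            id_put();                       /* concurrent swap free redeems the
                                             * reference that was never taken   */
            id_get(1);                      /* deferred fix-up arrives too late  */
            printf("final count %d\n", id_refs);

            /* Fixed scheme: id_get(1) at re-label time, so the count can
             * never be observed at 0 while the cgroup is still online.    */
            return 0;
    }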
Fixes: 615d66c37c75 ("mm: memcontrol: fix memcg id ref counter on swap charge move") Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Reviewed-by: Alex Shi Cc: Johannes Weiner Cc: Alex Shi Cc: Shakeel Butt Cc: Michal Hocko Cc: Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007071431050.4726@eggly.anvils Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'mm') diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c75c4face02e..13f559af1ab6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5669,7 +5669,6 @@ static void __mem_cgroup_clear_mc(void) if (!mem_cgroup_is_root(mc.to)) page_counter_uncharge(&mc.to->memory, mc.moved_swap); - mem_cgroup_id_get_many(mc.to, mc.moved_swap); css_put_many(&mc.to->css, mc.moved_swap); mc.moved_swap = 0; @@ -5860,7 +5859,8 @@ put: /* get_mctgt_type() gets the page */ ent = target.ent; if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { mc.precharge--; - /* we fixup refcnts and charges later. */ + mem_cgroup_id_get_many(mc.to, 1); + /* we fixup other refcnts and charges later. */ mc.moved_swap++; } break; -- cgit v1.2.3 From d38a2b7a9c939e6d7329ab92b96559ccebf7b135 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 23 Jul 2020 21:15:27 -0700 Subject: mm: memcg/slab: fix memory leak at non-root kmem_cache destroy If the kmem_cache refcount is greater than one, we should not mark the root kmem_cache as dying. If we mark the root kmem_cache dying incorrectly, the non-root kmem_cache can never be destroyed. It resulted in memory leak when memcg was destroyed. We can use the following steps to reproduce. 1) Use kmem_cache_create() to create a new kmem_cache named A. 2) Coincidentally, the kmem_cache A is an alias for kmem_cache B, so the refcount of B is just increased. 3) Use kmem_cache_destroy() to destroy the kmem_cache A, just decrease the B's refcount but mark the B as dying. 4) Create a new memory cgroup and alloc memory from the kmem_cache B. It leads to create a non-root kmem_cache for allocating memory. 5) When destroy the memory cgroup created in the step 4), the non-root kmem_cache can never be destroyed. If we repeat steps 4) and 5), this will cause a lot of memory leak. So only when refcount reach zero, we mark the root kmem_cache as dying. Fixes: 92ee383f6daa ("mm: fix race between kmem_cache destroy, create and deactivate") Signed-off-by: Muchun Song Signed-off-by: Andrew Morton Reviewed-by: Shakeel Butt Acked-by: Roman Gushchin Cc: Vlastimil Babka Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Shakeel Butt Cc: Link: http://lkml.kernel.org/r/20200716165103.83462-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds --- mm/slab_common.c | 35 ++++++++++++++++++++++++++++------- 1 file changed, 28 insertions(+), 7 deletions(-) (limited to 'mm') diff --git a/mm/slab_common.c b/mm/slab_common.c index 37d48a56431d..fe8b68482670 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -326,6 +326,14 @@ int slab_unmergeable(struct kmem_cache *s) if (s->refcount < 0) return 1; +#ifdef CONFIG_MEMCG_KMEM + /* + * Skip the dying kmem_cache. 
+ */ + if (s->memcg_params.dying) + return 1; +#endif + return 0; } @@ -886,12 +894,15 @@ static int shutdown_memcg_caches(struct kmem_cache *s) return 0; } -static void flush_memcg_workqueue(struct kmem_cache *s) +static void memcg_set_kmem_cache_dying(struct kmem_cache *s) { spin_lock_irq(&memcg_kmem_wq_lock); s->memcg_params.dying = true; spin_unlock_irq(&memcg_kmem_wq_lock); +} +static void flush_memcg_workqueue(struct kmem_cache *s) +{ /* * SLAB and SLUB deactivate the kmem_caches through call_rcu. Make * sure all registered rcu callbacks have been invoked. @@ -923,10 +934,6 @@ static inline int shutdown_memcg_caches(struct kmem_cache *s) { return 0; } - -static inline void flush_memcg_workqueue(struct kmem_cache *s) -{ -} #endif /* CONFIG_MEMCG_KMEM */ void slab_kmem_cache_release(struct kmem_cache *s) @@ -944,8 +951,6 @@ void kmem_cache_destroy(struct kmem_cache *s) if (unlikely(!s)) return; - flush_memcg_workqueue(s); - get_online_cpus(); get_online_mems(); @@ -955,6 +960,22 @@ void kmem_cache_destroy(struct kmem_cache *s) if (s->refcount) goto out_unlock; +#ifdef CONFIG_MEMCG_KMEM + memcg_set_kmem_cache_dying(s); + + mutex_unlock(&slab_mutex); + + put_online_mems(); + put_online_cpus(); + + flush_memcg_workqueue(s); + + get_online_cpus(); + get_online_mems(); + + mutex_lock(&slab_mutex); +#endif + err = shutdown_memcg_caches(s); if (!err) err = shutdown_cache(s); -- cgit v1.2.3 From dbda8feadfa46b3d8dd7a2304f84ccbc036effe9 Mon Sep 17 00:00:00 2001 From: Barry Song Date: Thu, 23 Jul 2020 21:15:30 -0700 Subject: mm/hugetlb: avoid hardcoding while checking if cma is enabled hugetlb_cma[0] can be NULL due to various reasons, for example, node0 has no memory. so NULL hugetlb_cma[0] doesn't necessarily mean cma is not enabled. gigantic pages might have been reserved on other nodes. This patch fixes possible double reservation and CMA leak. [akpm@linux-foundation.org: fix CONFIG_CMA=n warning] [sfr@canb.auug.org.au: better checks before using hugetlb_cma] Link: http://lkml.kernel.org/r/20200721205716.6dbaa56b@canb.auug.org.au Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Barry Song Signed-off-by: Andrew Morton Reviewed-by: Mike Kravetz Acked-by: Roman Gushchin Cc: Jonathan Cameron Cc: Link: http://lkml.kernel.org/r/20200710005726.36068-1-song.bao.hua@hisilicon.com Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) (limited to 'mm') diff --git a/mm/hugetlb.c b/mm/hugetlb.c index fab4485b9e52..590111ea6975 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -45,7 +45,10 @@ int hugetlb_max_hstate __read_mostly; unsigned int default_hstate_idx; struct hstate hstates[HUGE_MAX_HSTATE]; +#ifdef CONFIG_CMA static struct cma *hugetlb_cma[MAX_NUMNODES]; +#endif +static unsigned long hugetlb_cma_size __initdata; /* * Minimum page order among possible hugepage sizes, set to a proper value @@ -1235,9 +1238,10 @@ static void free_gigantic_page(struct page *page, unsigned int order) * If the page isn't allocated using the cma allocator, * cma_release() returns false. 
*/ - if (IS_ENABLED(CONFIG_CMA) && - cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)) +#ifdef CONFIG_CMA + if (cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)) return; +#endif free_contig_range(page_to_pfn(page), 1 << order); } @@ -1248,7 +1252,8 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask, { unsigned long nr_pages = 1UL << huge_page_order(h); - if (IS_ENABLED(CONFIG_CMA)) { +#ifdef CONFIG_CMA + { struct page *page; int node; @@ -1262,6 +1267,7 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask, return page; } } +#endif return alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask); } @@ -2571,7 +2577,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h) for (i = 0; i < h->max_huge_pages; ++i) { if (hstate_is_gigantic(h)) { - if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) { + if (hugetlb_cma_size) { pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n"); break; } @@ -5654,7 +5660,6 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason) } #ifdef CONFIG_CMA -static unsigned long hugetlb_cma_size __initdata; static bool cma_reserve_called __initdata; static int __init cmdline_parse_hugetlb_cma(char *p) -- cgit v1.2.3 From 594cced14ad3903166c8b091ff96adac7552f0b3 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Thu, 23 Jul 2020 21:15:34 -0700 Subject: khugepaged: fix null-pointer dereference due to race khugepaged has to drop mmap lock several times while collapsing a page. The situation can change while the lock is dropped and we need to re-validate that the VMA is still in place and the PMD is still subject for collapse. But we miss one corner case: while collapsing an anonymous pages the VMA could be replaced with file VMA. If the file VMA doesn't have any private pages we get NULL pointer dereference: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] anon_vma_lock_write include/linux/rmap.h:120 [inline] collapse_huge_page mm/khugepaged.c:1110 [inline] khugepaged_scan_pmd mm/khugepaged.c:1349 [inline] khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline] khugepaged_do_scan mm/khugepaged.c:2193 [inline] khugepaged+0x3bba/0x5a10 mm/khugepaged.c:2238 The fix is to make sure that the VMA is anonymous in hugepage_vma_revalidate(). The helper is only used for collapsing anonymous pages. Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Reported-by: syzbot+ed318e8b790ca72c5ad0@syzkaller.appspotmail.com Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Reviewed-by: David Hildenbrand Acked-by: Yang Shi Cc: Link: http://lkml.kernel.org/r/20200722121439.44328-1-kirill.shutemov@linux.intel.com Signed-off-by: Linus Torvalds --- mm/khugepaged.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'mm') diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b043c40a21d4..700f5160f3e4 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -958,6 +958,9 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, return SCAN_ADDRESS_RANGE; if (!hugepage_vma_check(vma, vma->vm_flags)) return SCAN_VMA_CHECK; + /* Anon VMA expected */ + if (!vma->anon_vma || vma->vm_ops) + return SCAN_VMA_CHECK; return 0; } -- cgit v1.2.3
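The pattern behind that last fix generalizes: whenever a lock protecting an object is dropped and re-taken, every precondition has to be re-checked before touching state that only exists for one flavour of the object. A userspace analogy with made-up names (not the kernel code; "anon" stands in for the vma->anon_vma && !vma->vm_ops condition):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct region {
            bool anon;              /* may change while the lock is dropped */
            int  anon_private;      /* only meaningful when anon is true    */
    };

    static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Mirrors hugepage_vma_revalidate(): re-check, never assume. */
    static bool revalidate(const struct region *r)
    {
            return r && r->anon;
    }

    static void collapse(struct region *r)
    {
            pthread_mutex_lock(&map_lock);
            if (!revalidate(r)) {
                    pthread_mutex_unlock(&map_lock);
                    return;
            }
            pthread_mutex_unlock(&map_lock);

            /* ... long operation with the lock dropped; the region may be
             * replaced or change type here, just as khugepaged's VMA can ... */

            pthread_mutex_lock(&map_lock);
            if (!revalidate(r)) {           /* the re-check the bug was missing */
                    pthread_mutex_unlock(&map_lock);
                    return;
            }
            printf("anon_private = %d\n", r->anon_private);
            pthread_mutex_unlock(&map_lock);
    }

    int main(void)
    {
            struct region r = { .anon = true, .anon_private = 42 };
            collapse(&r);
            return 0;
    }

Compile with -pthread; the kernel's version of this re-check is the two-line addition to hugepage_vma_revalidate() shown above.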