From 8be7258aad44b5e25977a98db136f677fa6f4370 Mon Sep 17 00:00:00 2001 From: Jeff Xu Date: Mon, 15 Apr 2024 16:35:21 +0000 Subject: mseal: add mseal syscall MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The new mseal() is an syscall on 64 bit CPU, and with following signature: int mseal(void addr, size_t len, unsigned long flags) addr/len: memory range. flags: reserved. mseal() blocks following operations for the given memory range. 1> Unmapping, moving to another location, and shrinking the size, via munmap() and mremap(), can leave an empty space, therefore can be replaced with a VMA with a new set of attributes. 2> Moving or expanding a different VMA into the current location, via mremap(). 3> Modifying a VMA via mmap(MAP_FIXED). 4> Size expansion, via mremap(), does not appear to pose any specific risks to sealed VMAs. It is included anyway because the use case is unclear. In any case, users can rely on merging to expand a sealed VMA. 5> mprotect() and pkey_mprotect(). 6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous memory, when users don't have write permission to the memory. Those behaviors can alter region contents by discarding pages, effectively a memset(0) for anonymous memory. Following input during RFC are incooperated into this patch: Jann Horn: raising awareness and providing valuable insights on the destructive madvise operations. Linus Torvalds: assisting in defining system call signature and scope. Liam R. Howlett: perf optimization. Theo de Raadt: sharing the experiences and insight gained from implementing mimmutable() in OpenBSD. Finally, the idea that inspired this patch comes from Stephen Röttger's work in Chrome V8 CFI. [jeffxu@chromium.org: add branch prediction hint, per Pedro] Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org Signed-off-by: Jeff Xu Reviewed-by: Kees Cook Reviewed-by: Liam R. Howlett Cc: Pedro Falcato Cc: Dave Hansen Cc: Greg Kroah-Hartman Cc: Guenter Roeck Cc: Jann Horn Cc: Jeff Xu Cc: Jonathan Corbet Cc: Jorge Lucangeli Obes Cc: Linus Torvalds Cc: Matthew Wilcox (Oracle) Cc: Muhammad Usama Anjum Cc: Pedro Falcato Cc: Stephen Röttger Cc: Suren Baghdasaryan Cc: Amer Al Shanawany Cc: Javier Carrasco Cc: Shuah Khan Signed-off-by: Andrew Morton --- mm/mmap.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) (limited to 'mm/mmap.c') diff --git a/mm/mmap.c b/mm/mmap.c index d6d8ab119b72..83b4682ec85c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1255,6 +1255,16 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (mm->map_count > sysctl_max_map_count) return -ENOMEM; + /* + * addr is returned from get_unmapped_area, + * There are two cases: + * 1> MAP_FIXED == false + * unallocated memory, no need to check sealing. + * 1> MAP_FIXED == true + * sealing is checked inside mmap_region when + * do_vmi_munmap is called. + */ + if (prot == PROT_EXEC) { pkey = execute_only_pkey(mm); if (pkey < 0) @@ -2727,6 +2737,14 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (end == start) return -EINVAL; + /* + * Check if memory is sealed before arch_unmap. + * Prevent unmapping a sealed VMA. + * can_modify_mm assumes we have acquired the lock on MM. + */ + if (unlikely(!can_modify_mm(mm, start, end))) + return -EPERM; + /* arch_unmap() might do unmaps itself. */ arch_unmap(mm, start, end); @@ -2789,7 +2807,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr, } /* Unmap any existing mapping in the area */ - if (do_vmi_munmap(&vmi, mm, addr, len, uf, false)) + error = do_vmi_munmap(&vmi, mm, addr, len, uf, false); + if (error == -EPERM) + return error; + else if (error) return -ENOMEM; /* @@ -3139,6 +3160,14 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, { struct mm_struct *mm = vma->vm_mm; + /* + * Check if memory is sealed before arch_unmap. + * Prevent unmapping a sealed VMA. + * can_modify_mm assumes we have acquired the lock on MM. + */ + if (unlikely(!can_modify_mm(mm, start, end))) + return -EPERM; + arch_unmap(mm, start, end); return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock); } -- cgit v1.2.3