Age | Commit message | Author
2018-10-22  x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry()  (Christophe Leroy)
The following commit: d7880812b359 ("idle: Add the stack canary init to cpu_startup_entry()") ... added an x86-specific boot_init_stack_canary() call to the generic cpu_startup_entry() as a temporary hack, with the intention to remove the #ifdef CONFIG_X86 later. More than 5 years later, let's finally realize that plan! :-) While implementing stack protector support for PowerPC, we found that calling boot_init_stack_canary() is also needed for PowerPC, which uses a per-task (TLS) stack canary like x86. However, calling boot_init_stack_canary() would break architectures using a global stack canary (ARM, SH, MIPS and XTENSA). Instead of modifying the #ifdef CONFIG_X86 to an even messier: #if defined(CONFIG_X86) || defined(CONFIG_PPC) PowerPC implemented the call to boot_init_stack_canary() in the function calling cpu_startup_entry(). Let's try the same cleanup on the x86 side as well. On x86 we have two functions calling cpu_startup_entry(): - start_secondary() - cpu_bringup_and_idle() start_secondary() already calls boot_init_stack_canary(), so it's good, and this patch adds the call to boot_init_stack_canary() in cpu_bringup_and_idle(). I.e. now x86 catches up to the rest of the world and the ugly init sequence in init/main.c can be removed from cpu_startup_entry(). As a final benefit we can also remove the <linux/stackprotector.h> dependency from <linux/sched.h>. [ mingo: Improved the changelog a bit, added language explaining x86 borkage and sched.h change. ] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181020072649.5B59310483E@pc16082vm.idsi0.si.c-s.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>
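For illustration, the Xen bringup path with the canary init added ends up mirroring start_secondary(), roughly like this (a sketch based on the description above, not the exact upstream diff):

  /* arch/x86/xen/smp_pv.c (sketch) */
  asmlinkage __visible void cpu_bringup_and_idle(void)
  {
  	cpu_bringup();
  	boot_init_stack_canary();	/* same init start_secondary() already does */
  	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
  }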
2018-10-21  x86/mm: Kill stray kernel fault handling comment  (Dave Hansen)
I originally had matching user and kernel comments, but the kernel one got improved. Some errant conflict resolution kicked the comment somewhere wrong. Kill it. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: aa37c51b94 ("x86/mm: Break out user address space handling") Link: http://lkml.kernel.org/r/20181019140842.12F929FA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-10  x86/mm: Do not warn about PCI BIOS W+X mappings  (Thomas Gleixner)
PCI BIOS requires the BIOS area 0x0A0000-0x0FFFFF to be mapped W+X for various legacy reasons. When CONFIG_DEBUG_WX is enabled, this triggers the WX warning, but this is misleading because the mapping is required and is not a result of an accidental oversight. Prevent the full warning when PCI BIOS is enabled and the detected WX mapping is in the BIOS area. Just emit a pr_warn() which denotes the fact. This is partially duplicating the info which the PCI BIOS code emits when it maps the area as executable, but that info is not in the context of the WX checking output. Remove the extra %p printout in the WARN_ONCE() while at it. %pS is enough. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kees Cook <keescook@chromium.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1810082151160.2455@nanos.tec.linutronix.de
2018-10-09  resource: Clean it up a bit  (Borislav Petkov)
- Drop BUG_ON()s and do normal error handling instead, in find_next_iomem_res(). - Align function arguments on opening braces. - Get rid of local var sibling_only in find_next_iomem_res(). - Shorten unnecessarily long first_level_children_only arg name. Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Bjorn Helgaas <bhelgaas@google.com> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com Link: <new submission>
2018-10-09  resource: Fix find_next_iomem_res() iteration issue  (Bjorn Helgaas)
Previously find_next_iomem_res() used "*res" as both an input parameter for the range to search and the type of resource to search for, and an output parameter for the resource we found, which makes the interface confusing. The current callers use find_next_iomem_res() incorrectly because they allocate a single struct resource and use it for repeated calls to find_next_iomem_res(). When find_next_iomem_res() returns a resource, it overwrites the start, end, flags, and desc members of the struct. If we call find_next_iomem_res() again, we must update or restore these fields. The previous code restored res.start and res.end, but not res.flags or res.desc. Since the callers did not restore res.flags, if they searched for flags IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would incorrectly skip resources unless they were also marked as IORESOURCE_SYSRAM. Fix this by restructuring the interface so it takes explicit "start, end, flags" parameters and uses "*res" only as an output parameter. Based on a patch by Lianbo Jiang <lijiang@redhat.com>. [ bp: While at it: - make comments kernel-doc style. - Originally-by: http://lore.kernel.org/lkml/20180921073211.20097-2-lijiang@redhat.com ] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/153805812916.1157.177580438135143788.stgit@bhelgaas-glaptop.roam.corp.google.com
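A sketch of the reworked interface described above; parameter names are approximations rather than the exact upstream prototype:

  /* The search window and match criteria are explicit inputs; *res is output only. */
  static int find_next_iomem_res(resource_size_t start, resource_size_t end,
  				 unsigned long flags, unsigned long desc,
  				 bool first_lvl, struct resource *res);

  /* Callers can then iterate without saving/restoring flags or desc
     (assuming a 0-on-success return): */
  while (start < end && !find_next_iomem_res(start, end, flags, desc, false, &res)) {
  	/* ... use res ... */
  	start = res.end + 1;
  }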
2018-10-09  resource: Include resource end in walk_*() interfaces  (Bjorn Helgaas)
find_next_iomem_res() finds an iomem resource that covers part of a range described by "start, end". All callers expect that range to be inclusive, i.e., both start and end are included, but find_next_iomem_res() doesn't handle the end address correctly. If it finds an iomem resource that contains exactly the end address, it skips it, e.g., if "start, end" is [0x0-0x10000] and there happens to be an iomem resource [mem 0x10000-0x10000] (the single byte at 0x10000), we skip it:

  find_next_iomem_res(...)
  {
  	start = 0x0;
  	end = 0x10000;
  	for (p = next_resource(...)) {
  		# p->start = 0x10000;
  		# p->end = 0x10000;
  		# we *should* return this resource, but this condition is false:
  		if ((p->end >= start) && (p->start < end))
  			break;

Adjust find_next_iomem_res() so it allows a resource that includes the single byte at the end of the range. This is a corner case that we probably don't see in practice. Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/153805812254.1157.16736368485811773752.stgit@bhelgaas-glaptop.roam.corp.google.com
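The fix amounts to treating "end" as inclusive in that overlap test, i.e. something like:

  	/* allow a resource that ends exactly at the inclusive "end" address */
  	if ((p->end >= start) && (p->start <= end))
  		break;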
2018-10-09  x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error  (Bjorn Helgaas)
The only use of KEXEC_BACKUP_SRC_END is as an argument to walk_system_ram_res(): int crash_load_segments(struct kimage *image) { ... walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END, image, determine_backup_region); walk_system_ram_res() expects "start, end" arguments that are inclusive, i.e., the range to be walked includes both the start and end addresses. KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the first address *past* the desired 0-640KB range. Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC region is [0-0x9ffff], not [0-0xa0000]. Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Ingo Molnar <mingo@redhat.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: baiyaowei@cmss.chinamobile.com CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/153805811578.1157.6948388946904655969.stgit@bhelgaas-glaptop.roam.corp.google.com
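The resulting definitions look like this (START shown for context and assumed to be 0):

  #define KEXEC_BACKUP_SRC_START	(0UL)
  #define KEXEC_BACKUP_SRC_END	(640 * 1024UL - 1)	/* 640K, inclusive: 0x9ffff */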
2018-10-09  x86/mm: Remove spurious fault pkey check  (Dave Hansen)
Spurious faults only ever occur in the kernel's address space. They are also constrained specifically to faults with one of these error codes:

  X86_PF_WRITE | X86_PF_PROT
  X86_PF_INSTR | X86_PF_PROT

So, it's never even possible to reach spurious_kernel_fault_check() with X86_PF_PK set. In addition, the kernel's address space never has pages with user-mode protections. Protection Keys are only enforced on pages with user-mode protection. This gives us lots of reasons to not check for protection keys in our spurious kernel fault handling. But, let's also add some warnings to ensure that these assumptions about protection keys hold true. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160231.243A0D6A@viggo.jf.intel.com
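The added sanity check boils down to something like this in the spurious kernel fault path (sketch):

  	/*
  	 * Protection keys faults only happen on user-mode pages, and the
  	 * kernel portion of the address space never contains such pages.
  	 */
  	WARN_ON_ONCE(error_code & X86_PF_PK);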
2018-10-09  x86/mm/vsyscall: Consider vsyscall page part of user address space  (Dave Hansen)
The vsyscall page is weird. It is in what is traditionally part of the kernel address space. But, it has user permissions and we handle faults on it like we would on a user page: interrupts on. Right now, we handle vsyscall emulation in the "bad_area" code, which is used for both user-address-space and kernel-address-space faults. Move the handling to the user-address-space code *only* and ensure we get there by "excluding" the vsyscall page from the kernel address space via a check in fault_in_kernel_space(). Since the fault_in_kernel_space() check is used on 32-bit, also add a 64-bit check to make it clear we only use this path on 64-bit. Also move the unlikely() to be in is_vsyscall_vaddr() itself. This helps clean up the kernel fault handling path by removing a case that can happen in normal[1] operation. (Yeah, yeah, we can argue about the vsyscall page being "normal" or not.) This also makes sanity checks easier, like the "we never take pkey faults in the kernel address space" check in the next patch. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160230.6E9336EE@viggo.jf.intel.com
2018-10-09  x86/mm: Add vsyscall address helper  (Dave Hansen)
We will shortly be using this check in two locations. Put it in a helper before we do so. Let's also insert PAGE_MASK instead of the open-coded ~0xfff. It is easier to read and also more obviously correct considering the implicit type conversion that has to happen when it is not an implicit 'unsigned long'. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160228.C593509B@viggo.jf.intel.com
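The helper is essentially (sketch; a later patch in this series moves the unlikely() hint in here):

  static bool is_vsyscall_vaddr(unsigned long vaddr)
  {
  	return (vaddr & PAGE_MASK) == VSYSCALL_ADDR;
  }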
2018-10-09  x86/mm: Fix exception table comments  (Dave Hansen)
The comments here are wrong. They are too absolute about where faults can occur when running in the kernel. The comments are also a bit hard to match up with the code. Trim down the comments, and make them more precise. Also add a comment explaining why we are doing the bad_area_nosemaphore() path here. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160227.077DDD7A@viggo.jf.intel.com
2018-10-09  x86/mm: Add clarifying comments for user addr space  (Dave Hansen)
The SMAP and Reserved checks do not have nice comments. Add some to clarify them and make them match everything else. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160225.FFD44B8D@viggo.jf.intel.com
2018-10-09  x86/mm: Break out user address space handling  (Dave Hansen)
The last patch broke out kernel address space handling into its own helper. Now, do the same for user address space handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com
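Together with the previous patch, the top level of the fault handler ends up shaped roughly like this (a sketch with details omitted; helper names are taken from the descriptions in this series):

  static void do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
  				unsigned long address)
  {
  	/* vmalloc faults, spurious kernel faults, ... */
  }

  static void do_user_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
  				unsigned long address)
  {
  	/* everything that may take mmap_sem and touch the mm */
  }

  static noinline void __do_page_fault(struct pt_regs *regs, unsigned long error_code,
  				       unsigned long address)
  {
  	if (unlikely(fault_in_kernel_space(address)))
  		do_kern_addr_fault(regs, error_code, address);
  	else
  		do_user_addr_fault(regs, error_code, address);
  }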
2018-10-09  x86/mm: Break out kernel address space handling  (Dave Hansen)
The page fault handler (__do_page_fault()) basically has two sections: one for handling faults in the kernel portion of the address space and another for faults in the user portion of the address space. But, these two parts don't stick out that well. Let's make that more clear from code separation and naming. Pull kernel fault handling into its own helper, and reflect that naming by renaming spurious_fault() -> spurious_kernel_fault(). Also, rewrite the vmalloc() handling comment a bit. It was a bit stale and also glossed over the reserved bit handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160222.401F4E10@viggo.jf.intel.com
2018-10-09  x86/mm: Clarify hardware vs. software "error_code"  (Dave Hansen)
We pass around a variable called "error_code" all around the page fault code. Sounds simple enough, especially since "error_code" looks like it exactly matches the values that the hardware gives us on the stack to report the page fault error code (PFEC in SDM parlance). But, that's not how it works. For part of the page fault handler, "error_code" does exactly match PFEC. But, during later parts, it diverges and starts to mean something a bit different. Give it two names for its two jobs. The place it diverges is also really screwy. It's only in a spot where the hardware tells us we have kernel-mode access that occurred while we were in usermode accessing user-controlled address space. Add a warning in there. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160220.4A2272C9@viggo.jf.intel.com
2018-10-09  x86/mm/tlb: Make lazy TLB mode lazier  (Rik van Riel)
Lazy TLB mode can result in an idle CPU being woken up by a TLB flush, when all it really needs to do is reload %CR3 at the next context switch, assuming no page table pages got freed. Memory ordering is used to prevent race conditions between switch_mm_irqs_off, which checks whether .tlb_gen changed, and the TLB invalidation code, which increments .tlb_gen whenever page table entries get invalidated. The atomic increment in inc_mm_tlb_gen is its own barrier; the context switch code adds an explicit barrier between reading tlbstate.is_lazy and next->context.tlb_gen. CPUs in lazy TLB mode remain part of the mm_cpumask(mm), both because that allows TLB flush IPIs to be sent at page table freeing time, and because the cache line bouncing on the mm_cpumask(mm) was responsible for about half the CPU use in switch_mm_irqs_off(). We can change native_flush_tlb_others() without touching other (paravirt) implementations of flush_tlb_others() because we'll be flushing less. The existing implementations flush more and are therefore still correct. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-8-riel@surriel.com
2018-10-09  x86/mm/tlb: Add freed_tables element to flush_tlb_info  (Rik van Riel)
Pass the information on to native_flush_tlb_others. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-7-riel@surriel.com
2018-10-09  x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range  (Rik van Riel)
Add an argument to flush_tlb_mm_range to indicate whether page tables are about to be freed after this TLB flush. This allows for an optimization of flush_tlb_mm_range to skip CPUs in lazy TLB mode. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-6-riel@surriel.com
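With this, the signature ends up roughly as follows (parameter order approximate; the stride_shift parameter comes from the "Page size aware flush_tlb_mm_range()" change further down this log):

  void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
  			unsigned long end, unsigned int stride_shift,
  			bool freed_tables);

  /* e.g. a flush after zapping PTEs without freeing any page tables: */
  flush_tlb_mm_range(mm, start, end, PAGE_SHIFT, false);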
2018-10-09  smp,cpumask: introduce on_each_cpu_cond_mask  (Rik van Riel)
Introduce a variant of on_each_cpu_cond that iterates only over the CPUs in a cpumask, in order to avoid making callbacks for every single CPU in the system when we only need to test a subset. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-5-riel@surriel.com
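The new variant's prototype, roughly as it looked at the time of this series (sketch):

  void on_each_cpu_cond_mask(bool (*cond_func)(int cpu, void *info),
  			   smp_call_func_t func, void *info, bool wait,
  			   gfp_t gfp_flags, const struct cpumask *mask);

  /* on_each_cpu_cond() then becomes the mask == cpu_online_mask special case. */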
2018-10-09  smp: use __cpumask_set_cpu in on_each_cpu_cond  (Rik van Riel)
The code in on_each_cpu_cond sets CPUs in a locally allocated bitmask, which should never be used by other CPUs simultaneously. There is no need to use locked memory accesses to set the bits in this bitmap. Switch to __cpumask_set_cpu. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-4-riel@surriel.com
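For reference, the two helpers differ only in whether the bit is set with a locked access (paraphrased from <linux/cpumask.h>):

  static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp)
  {
  	set_bit(cpumask_check(cpu), cpumask_bits(dstp));	/* atomic RMW */
  }

  static inline void __cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp)
  {
  	__set_bit(cpumask_check(cpu), cpumask_bits(dstp));	/* plain store, no LOCK prefix */
  }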
2018-10-09  x86/mm/tlb: Restructure switch_mm_irqs_off()  (Rik van Riel)
Move some code that will be needed for the lazy -> !lazy state transition when a lazy TLB CPU has gotten out of date. No functional changes, since the if (real_prev == next) branch always returns. (cherry picked from commit 61d0beb5796ab11f7f3bf38cb2eccc6579aaa70b) Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-4-riel@surriel.com
2018-10-09  x86/mm/tlb: Always use lazy TLB mode  (Rik van Riel)
On most workloads, the number of context switches far exceeds the number of TLB flushes sent. Optimizing the context switches, by always using lazy TLB mode, speeds up those workloads. This patch results in about a 1% reduction in CPU use on a two socket Broadwell system running a memcache like workload. Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> (cherry picked from commit 95b0e6357d3e4e05349668940d7ff8f3b7e7e11e) Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-7-riel@surriel.com
2018-10-09  x86/mm: Page size aware flush_tlb_mm_range()  (Peter Zijlstra)
Use the new tlb_get_unmap_shift() to determine the stride of the INVLPG loop. Cc: Nick Piggin <npiggin@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-10-09  Merge branch 'tlb/asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into x86/mm  (Peter Zijlstra)
Pull in the generic mmu_gather changes from the ARM64 tree such that we can put x86 specific things on top as well.
2018-10-09  proc/vmcore: Fix i386 build error of missing copy_oldmem_page_encrypted()  (Borislav Petkov)
Lianbo reported a build error with a particular 32-bit config, see Link below for details. Provide a weak copy_oldmem_page_encrypted() function which architectures can override, in the same manner other functionality in that file is supplied. Reported-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: x86@kernel.org Link: http://lkml.kernel.org/r/710b9d95-2f70-eadf-c4a1-c3dc80ee4ebb@redhat.com
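A weak fallback of that shape could look like this (sketch; the exact return value is an assumption):

  ssize_t __weak copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
  					   size_t csize, unsigned long offset,
  					   int userbuf)
  {
  	return -EOPNOTSUPP;
  }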
2018-10-06  x86/mm/doc: Enhance the x86-64 virtual memory layout descriptions  (Ingo Molnar)
After the cleanups from Baoquan He, make it even more readable:

- Remove the 'bits' area size column: it's pretty pointless and was even wrong for some of the entries. Given that KB, MB, GB, TB are 10, 20, 30 and 40 bits, an "8 TB" size description makes it obvious that it's 43 bits.

- Introduce an "offset" column:

  --------------------------------------------------------------------------------
  start addr       |   offset   | end addr         |  size   | VM area description
  -----------------|------------|------------------|---------|--------------------
  ...
  ffff880000000000 |  -120 TB   | ffffc7ffffffffff |  64 TB  | direct mapping of all physical memory (page_offset_base), this is what limits max physical memory supported.

  The -120 TB notation makes it obvious where this particular virtual memory region starts: 120 TB down from the top of the 64-bit virtual memory space. Especially the layout of the kernel mappings is a *lot* more obvious when written this way, plus it's much easier to compare it with the size column and understand/check/validate and modify the kernel's layout in the future.

- Mark the part from where the 47-bit and 56-bit kernel layouts are 100% identical; this starts at the -512 GB offset and the EFI region.

- Re-shuffle the size descriptions to be continuous blocks of sizes, instead of the often mixed sizes. I.e. write "0.5 TB" instead of "512 GB" if we are still in the TB-granular region of the map.

- Make the 47-bit and 56-bit descriptions use the *exact* same layout and wording, and only differ where there's a material difference. This makes it easy to compare the two tables side by side by switching between two terminal tabs.

- Plus enhance a lot of other stylistic/typographical details: make the tables explicitly tabular, add headers, enhance certain entries, etc. etc.

Note that there are some apparent errors in the tables as well, but I'll fix them in a separate patch to make it easier to review/validate. Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-06  x86/mm/doc: Clean up the x86-64 virtual memory layout descriptions  (Baoquan He)
In Documentation/x86/x86_64/mm.txt, the description of the x86-64 virtual memory layout has become a confusing hodgepodge of inconsistencies: - there's a hard to read mixture of 'TB' and 'bits' notation - the entries sometimes mention a size in the description and sometimes not - sometimes they list holes by address, sometimes only as an 'unused hole' line So make it all a coherent, readable, well organized description. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/20181006084327.27467-3-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-06  x86/KASLR: Update KERNEL_IMAGE_SIZE description  (Baoquan He)
Currently CONFIG_RANDOMIZE_BASE=y is set by default, which makes some of the old comments above the KERNEL_IMAGE_SIZE definition out of date. Update them to the current state of affairs. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/20181006084327.27467-2-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-06  kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled  (Lianbo Jiang)
In the kdump kernel, the memory of the first kernel needs to be dumped into the vmcore file. If SME is enabled in the first kernel, the old memory has to be remapped with the memory encryption mask in order to access it properly. Split copy_oldmem_page() functionality to handle encrypted memory properly. [ bp: Heavily massage everything. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: jroedel@suse.de Link: https://lkml.kernel.org/r/be7b47f9-6be6-e0d1-2c2a-9125bc74b818@redhat.com
2018-10-06  iommu/amd: Remap the IOMMU device table with the memory encryption mask for kdump  (Lianbo Jiang)
The kdump kernel copies the IOMMU device table from the old device table which is encrypted when SME is enabled in the first kernel. So remap the old device table with the memory encryption mask in the kdump kernel. [ bp: Massage commit message. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Joerg Roedel <jroedel@suse.de> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Link: https://lkml.kernel.org/r/20180930031033.22110-4-lijiang@redhat.com
2018-10-06  kexec: Allocate decrypted control pages for kdump if SME is enabled  (Lianbo Jiang)
When SME is enabled in the first kernel, it needs to allocate decrypted pages for kdump because when the kdump kernel boots, these pages need to be accessed decrypted in the initial boot stage, before SME is enabled. [ bp: clean up text. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: jroedel@suse.de Link: https://lkml.kernel.org/r/20180930031033.22110-3-lijiang@redhat.com
2018-10-06  x86/ioremap: Add an ioremap_encrypted() helper  (Lianbo Jiang)
When SME is enabled, the memory is encrypted in the first kernel. In this case, SME also needs to be enabled in the kdump kernel, and we have to remap the old memory with the memory encryption mask. The case of concern here is if SME is active in the first kernel, and it is active too in the kdump kernel. There are four cases to be considered:

  a. dump vmcore - it is encrypted in the first kernel and needs to be read out in the kdump kernel.
  b. crash notes - when dumping vmcore, people usually need to read useful information from the notes, and the notes are also encrypted.
  c. iommu device table - it is encrypted in the first kernel; the kdump kernel needs to access its content to analyze it and get the information it needs.
  d. mmio of AMD iommu - not encrypted in either kernel.

Add a new bool parameter @encrypted to __ioremap_caller(). If set, memory will be remapped with the SME mask. Add a new function ioremap_encrypted() to explicitly pass in a true value for @encrypted. Use ioremap_encrypted() for the above a, b, c cases. [ bp: cleanup commit message, extern defs in io.h and drop forgotten include. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: jroedel@suse.de Link: https://lkml.kernel.org/r/20180927071954.29615-2-lijiang@redhat.com
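The helper essentially just passes the new flag through (sketch):

  void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
  {
  	/* the trailing 'true' is the new @encrypted parameter */
  	return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
  				__builtin_return_address(0), true);
  }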
2018-10-03  x86/mm: Fix typo in comment  (Takuya Yamamoto)
Signed-off-by: Takuya Yamamoto <tkyymmt01@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180829072730.988-1-tkyymmt01@gmail.com
2018-09-27  x86/mm/cpa: Optimize __cpa_flush_range()  (Peter Zijlstra)
If we IPI for WBINVD, then we might as well kill the entire TLB too. But if we don't have to invalidate the cache, there is no reason not to use a range TLB flush. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.195633798@infradead.org
2018-09-27  x86/mm/cpa: Factor common code between cpa_flush_*()  (Peter Zijlstra)
The start of cpa_flush_range() and cpa_flush_array() is the same, use a common function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.138859183@infradead.org
2018-09-27  x86/mm/cpa: Move CLFLUSH test into cpa_flush_array()  (Peter Zijlstra)
Rather than guarding cpa_flush_array() users with a CLFLUSH test, put it inside. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.087848187@infradead.org
2018-09-27  x86/mm/cpa: Move CLFLUSH test into cpa_flush_range()  (Peter Zijlstra)
Rather than guarding all cpa_flush_range() uses with a CLFLUSH test, put it inside. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.036195503@infradead.org
2018-09-27  x86/mm/cpa: Use flush_tlb_kernel_range()  (Peter Zijlstra)
Both cpa_flush_range() and cpa_flush_array() have a well specified range, use that to do a range based TLB invalidate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.985193217@infradead.org
2018-09-27  x86/mm/cpa: Unconditionally avoid WBINVD when we can  (Peter Zijlstra)
CAT has happened, and WBINVD is bad (even before CAT, blowing away the entire cache on a multi-core platform wasn't nice), so try not to use it ever. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.933674526@infradead.org
2018-09-27  x86/mm/cpa: Move flush_tlb_all()  (Peter Zijlstra)
There is an Atom erratum where we do a local TLB invalidate right before we return, and then do a global TLB invalidate. Move the global invalidate up a little bit and avoid the local invalidate entirely. This does put the global invalidate under pgd_lock, but that shouldn't matter. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.882287392@infradead.org
2018-09-27  x86/mm/cpa: Use flush_tlb_all()  (Peter Zijlstra)
Instead of open-coding it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.831102058@infradead.org
2018-09-27  x86/mm/cpa: Avoid the 4k pages check completely  (Thomas Gleixner)
The extra loop which tries hard to preserve large pages in case of conflicts with static protection regions turns out to be not preserving anything, at least not in the experiments which have been conducted. There might be corner cases in which the code would be able to preserve a large page occasionally, but it's really not worth the extra code and the cycles wasted in the common case.

Before:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 541
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages checked: 514
  4K pages set-checked: 7668

After:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 538
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages set-checked: 7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.589642503@linutronix.de
2018-09-27  x86/mm/cpa: Do the range check early  (Thomas Gleixner)
To avoid excessive 4k wise checks in the common case do a quick check first whether the requested new page protections conflict with a static protection area in the large page. If there is no conflict then the decision whether to preserve or to split the page can be made immediately. If the requested range covers the full large page, preserve it. Otherwise split it up. No point in doing a slow crawl in 4k steps.

Before:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 538
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages checked: 560642
  4K pages set-checked: 7668

After:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 541
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages checked: 514
  4K pages set-checked: 7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.507259989@linutronix.de
2018-09-27  x86/mm/cpa: Optimize same protection check  (Thomas Gleixner)
When the existing mapping is correct and the new requested page protections are the same as the existing ones, then further checks can be omitted and the large page can be preserved. The slow path 4k wise check will not come up with a different result.

Before:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 540
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages checked: 800709
  4K pages set-checked: 7668

After:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 538
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages checked: 560642
  4K pages set-checked: 7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.424477581@linutronix.de
2018-09-27  x86/mm/cpa: Add sanity check for existing mappings  (Thomas Gleixner)
With the range check it is possible to do a quick verification that the current mapping is correct vs. the static protection areas. In case an incorrect mapping is detected, a warning is emitted and the large page is split up. If the large page is a 2M page, then the split code is forced to check the static protections for the PTE entries to fix up the incorrectness. For 1G pages this can't be done easily because that would require finding the offending 2M areas either before the split or afterwards. For now just warn about that case and revisit it when reported. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.331408643@linutronix.de
2018-09-27  x86/mm/cpa: Avoid static protection checks on unmap  (Thomas Gleixner)
If the new pgprot has the PRESENT bit cleared, then conflicts vs. RW/NX are completely irrelevant.

Before:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 540
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages checked: 800770
  4K pages set-checked: 7668

After:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 540
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages checked: 800709
  4K pages set-checked: 7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.245849757@linutronix.de
2018-09-27  x86/mm/cpa: Add large page preservation statistics  (Thomas Gleixner)
The large page preservation mechanism is just magic and provides no information at all. Add optional statistics output in debugfs so the magic can be evaluated. Default is off.

Output:
  1G pages checked: 2
  1G pages sameprot: 0
  1G pages preserved: 0
  2M pages checked: 540
  2M pages sameprot: 466
  2M pages preserved: 47
  4K pages checked: 800770
  4K pages set-checked: 7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.160867778@linutronix.de
2018-09-27  x86/mm/cpa: Add debug mechanism  (Thomas Gleixner)
The whole static protection magic is silently fixing up anything which is handed in. That's just wrong. The offending call sites need to be fixed. Add a debug mechanism which emits a warning if a requested mapping needs to be fixed up. The DETECT debug mechanism is really not meant to be enabled except for developers, so limit the output hard to the protection fixups. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.078998733@linutronix.de
2018-09-27  x86/mm/cpa: Allow range check for static protections  (Thomas Gleixner)
Checking static protections only page by page is slow especially for huge pages. To allow quick checks over a complete range, add the ability to do that. Make the checks inclusive so the ranges can be directly used for debug output later. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.995734490@linutronix.de
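With inclusive ranges, the overlap test such a range check needs is simply (sketch):

  /* true if [start1, end1] and [start2, end2] share at least one address */
  static bool overlaps(unsigned long start1, unsigned long end1,
  		     unsigned long start2, unsigned long end2)
  {
  	return end1 >= start2 && start1 <= end2;
  }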
2018-09-27  x86/mm/cpa: Rework static_protections()  (Thomas Gleixner)
static_protections() is pretty unreadable. Split it up into separate checks for each protection area. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.913005317@linutronix.de