lwn.git/kernel/entry, branch master

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

2026-04-14T23:48:56+00:00

Pull arm64 updates from Catalin Marinas:

"The biggest changes are MPAM enablement in drivers/resctrl and new PMU support under drivers/perf. On the core side, FEAT_LSUI lets futex atomic operations run with EL0 permissions, avoiding PAN toggling. The rest is mostly TLB invalidation refactoring, further generic entry work, sysreg updates and a few fixes.

Core features:

 - Add support for FEAT_LSUI, allowing futex atomic operations without toggling Privileged Access Never (PAN)
 - Further refactor the arm64 exception handling code towards the generic entry infrastructure
 - Optimise __READ_ONCE() with CONFIG_LTO=y and allow alias analysis through it

Memory management:

 - Refactor the arm64 TLB invalidation API and implementation for better control over barrier placement and level-hinted invalidation
 - Enable batched TLB flushes during memory hot-unplug
 - Fix rodata=full block mapping support for realm guests (when BBML2_NOABORT is available)

Perf and PMU:

 - Add support for a whole bunch of system PMUs featured in NVIDIA's Tegra410 SoC (cspmu extensions for the fabric and PCIe, new drivers for CPU/C2C memory latency PMUs)
 - Clean up iomem resource handling in the Arm CMN driver
 - Fix signedness handling of AA64DFR0.{PMUVer,PerfMon}

MPAM (Memory Partitioning And Monitoring):

 - Add architecture context-switch and hiding of the feature from KVM
 - Add interface to allow MPAM to be exposed to user-space using resctrl
 - Add errata workaround for some existing platforms
 - Add documentation for using MPAM and what shape of platforms can use resctrl

Miscellaneous:

 - Check DAIF (and PMR, where relevant) at task-switch time
 - Skip TFSR_EL1 checks and barriers in synchronous MTE tag check mode (only relevant to asynchronous or asymmetric tag check modes)
 - Remove a duplicate allocation in the kexec code
 - Remove redundant save/restore of SCS SP on entry to/from EL0
 - Generate the KERNEL_HWCAP_ definitions from the arm64 hwcap descriptions
 - Add kselftest coverage for cmpbr_sigill()
 - Update sysreg definitions"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (109 commits)
  arm64: rsi: use linear-map alias for realm config buffer
  arm64: Kconfig: fix duplicate word in CMDLINE help text
  arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
  arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12
  arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps
  arm64: kexec: Remove duplicate allocation for trans_pgd
  ACPI: AGDI: fix missing newline in error message
  arm64: Check DAIF (and PMR) at task-switch time
  arm64: entry: Use split preemption logic
  arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
  arm64: entry: Consistently prefix arm64-specific wrappers
  arm64: entry: Don't preempt with SError or Debug masked
  entry: Split preemption from irqentry_exit_to_kernel_mode()
  entry: Split kernel mode logic from irqentry_{enter,exit}()
  entry: Move irqentry_enter() prototype later
  entry: Remove local_irq_{enable,disable}_exit_to_user()
  ...

entry: Split kernel mode logic from irqentry_{enter,exit}()

2026-04-08T09:43:32+00:00

The generic irqentry code has entry/exit functions specifically for exceptions taken from user mode, but doesn't have entry/exit functions specifically for exceptions taken from kernel mode.

It would be helpful to have separate entry/exit functions specifically for exceptions taken from kernel mode. This would make the structure of the entry code more consistent, and would make it easier for architectures to manage logic specific to exceptions taken from kernel mode.

Move the logic specific to kernel mode out of irqentry_enter() and irqentry_exit() into new irqentry_enter_from_kernel_mode() and irqentry_exit_to_kernel_mode() functions. These are marked __always_inline and placed in irq-entry-common.h, as with irqentry_enter_from_user_mode() and irqentry_exit_to_user_mode(), so that they can be inlined into architecture-specific wrappers. The existing out-of-line irqentry_enter() and irqentry_exit() functions are retained as callers of the new functions.

The lockdep assertion from irqentry_exit() is moved into irqentry_exit_to_user_mode() and irqentry_exit_to_kernel_mode(). This was previously missing from irqentry_exit_to_user_mode() when called directly, and any new lockdep assertion failure resulting from this change is a latent bug.

Aside from the lockdep change noted above, there should be no functional change as a result of this change.

[ tglx: Updated kernel doc ]

Signed-off-by: Mark Rutland
Signed-off-by: Thomas Gleixner
Reviewed-by: Jinjie Ruan
Acked-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20260407131650.3813777-5-mark.rutland@arm.com
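
As an illustration of the split described in this commit, the following is a minimal userspace sketch, not the kernel code. The `struct pt_regs` and `irqentry_state_t` definitions, the counters, and every `sketch_`-prefixed name are hypothetical stand-ins; only the overall shape (force-inlined mode-specific helpers, with the out-of-line generic function kept as their caller) is taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct pt_regs { bool from_user; };
typedef struct { bool exit_rcu; } irqentry_state_t;

/* Counters that exist only so the sketch can be observed. */
static int user_entries;
static int kernel_entries;

static inline bool user_mode(const struct pt_regs *regs)
{
	return regs->from_user;
}

/* Mode-specific helper, force-inlined so an arch wrapper could absorb it. */
static inline __attribute__((always_inline))
void sketch_enter_from_user_mode(struct pt_regs *regs)
{
	(void)regs;
	user_entries++;
}

/* The new kernel-mode counterpart: same treatment as the user-mode helper. */
static inline __attribute__((always_inline))
irqentry_state_t sketch_enter_from_kernel_mode(struct pt_regs *regs)
{
	irqentry_state_t state = { .exit_rcu = false };

	(void)regs;
	kernel_entries++;
	return state;
}

/* The out-of-line generic entry point stays, now only as a dispatcher. */
irqentry_state_t sketch_irqentry_enter(struct pt_regs *regs)
{
	irqentry_state_t state = { .exit_rcu = false };

	if (user_mode(regs)) {
		sketch_enter_from_user_mode(regs);
		return state;
	}
	return sketch_enter_from_kernel_mode(regs);
}
```

An architecture that already knows which mode an exception came from could call the matching helper directly from its own wrapper instead of going through the dispatcher, which appears to be the use case the commit message has in mind.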

entry: Remove local_irq_{enable,disable}_exit_to_user()

2026-04-08T09:43:31+00:00

local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user() are never overridden by architecture code, and are always equivalent to local_irq_enable() and local_irq_disable().

These functions were added on the assumption that arm64 would override them to manage 'DAIF' exception masking, as described by Thomas Gleixner in these threads:

  https://lore.kernel.org/all/20190919150809.340471236@linutronix.de/
  https://lore.kernel.org/all/alpine.DEB.2.21.1910240119090.1852@nanos.tec.linutronix.de/

In practice arm64 did not need to override either. Prior to moving to the generic irqentry code, arm64's management of DAIF was reworked in commit:

  97d935faacde ("arm64: Unmask Debug + SError in do_notify_resume()")

Since that commit, arm64 only masks interrupts during the 'prepare' step when returning to user mode, and masks other DAIF exceptions later. Within arm64_exit_to_user_mode(), the arm64 entry code is as follows:

	local_irq_disable();
	exit_to_user_mode_prepare_legacy(regs);
	local_daif_mask();
	mte_check_tfsr_exit();
	exit_to_user_mode();

Remove the unnecessary local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user() functions.

Signed-off-by: Mark Rutland
Signed-off-by: Thomas Gleixner
Reviewed-by: Jinjie Ruan
Acked-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20260407131650.3813777-3-mark.rutland@arm.com

entry: Prepare for deferred hrtimer rearming

2026-02-27T15:40:13+00:00

The hrtimer interrupt expires timers and at the end of the interrupt it rearms the clockevent device for the next expiring timer.

That's obviously correct, but in the case that an expired timer sets NEED_RESCHED the return from interrupt ends up in schedule(). If HRTICK is enabled then schedule() will modify the hrtick timer, which causes another reprogramming of the hardware.

That can be avoided by deferring the rearming to the return from interrupt path and if the return results in an immediate schedule() invocation then it can be deferred until the end of schedule(), which avoids multiple rearms and re-evaluation of the timer wheel.

As this is only relevant for interrupt to user return, split the work masks up and hand them in as arguments from the relevant exit to user functions, which allows the compiler to optimize the deferred handling out for the syscall exit to user case.

Add the rearm checks to the appropriate places in the exit to user loop and the interrupt return to kernel path, so that the rearming is always guaranteed.

In the return to user space path this is handled in the same way as TIF_RSEQ to avoid extra instructions in the fast path, which are truly hurtful for device interrupt heavy workloads as the extra instructions and conditionals, while benign at first sight, accumulate quickly into measurable regressions. The return from syscall path is completely unaffected due to the above mentioned split, so syscall heavy workloads won't have any extra burden.

For now this is just placing empty stubs at the right places which are all optimized out by the compiler until the actual functionality is in place.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20260224163431.066469985@kernel.org
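
To make the motivation concrete, here is a small userspace model of the reprogramming count with and without deferral. It is only a sketch of the intent described in this commit message: the commit itself adds empty stubs, and `struct cpu_sketch`, its `rearm_pending` flag, and all function names below are hypothetical, invented for this illustration.

```c
#include <stdbool.h>

/* Counts how often the simulated clockevent hardware gets programmed. */
static int hw_programs;

static void program_clockevent(void)
{
	hw_programs++;
}

/* Hypothetical per-CPU state for the model. */
struct cpu_sketch {
	bool rearm_pending;   /* a rearm was deferred by the interrupt */
	bool need_resched;    /* an expired timer requested a reschedule */
	bool hrtick_enabled;  /* schedule() will touch the hrtick timer */
};

/* End of the hrtimer interrupt: rearm immediately, or only note it. */
static void hrtimer_interrupt_end(struct cpu_sketch *c, bool defer)
{
	if (defer)
		c->rearm_pending = true;
	else
		program_clockevent();
}

static void rearm_if_pending(struct cpu_sketch *c)
{
	if (c->rearm_pending) {
		c->rearm_pending = false;
		program_clockevent();
	}
}

static void schedule_sketch(struct cpu_sketch *c)
{
	c->need_resched = false;
	/*
	 * Modifying the hrtick timer reprograms the hardware right away,
	 * unless a deferred rearm is pending and will cover it anyway.
	 */
	if (c->hrtick_enabled && !c->rearm_pending)
		program_clockevent();
	/* End of schedule(): perform the deferred rearm exactly once. */
	rearm_if_pending(c);
}

static void return_from_interrupt(struct cpu_sketch *c)
{
	if (c->need_resched)
		schedule_sketch(c);
	else
		rearm_if_pending(c);
}

/* Runs one interrupt plus its return path; returns the program count. */
int count_programs(bool defer, bool need_resched)
{
	struct cpu_sketch c = {
		.rearm_pending = false,
		.need_resched = need_resched,
		.hrtick_enabled = true,
	};

	hw_programs = 0;
	hrtimer_interrupt_end(&c, defer);
	return_from_interrupt(&c);
	return hw_programs;
}
```

In this model, an interrupt that triggers a reschedule programs the hardware twice without deferral and once with it, while the case without a reschedule still programs it once either way, so the rearm is never lost.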

Merge branch 'core/entry' into sched/core

2026-01-30T14:40:05+00:00

Pull the entry update to avoid merge conflicts with the time slice extension changes.

Signed-off-by: Thomas Gleixner

entry: Inline syscall_exit_work() and syscall_trace_enter()

2026-01-30T14:38:10+00:00

After switching ARM64 to the generic entry code, syscall_exit_work() appeared as a profiling hotspot because it is not inlined.

Inlining both syscall_trace_enter() and syscall_exit_work() provides a performance gain when any of the work items is enabled. With audit enabled this results in a ~4% performance gain for perf bench basic syscall on a kunpeng920 system:

    | Metric     | Baseline    | Inlined     | Change |
    | ---------- | ----------- | ----------- | ------ |
    | Total time | 2.353 [sec] | 2.264 [sec] | ↓3.8%  |
    | usecs/op   | 0.235374    | 0.226472    | ↓3.8%  |
    | ops/sec    | 4,248,588   | 4,415,554   | ↑3.9%  |

Small gains can be observed on x86 as well, though the generated code optimizes for the work case, which is counterproductive for high performance scenarios where such entry/exit work is usually avoided.

Avoid this by marking the work check in syscall_enter_from_user_mode_work() unlikely, which is what the corresponding check in the exit path does already.

[ tglx: Massage changelog and add the unlikely() ]

Signed-off-by: Jinjie Ruan
Signed-off-by: Thomas Gleixner
Link: https://patch.msgid.link/20260128031934.3906955-14-ruanjinjie@huawei.com
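
The combination of an inlined work function and an unlikely() check can be sketched in userspace as follows. This is a simplified illustration rather than the kernel code: all `sketch_` names and the `slow_path_calls` counter are hypothetical, and `sketch_unlikely()` is built on the GCC/Clang `__builtin_expect()` builtin as an assumed stand-in for the kernel's unlikely().

```c
/* Branch hint built on the GCC/Clang builtin; assumed stand-in for unlikely(). */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Counter that exists only so the sketch can be observed. */
static int slow_path_calls;

/* Work function, force-inlined into its caller instead of called out of line. */
static inline __attribute__((always_inline))
long sketch_trace_enter(long syscall_nr, unsigned long work)
{
	(void)work;
	slow_path_calls++;
	return syscall_nr;
}

/*
 * The common case has no work bits set. Marking the check unlikely asks
 * the compiler to lay out the no-work path as the straight-line path,
 * even though the work code is now inlined right here.
 */
long sketch_enter_from_user_mode_work(long syscall_nr, unsigned long work)
{
	if (sketch_unlikely(work))
		return sketch_trace_enter(syscall_nr, work);
	return syscall_nr;
}
```

The hint does not change behaviour, only code layout: calls with `work == 0` skip the inlined body entirely, and calls with any bit set run it as before.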

entry: Add arch_ptrace_report_syscall_entry/exit()

2026-01-30T14:38:09+00:00

ARM64 requires an architecture-specific ptrace wrapper as it needs to save and restore scratch registers.

Provide arch_ptrace_report_syscall_entry/exit() wrappers which fall back to ptrace_report_syscall_entry/exit() if the architecture does not provide them.

No functional change intended.

[ tglx: Massaged changelog and comments ]

Suggested-by: Mark Rutland
Suggested-by: Thomas Gleixner
Signed-off-by: Jinjie Ruan
Signed-off-by: Thomas Gleixner
Reviewed-by: Kevin Brodsky
Link: https://patch.msgid.link/20260128031934.3906955-11-ruanjinjie@huawei.com
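
One common way to express such a fallback is a preprocessor guard keyed on the wrapper's own name. The sketch below shows that pattern in userspace; whether the kernel patch uses exactly this mechanism is an assumption, and the `struct pt_regs` layout, the `sketch_` names, the clobbering behaviour, and the counter are all hypothetical.

```c
/* Hypothetical register file with a single scratch register. */
struct pt_regs { unsigned long scratch; };

/* Counter that exists only so the sketch can be observed. */
static int generic_reports;

/* Stand-in for the generic report function; returns 0 to continue. */
static inline int sketch_ptrace_report_syscall_entry(struct pt_regs *regs)
{
	regs->scratch = 0xdead;   /* assumption: reporting clobbers scratch */
	generic_reports++;
	return 0;
}

/*
 * Architecture side: defining the macro to its own name announces that
 * an override exists. This one saves and restores the scratch register
 * around the generic report.
 */
#define sketch_arch_ptrace_report_syscall_entry \
	sketch_arch_ptrace_report_syscall_entry
static inline int sketch_arch_ptrace_report_syscall_entry(struct pt_regs *regs)
{
	unsigned long saved = regs->scratch;
	int ret = sketch_ptrace_report_syscall_entry(regs);

	regs->scratch = saved;
	return ret;
}

/* Generic side: only compiled when no architecture override was defined. */
#ifndef sketch_arch_ptrace_report_syscall_entry
static inline int sketch_arch_ptrace_report_syscall_entry(struct pt_regs *regs)
{
	return sketch_ptrace_report_syscall_entry(regs);
}
#endif
```

Callers in the generic code always use the arch-prefixed name, so removing the override block switches them to the plain fallback without any other change.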

entry: Remove unused syscall argument from syscall_trace_enter()

2026-01-30T14:38:09+00:00

The 'syscall' argument of syscall_trace_enter() is immediately overwritten before any real use and serves only as a local variable, so drop the parameter.

No functional change intended.

Signed-off-by: Jinjie Ruan
Signed-off-by: Thomas Gleixner
Link: https://patch.msgid.link/20260128031934.3906955-2-ruanjinjie@huawei.com

entry: Hook up rseq time slice extension

2026-01-22T10:11:19+00:00

Wire the grant decision function up in exit_to_user_mode_loop().

Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20251215155709.258157362@linutronix.de

rseq: Implement syscall entry work for time slice extensions

2026-01-22T10:11:18+00:00

The kernel sets SYSCALL_WORK_RSEQ_SLICE when it grants a time slice extension. This allows handling the rseq_slice_yield() syscall, which is used by user space to relinquish the CPU after finishing the critical section for which it requested an extension.

In case the kernel state is still GRANTED, the kernel resets both kernel and user space state with a set of sanity checks. If the kernel state is already cleared, then this raced against the timer or some other interrupt and just clears the work bit.

Doing it in syscall entry work allows catching misbehaving user space, which issues an arbitrary syscall, i.e. not rseq_slice_yield(), from the critical section. Contrary to the initial strict requirement to use rseq_slice_yield(), arbitrary syscalls are not considered a violation of the ABI contract anymore, to allow onion architecture applications, which cannot control the code inside a critical section, to utilize this as well.

If the code detects inconsistent user space, that results in a SIGSEGV for the application.

If the grant was still active and the task was not preempted yet, the work code reschedules immediately before continuing through the syscall.

Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20251215155709.005777059@linutronix.de
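
The decision flow described in this commit message can be modelled as a small state function. The sketch below is a userspace illustration only: `struct slice_task`, its fields, the `slice_action` values, and in particular the concrete sanity check (treating a missing user space grant flag as inconsistent) are assumptions made for the example, not the kernel's actual checks.

```c
#include <stdbool.h>

/* What the syscall entry work decides to do, in this model. */
enum slice_action {
	SLICE_NONE,      /* nothing left to do, continue with the syscall */
	SLICE_RESCHED,   /* reschedule before continuing with the syscall */
	SLICE_SIGSEGV,   /* inconsistent user space state */
};

/* Hypothetical per-task state for the model. */
struct slice_task {
	bool work_bit;        /* models SYSCALL_WORK_RSEQ_SLICE */
	bool kernel_granted;  /* kernel side state is still GRANTED */
	bool user_granted;    /* assumed mirror of the user space grant flag */
};

enum slice_action slice_syscall_entry_work(struct slice_task *t)
{
	/* The work bit is consumed on every path. */
	t->work_bit = false;

	/* Grant already revoked: raced with the timer or another interrupt. */
	if (!t->kernel_granted)
		return SLICE_NONE;

	/* Grant still active: reset the kernel side state. */
	t->kernel_granted = false;

	/* Assumed sanity check: user space must agree that a grant exists. */
	if (!t->user_granted)
		return SLICE_SIGSEGV;

	/* Reset the user side too, then give up the CPU immediately. */
	t->user_granted = false;
	return SLICE_RESCHED;
}
```

Note that the function does not look at which syscall was issued, which matches the relaxed contract in the commit message: any syscall from the critical section ends the extension, not only rseq_slice_yield().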