summaryrefslogtreecommitdiff
path: root/arch/s390/include
AgeCommit message (Collapse)Author
45 hoursMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "arm64: This is a bit of an odd merge window on the KVM/arm64 front. There is absolutely no new feature in the pull request. It is purely fixes, because it is simply becoming too hard to review new stuff when so many AI-fuelled fixes hit the list. - Significant cleanup of the vgic-v5 PPI support which was merged in 7.1. This makes the code more maintainable, and squashes a couple of bugs in the meantime - Set of fixes for the handling of the MMU in an NV context, particularly VNCR-triggered faults. S1POE support is fixed as well - Large set of pKVM fixes, mostly addressing recurring issues around hypervisor tracking of donated pages in obscure cases where the donation could fail and leave things in a bizarre state - Fixes for the so-called "lazy vgic init", which resulted in sleeping operations in non-preemptible sections. This turned out to be far more invasive than initially expected.. - Reduce the overhead of L1/L2 context switch by not touching the FP registers - Fix the way non-implemented page sizes are dealt with when a guest insist on using them for S2 translation - The usual set of low-impact fixes and cleanups all over the map Loongarch: - On a request for lazy FPU load, load all FPU state that the VM supports instead of enabling only the part (FPU, LSX or LASX) that caused the FPU load request - Some enhancements about interrupt injection - Some bug fixes and other small changes RISC-V: - Batch G-stage TLB flushes for GPA range based page table updates - Convert HGEI line management to fully per-HART - Fix missing CSR dirty marking when FWFT state updated via ONE_REG - Fix stale FWFT feature exposure to Guest/VM - Speed up dirty logging write faults using MMU rwlock and atomic PTE updates using cmpxchg() for permission-only changes - Use flexible array for APLIC IRQ state - Use kvm_slot_dirty_track_enabled() for logging enable check on a memslot - Avoid skipping valid pages in kvm_riscv_gstage_wp_range() - Avoid skipping valid pages in kvm_riscv_gstage_unmap_range() - Use endian-specific __lelong for NACL shared memory S390: - KVM_PRE_FAULT_MEMORY support - Support for 2G hugepages - Support for the ASTFLEIE 2 facility - Support for fast inject using kvm_arch_set_irq_inatomic - Fix potential leak of uninitialized bytes - A few more misc gmap fixes x86: - Generic support for the more granular permissions allowed by EPT, namely "read" (which was previously usurping the U bit) and separate execution bits for kernel and userspace - Do not assume that all page tables start with U=1/W=1/NX=0 at the root, as AMD GMET needs to have U=0 at the root - Introduce common assembly macros for use within Intel and AMD vendor-specific vmentry code. This touches the SPEC_CTRL handling, which is now entirely done in assembly for Intel (by reusing the AMD code that already existed), and register save/restore which uses some macro magic to compute the offsets in the struct. Both of these are preparatory changes for upcoming APX support - Clean up KVM's register tracking and storage, primarily to prepare for APX support, which expands the maximum number of GPRs from 16 to 32 - Keep a single copy of the PDPTRs rather than two, since architecturally there is just one - Handle EXIT_FASTPATH_EXIT_USERSPACE in vendor code to ensure vendor code gets a chance to handle things like reaping the PML buffer - Update KVM's view of PV async enabling if and only if the MSR write fully succeeds - Fix a variety of issues where the emulator doesn't honor guest-debug state, and clean up related code along the way - Synthesize EPT Violation and #NPF "error code" bits when injecting faults into L1 that didn't originate in hardware (in which case the VMCS/VMCB doesn't hold relevant information) - Add support for virtualizing (well, emulating) AMD's flavor of CPL>0 CPUID faulting - Clean up the GPR APIs so that KVM's use of "raw" is consistent, and fix a variety of minor bugs along the way - Fix an OOB memory access due to not checking the VP ID when handling a Hyper-V PV TLB flush for L2 - Fix a bug in the mediated PMU's handling of fixed counters that allowed the guest to bypass the PMU event filter - Allow userspace to return EAGAIN when handling SNP and TDX hypercalls, so the KVM can forward a "retry" status code to the guest, and reserve all unused error codes for future usage - Overhaul the TDP MMU => S-EPT code to move as much S-EPT specific logic as possible into the TDX code, and to funnel (almost) all S-EPT updates into a single chokepoint. The motivation is largely to prepare for upcoming Dynamic PAMT support, but the cleanups are nice to have on their own - Plug a hole in shadow page table handling, where KVM fails to recursively zap nested EPT/NPT shadow page tables when the nested hypervisor tears down its own EPT/NPT page tables from the bottom up x86 (Intel): - Support for nested MBEC (Mode-Based Execute Control), see above in the generic section; also run with MBEC enabled even for non-nested mode - Use the kernel's "enum pg_level" in the TDX APIs instead of the TDX-Module's level definitions (which are 0-based) - Rework the TDX memory APIs to not require/assume that guest memory is backed by "struct page" (in prepartion for guest_memfd hugepage support) - Fix a largely benign bug where KVM TDX would incorrectly state it could emulate several x2APIC MSRs - Use the "safe" WRMSR API when proxying LBR MSR writes as the to-be-written value is guest controlled and completely unvalidated x86 (AMD): - Support for nested GMET (Guest Mode Execution Trap), see above in the generic section; also run with GMET enabled even for non-nested mode - Fixes and minor cleanups to GHCB handling, on top of the earlier work already merged into 7.1-rc - Ensure KVM's copy of CR0 and CR3 are up-to-date prior to invoking fastpath handlers - Add support for virtualizing gPAT (KVM previously just used L1's PAT when running L2) - Fix goofs where KVM mishandles side effects (e.g. single-step and PMC updates) when emulating VMRUN - Fix a variety of bugs in AVIC's handling of x2APIC MSR interception, most notably where KVM didn't disable interception of IRR, ISR, and TMR regs - Add support for virtualizing Host-Only/Guest-Only bits in the mediated PMU - Don't advertise support for unusable VM types, and account for VM types that are disabled by firmware, e.g. to mitigate security vulnerabilities - Rewrite the SEV {en,de}crypt debug ioctls as they were riddle with bugs and unnecessarily complicated, and add comprehensive tests - Clean up and deduplicate the SEV page pinning code - Fix minor goofs related to writing back CPUID information after firmware rejects a CPUID page for an SNP vCPU Generic: - Rename invalidate_begin() to invalidate_start() throughout KVM to follow the kernel's nomenclature, e.g. for mmu_notifiers - Use guard() to cleanup up various KVM+VFIO flows - Minor cleanups guest_memfd: - Return -EEXIST instead of -EINVAL if userspace attempts to bind a gmem range to multiple memslots, and fix the test that was supposed to ensure KVM returns -EEXIST - Treat memslot binding offsets and sizes as unsigned values to fix a bug where KVM interprets a large "offset + size" as a negative value and allows a nonsensical offset - Use the inode number instead of the page offset for the NUMA interleaving index to fix a bug where the effective index would jump by two for consecutive pages (the caller also adds in the page offset) Selftests: - Randomize the dirty log test's delay when reaping the bitmap on the first pass, as always waiting only 1ms hid a KVM RISC-V bug as the test reaped the bitmap before KVM could build up enough state to hit the bug - A pile of one-off fixes and cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (326 commits) KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level KVM: x86: Fix shadow paging use-after-free due to unexpected role KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject KVM: s390: Enable adapter_indicators_set to use mapped pages KVM: s390: Add map/unmap ioctl and clean mappings post-guest riscv: kvm: Use endian-specific __lelong for NACL shared memory KVM: selftests: access_tracking_perf_test: bump number of NUMA nodes to 32 KVM: s390: vsie: Implement ASTFLEIE facility 2 KVM: s390: vsie: Refactor handle_stfle s390/sclp: Detect ASTFLEIE 2 facility KVM: s390: Minor refactor of base/ext facility lists KVM: x86/mmu: move pdptrs out of the MMU KVM: x86: check that kvm_handle_invpcid is only invoked with shadow paging KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions KVM: nVMX: remove unnecessary code in prepare_vmcs02_rare KVM: x86: remove nested_mmu from mmu_is_nested() KVM: arm64: vgic-its: Make ABI commit helpers return void KVM: s390: Initialize KVM_S390_GET_CMMA_BITS memory LoongArch: KVM: Add missing slots_lock for device register/unregister LoongArch: KVM: Validate irqchip index in irqfd routing ...
6 daysMerge tag 's390-7.2-1' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Use CIO device online variable instead of the internal FSM state to determine device availability during purge operations - Remove extra check of task_stack_page() because try_get_task_stack() already takes care of that when reading /proc/<pid>/wchan - Allow user-space to use the new SCLP action qualifier 4 for to provide NVMe SMART log data to the platform. - Send AP CHANGE uevents on successful bind and successful association to notify user-space about SE operations on AP queue devices - Add an s390dbf kernel parameter to configure debug log levels and area sizes during early boot - On arm64 the empty zero page is going to be mapped read-only. Do the same for s390 with an explicit set_memory_ro() call - Improve s390-specific bcr_serialize() and cpu_relax() implementations - Remove all unused variables to avoid allmodconfig W=1 build fails with latest clang-23 - Cleanup default Kconfig values for s390 selftests - Add a s390-tod trace clock to allow comparing trace timestamps between different systems or virtual machines on s390 - Remove the s390 implementation of strlcat() in favor of the generic variant - Make consistent the calling order between page_table_check_pte_clear() and secure page conversion across all code paths - Rearrange some fields within AP and zcrypt structs to reduce memory consumption and unused holes - Shorten GR_NUM and VX_NUM macros and move them to a separate header - Replace __get_free_page() with kmalloc() in few sources - Introduce an infrastructure for more efficient this_cpu operations. Eliminate conditional branches when PREEMPT_NONE is removed - Enable Rust support - Use z10 as minimum architecture level, similar to the boot code, to enforce a defined architecture level set - Improve and convert various mem*() helper functions to C. For that add .noinstr.text section to avoid orphaned warnings from the linker - Fix the function pointer type in __ret_from_fork() to correct the indirect call to match kernel thread return type of int - Revert support for DCACHE_WORD_ACCESS to avoid an endless exception loop on read from donated Ultravisor pages at unaligned addresses * tag 's390-7.2-1' of gitolite.kernel.org:pub/scm/linux/kernel/git/s390/linux: (52 commits) s390: Revert support for DCACHE_WORD_ACCESS s390/process: Fix kernel thread function pointer type s390/tishift: Convert __ashlti3(), __ashrti3(), __lshrti3() to C s390/memmove: Optimize backward copy case s390/string: Convert memset(16|32|64)() to C s390/string: Convert memcpy() to C s390/string: Convert memset() to C s390/string: Convert memmove() to C s390/string: Add -ffreestanding compile option to string.o s390: Add .noinstr.text to boot and purgatory linker scripts s390/purgatory: Enforce z10 minimum architecture level s390: Enable Rust support s390/cmpxchg: Fix KASAN stack-out-of-bounds in atomic helpers rust: helpers: Add memchr wrapper for string operations rust/bindgen_parameters: Mark s390 types as opaque to prevent repr conflicts s390/jump_label: Implement ARCH_STATIC_BRANCH_JUMP_ASM and ARCH_STATIC_BRANCH_ASM macros s390/bug: Provide ARCH_WARN_ASM for Rust WARN/BUG support s390/ap: Fix locking issue in SE bind and associate sysfs functions s390/percpu: Provide arch_this_cpu_write() implementation s390/percpu: Provide arch_this_cpu_read() implementation ...
6 daysMerge tag 'kvm-s390-next-7.2-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: New features for 7.2 New features for 7.2 for KVM/s390: * KVM_PRE_FAULT_MEMORY support * Support for 2G hugepages * Support for the ASTFLEIE 2 facility * kvm_arch_set_irq_inatomic Fast Inject * Fix potential leak of uninitialized bytes
6 daysKVM: s390: Introducing kvm_arch_set_irq_inatomic fast injectDouglas Freimuth
s390 needs a fast path for irq injection, and along those lines we introduce kvm_arch_set_irq_inatomic. Instead of placing all interrupts on the global work queue as it does today, this patch provides a fast path for irq injection. The inatomic fast path cannot lose control since it is running with interrupts disabled. This meant making the following changes that exist on the slow path today. First, the adapter_indicators page needs to be mapped since it is accessed with interrupts disabled, so we added map/unmap functions. Second, access to shared resources between the fast and slow paths needed to be changed from mutex and semaphores to spin_lock's. Finally, the memory allocation on the slow path utilizes GFP_KERNEL_ACCOUNT but we had to implement the fast path with GFP_ATOMIC allocation. Each of these enhancements were required to prevent blocking on the fast inject path. Fencing of Fast Inject in Secure Execution environments is enabled in the patch series by not mapping adapter indicator pages. In Secure Execution environments the path of execution available before this patch is followed. Statistical counters have been added to enable analysis of irq injection on the fast path and slow path including io_390_inatomic, io_flic_inject_airq, io_set_adapter_int and io_390_inatomic_no_inject. The no inject counter captures adapter masked, coalesced and suppressed interrupts. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260604192755.203143-4-freimuth@linux.ibm.com>
6 daysKVM: s390: Add map/unmap ioctl and clean mappings post-guestDouglas Freimuth
s390 needs map/unmap ioctls, which map the adapter set indicator pages, so the pages can be accessed when interrupts are disabled. The mappings are cleaned up when the guest is removed. pin_user_pages_remote is used for both the ioctl as well as the pin-on-demand logic in adapter_indicators_set(). Map/Unmap ioctls are fenced in order to avoid the longterm pinning in Secure Execution environments. In Secure Execution environments the path of execution available before this patch is followed. Statistical counters to count map/unmap functions for adapter indicator pages are added. The counters can be used to analyze map/unmap functions in non-Secure Execution environments and similarly can be used to analyze Secure Execution environments where the counters will not be incremented as the adapter indicator pages are not mapped. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260604192755.203143-2-freimuth@linux.ibm.com>
6 daysMerge tag 'timers-nohz-2026-06-13' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip Pull NOHZ updates from Thomas Gleixner: - Fix a long standing TOCTOU in get_cpu_sleep_time_us() - Make the CPU offline NOHZ handling more robust by disabling NOHZ on the outgoing CPU early instead of creating unneeded state which needs to be undone. - Unify idle CPU time accounting instead of having two different accounting mechanisms. These two different mechanisms are not really independent, but the different properties can in the worst case cause that gloabl idle time can be observed going backwards. - Consolidate the idle/iowait time retrieval interfaces instead of converting back and forth between them. - Make idle interrupt time accounting more robust. The original code assumes that interrupt time accouting is enabled and therefore stops elapsing idle time while an interrupt is handled in NOHZ dyntick state. That assumption is not correct as interrupt time accounting can be disabled at compile and runtime. - Fix an accounting error between dyntick idle time and dyntick idle steal time. The stolen time is not accounted and therefore idle time becomes inaccurate. The stolen time is now accounted after the fact as there is no way to predict the steal time upfront. * tag 'timers-nohz-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: sched/cputime: Handle dyntick-idle steal time correctly sched/cputime: Handle idle irqtime gracefully sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case tick/sched: Consolidate idle time fetching APIs tick/sched: Account tickless idle cputime only when tick is stopped tick/sched: Remove unused fields tick/sched: Move dyntick-idle cputime accounting to cputime code tick/sched: Remove nohz disabled special case in cputime fetch tick/sched: Unify idle cputime accounting s390/time: Prepare to stop elapsing in dynticks-idle powerpc/time: Prepare to stop elapsing in dynticks-idle sched/cputime: Correctly support generic vtime idle time sched/cputime: Remove superfluous and error prone kcpustat_field() parameter sched/idle: Handle offlining first in idle loop tick/sched: Fix TOCTOU in nohz idle time fetch
9 daysKVM: s390: vsie: Implement ASTFLEIE facility 2Nina Schoetterl-Glausch
Implement shadowing of format-2 facility list when running in VSIE. ASTFLEIE2 is available since IBM z16. To function G1 has to run this KVM code and G1 and G2 have to run QEMU with ASTFLEIE2 support. Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Co-developed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> [imbrenda@linux.ibm.com: Fix typo in comment] Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260612-vsie-alter-stfle-fac-v4-4-74f0e1559929@linux.ibm.com>
9 daysKVM: s390: vsie: Refactor handle_stfleNina Schoetterl-Glausch
Use switch case in anticipation of handling format-1 and format-2 facility list designations in the future. As the alternate STFLE facilities are not enabled, only case 0 is possible. No functional change intended. Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Co-developed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260612-vsie-alter-stfle-fac-v4-3-74f0e1559929@linux.ibm.com>
9 dayss390/sclp: Detect ASTFLEIE 2 facilityNina Schoetterl-Glausch
Detect alternate STFLE interpretive execution facility 2. Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260612-vsie-alter-stfle-fac-v4-2-74f0e1559929@linux.ibm.com>
9 dayss390: Revert support for DCACHE_WORD_ACCESSHeiko Carstens
load_unaligned_zeropad() reads eight bytes from unaligned addresses and may cross page boundaries. It handles exceptions which may happen if reading from the second page results in an exception. For pages which are donated to the Ultravisor for secure execution purposes the do_secure_storage_access() exception handler however does not handle such exceptions correctly. Such an exception may result in an endless exception loop which will never be resolved. An attempt to fix this [1] turned out to be not sufficient. For now revert load_unaligned_zeropad() until this problem has been resolved in a proper way. Note that the implementation of load_unaligned_zeropad() itself is correct. The revert is just a temporary workaround until there is complete fix for secure storage access exceptions. [1] commit b00be77302d7 ("s390/mm: Add missing secure storage access fixups for donated memory") Fixes: 802ba53eefc5 ("s390: add support for DCACHE_WORD_ACCESS") Cc: stable@vger.kernel.org Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
11 dayss390/tishift: Convert __ashlti3(), __ashrti3(), __lshrti3() to CHeiko Carstens
There is no reason to have __ashlti3(), __ashrti3(), and __lshrti3() implemented in assembler. Convert them all to C, which allows the compiler to optimize the code if newer instructions allow that. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
11 daysMerge branch 'rust-for-s390' into featuresAlexander Gordeev
Jan Polensky says: =================== Rust support on s390 requires a small set of architecture-specific pieces before the generic Rust kernel infrastructure can be used. The series wires up s390 as a Rust-capable 64-bit architecture, adds the missing assembly interfaces needed by Rust for WARN/BUG reporting and for static branches, adjusts bindgen parameters to avoid repr layout conflicts caused by packed and aligned s390 structures, and fixes issues discovered during testing. s390 currently requires rustc with support for -Zpacked-stack, and the minimum tool version gating is adjusted accordingly. Link: https://github.com/Rust-for-Linux/linux/issues/2 Tested against: rustc 1.96.0 (ac68faa20 2026-05-25) =================== Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
11 dayss390/cmpxchg: Fix KASAN stack-out-of-bounds in atomic helpersJan Polensky
The __arch_cmpxchg1, __arch_cmpxchg2, __arch_xchg1, and __arch_xchg2 functions emulate 1-byte and 2-byte atomic operations using 4-byte cmpxchg instructions, since s390 lacks native 1/2-byte cmpxchg support. When KASAN is enabled, the READ_ONCE() operations in these functions trigger stack-out-of-bounds warnings because they perform 4-byte reads when only 1 or 2 bytes should be accessed. Mark these functions as __no_sanitize_or_inline to prevent KASAN instrumentation while maintaining correct functionality. This resolves the following KASAN error during rust_atomics KUnit tests: BUG: KASAN: stack-out-of-bounds in rust_helper_atomic_i8_xchg+0xb2/0xc0 Read of size 4 at addr 001bff7ffdbefcf0 by task kunit_try_catch/142 Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Link: https://lore.kernel.org/rust-for-linux/CANiq72m4GVWFYqnxNtCHTPu7XcGewHB5LNwOoayTfnXs9pPbNg@mail.gmail.com/ Suggested-by: Gary Guo <gary@garyguo.net> Link: https://lore.kernel.org/rust-for-linux/DITFTAVVHTNQ.380OHUHGTOI6M@garyguo.net/ Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Gary Guo <gary@garyguo.net> Signed-off-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
11 dayss390/jump_label: Implement ARCH_STATIC_BRANCH_JUMP_ASM and ↵Jan Polensky
ARCH_STATIC_BRANCH_ASM macros Rust static branch support needs the s390 jump label instruction sequence and __jump_table emission in a reusable form. The current implementation embeds the sequence directly in the C asm goto blocks, which cannot be shared with Rust. Introduce ARCH_STATIC_BRANCH_ASM and ARCH_STATIC_BRANCH_JUMP_ASM to describe the brcl sequences for the likely-false and likely-true cases and to emit the same __jump_table entries as before. Switch the existing C helpers to use the new macros to avoid duplication without changing the generated code. Acked-by: Gary Guo <gary@garyguo.net> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
11 dayss390/bug: Provide ARCH_WARN_ASM for Rust WARN/BUG supportJan Polensky
Rust WARN and BUG support relies on ARCH_WARN_ASM to emit __bug_table entries. On s390 the macro is missing, so Rust code cannot generate proper WARN/BUG metadata for the kernel's bug reporting infrastructure. Define ARCH_WARN_ASM to produce the same assembly sequence and __bug_table entry format as the existing s390 BUG handling, including the monitor call. Define ARCH_WARN_REACHABLE as empty since s390 does not provide reachability analysis for warning paths. Acked-by: Gary Guo <gary@garyguo.net> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "arm64: - Correctly drop the ITS translation cache reference when it actually gets invalidated - Take the SRCU lock for SW page table walks - Restore POR_EL0 access to host EL0, avoiding POR_EL0 becoming inaccessible from EL0 after running a guest - Reassign nested_mmus array behind mmu_lock, ensuring that vcpu init and MMU notifiers are mutually exclusive - Correctly handle FEAT_XNX at stage-2 s390: - More fixes for the new page table management and nested virtualization x86: - More fixes for GHCB issues: - Read start/end indices of page size change requests exactly once per vmexit - Unmap and unpin the GHCB as needed on vCPU free" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits) KVM: arm64: Correctly identify executable PTEs at stage-2 KVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNX KVM: arm64: Reassign nested_mmus array behind mmu_lock KVM: arm64: Restore POR_EL0 access to host EL0 KVM: arm64: Take the SRCU lock for page table walks in fault injection and AT emulation KVM: arm64: vgic-its: Drop the translation cache reference only for the erased entry KVM: SEV: Unmap and unpin the GHCB as needed on vCPU free KVM: SEV: Decouple the need to sync the GHCB SA from the need to free the SA KVM: SEV: Move sev_free_vcpu() down below sev_es_unmap_ghcb() KVM: Don't WARN if memory is dirtied without a vCPU when the VM is dying KVM: SEV: Read start/end indices of PSC requests exactly once per #VMGEXIT KVM: SEV: Add an anonymous "psc" struct to track current PSC metadata KVM: SEV: Make it more obvious when KVM is writing back the current PSC index KVM: s390: Remove ptep_zap_softleaf_entry() KVM: s390: Fix possible reference leak in fault-in code KVM: s390: Prevent memslots outside the ASCE range KVM: s390: Lock pte when making page secure KVM: s390: Fix fault-in code KVM: s390: vsie: Fix rmap handling in _do_shadow_crste() KVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl() ...
2026-06-04Merge tag 's390-7.1-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - Enable IOMMUFD and VFIO cdev such that PCI pass-through to QEMU/KVM can optionally utilize native IOMMUFD - With HAVE_ARCH_BUG_FORMAT enabled the BUG infrastructure might misinterpret flags or fault. Fix this by moving the "format" field emission into __BUG_ENTRY() - The generic version of _THIS_IP_ is known to be brittle and may break with current and future GCC and Clang optimizations. Fix it by overriding _THIS_IP_ * tag 's390-7.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: Implement _THIS_IP_ using inline asm s390/bug: Always emit format word in __BUG_ENTRY s390/configs: Enable IOMMUFD and VFIO cdev in defconfigs
2026-06-03s390/percpu: Provide arch_this_cpu_write() implementationHeiko Carstens
Provide an s390 specific implementation of arch_this_cpu_write() instead of the generic variant. The generic variant uses a quite expensive raw_local_irq_save() / raw_local_irq_restore() pair. Get rid of this by providing an own variant which makes use of the new percpu code section infrastructure. With this the text size of the kernel image is reduced by ~1k (defconfig). Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-03s390/percpu: Provide arch_this_cpu_read() implementationHeiko Carstens
Provide an s390 specific implementation of arch_this_cpu_read() instead of the generic variant. The generic variant uses preempt_disable() / preempt_enable() pair and READ_ONCE(). Get rid of the preempt_disable() / preempt_enable() pairs by providing an own variant which makes use of the new percpu code section infrastructure. With this the text size of the kernel image is reduced by ~1k (defconfig). Also 87 generated preempt_schedule_notrace() function calls within the kernel image (modules not counted) are removed. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-03s390/percpu: Use new percpu code section for arch_this_cpu_[and|or]()Heiko Carstens
Convert arch_this_cpu_[and|or]() to make use of the new percpu code section infrastructure. There is no user of this_cpu_and() and only one user of this_cpu_or() within the kernel. Therefore this conversion has hardly any effect, and also removes only preempt_schedule_notrace() function call. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-03s390/percpu: Use new percpu code section for arch_this_cpu_add_return()Heiko Carstens
Convert arch_this_cpu_add_return() to make use of the new percpu code section infrastructure. With this the text size of the kernel image is reduced by ~4k (defconfig). Also 66 generated preempt_schedule_notrace() function calls within the kernel image (modules not counted) are removed. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-03s390/percpu: Use new percpu code section for arch_this_cpu_add()Heiko Carstens
Convert arch_this_cpu_add() to make use of the new percpu code section infrastructure. With this the text size of the kernel image is reduced by ~76kb (defconfig). Also more than 5300 generated preempt_schedule_notrace() function calls within the kernel image (modules not counted) are removed. With: DEFINE_PER_CPU(long, foo); void bar(long a) { this_cpu_add(foo, a); } Old arch_this_cpu_add() looks like this: 00000000000000c0 <bar>: c0: c0 04 00 00 00 00 jgnop c0 <bar> c6: eb 01 03 a8 00 6a asi 936,1 cc: c4 18 00 00 00 00 lgrl %r1,cc <bar+0xc> ce: R_390_GOTENT foo+0x2 d2: e3 10 03 b8 00 08 ag %r1,952 d8: eb 22 10 00 00 e8 laag %r2,%r2,0(%r1) de: eb ff 03 a8 00 6e alsi 936,-1 e4: a7 a4 00 05 jhe ee <bar+0x2e> e8: c0 f4 00 00 00 00 jg e8 <bar+0x28> ea: R_390_PC32DBL __s390_indirect_jump_r14+0x2 ee: c0 f4 00 00 00 00 jg ee <bar+0x2e> f0: R_390_PLT32DBL preempt_schedule_notrace+0x2 New arch_this_cpu_add() looks like this: 00000000000000c0 <bar>: c0: c0 04 00 00 00 00 jgnop c0 <bar> c6: c4 38 00 00 00 00 lgrl %r3,c6 <bar+0x6> c8: R_390_GOTENT foo+0x2 cc: b9 04 00 43 lgr %r4,%r3 d0: eb 00 43 c0 00 52 mviy 960(%r0),4 d6: e3 40 03 b8 00 08 ag %r4,952 dc: eb 52 40 00 00 e8 laag %r5,%r2,0(%r4) e2: eb 00 03 c0 00 52 mviy 960,0 e8: c0 f4 00 00 00 00 jg e8 <bar+0x28> ea: R_390_PC32DBL __s390_indirect_jump_r14+0x2 Note that the conditional function call is removed. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-03s390/percpu: Add missing do { } while (0) constructsHeiko Carstens
Add missing do { } while (0) constructs in order to avoid potential build failures. Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260319120503.4046659-1-hca%40linux.ibm.com Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-03s390/percpu: Infrastructure for more efficient this_cpu operationsHeiko Carstens
With the intended removal of PREEMPT_NONE this_cpu operations based on atomic instructions, guarded with preempt_disable()/preempt_enable() pairs become more expensive: the preempt_disable() / preempt_enable() pairs are not optimized away anymore during compile time. In particular the conditional call to preempt_schedule_notrace() after preempt_enable() adds additional code and register pressure. E.g. this simple C code sequence DEFINE_PER_CPU(long, foo); long bar(long a) { return this_cpu_add_return(foo, a); } generates this code: 11a976: eb af f0 68 00 24 stmg %r10,%r15,104(%r15) 11a97c: b9 04 00 ef lgr %r14,%r15 11a980: b9 04 00 b2 lgr %r11,%r2 11a984: e3 f0 ff c8 ff 71 lay %r15,-56(%r15) 11a98a: e3 e0 f0 98 00 24 stg %r14,152(%r15) 11a990: eb 01 03 a8 00 6a asi 936,1 <- __preempt_count_add(1) 11a996: c0 10 00 d2 ac b5 larl %r1,1b70300 <- address of percpu var 11a9a0: e3 10 23 b8 00 08 ag %r1,952 <- add percpu offset 11a9a6: eb ab 10 00 00 e8 laag %r10,%r11,0(%r1) <- atomic op 11a9ac: eb ff 03 a8 00 6e alsi 936,-1 <- __preempt_count_dec_and_test() 11a9b2: a7 54 00 05 jnhe 11a9bc <bar+0x4c> 11a9b6: c0 e5 00 76 d1 bd brasl %r14,ff4d30 <preempt_schedule_notrace> 11a9bc: b9 e8 b0 2a agrk %r2,%r10,%r11 11a9c0: eb af f0 a0 00 04 lmg %r10,%r15,160(%r15) 11a9c6 07 fe br %r14 Even though the above example is more or less the worst case, since the branch to preempt_schedule_notrace() requires a stackframe, which otherwise wouldn't be necessary, there is also the conditional jnhe branch instruction. Get rid of the conditional branch with the following code sequence: 11a8e6: c0 30 00 d0 c5 0d larl %r3,1b33300 11a8ec: b9 04 00 43 lgr %r4,%r3 11a8f0: eb 00 43 c0 00 52 mviy 960,4 11a8f6: e3 40 03 b8 00 08 ag %r4,952 11a8fc: eb 52 40 00 00 e8 laag %r5,%r2,0(%r4) 11a902: eb 00 03 c0 00 52 mviy 960,0 11a908: b9 08 00 25 agr %r2,%r5 11a90c 07 fe br %r14 The general idea is that this_cpu operations based on atomic instructions are guarded with mviy instructions: - The first mviy instruction writes the register number, which contains the percpu address variable to lowcore. This also indicates that a percpu code section is executed. - The first instruction following the mviy instruction must be the ag instruction which adds the percpu offset to the percpu address register. - Afterwards the atomic percpu operation follows. - Then a second mviy instruction writes a zero to lowcore, which indicates the end of the percpu code section. - In case of an interrupt/exception/nmi the register number which was written to lowcore is copied to the exception frame (pt_regs), and a zero is written to lowcore. - On return to the previous context it is checked if a percpu code section was executed (saved register number not zero), and if the process was migrated to a different cpu. If the percpu offset was already added to the percpu address register (instruction address does _not_ point to the ag instruction) the content of the percpu address register is adjusted so it points to percpu variable of the new cpu. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-03s390/fpu: Move GR_NUM / VX_NUM macros to separate header fileHeiko Carstens
Move GR_NUM / VX_NUM macros to separate insn-common-asm.h header file so they can be reused for non-fpu insn constructs. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-03s390/fpu: Shorten GR_NUM / VX_NUM macrosHeiko Carstens
Use the ".irp" directive to get rid of all the repeated ".ifc" usages in fpu-insn-asm.h. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-06-02s390/time: Prepare to stop elapsing in dynticks-idleFrederic Weisbecker
Currently the tick subsystem stores the idle cputime accounting in private fields, allowing cohabitation with architecture idle vtime accounting. The former is fetched on online CPUs, the latter on offline CPUs. For consolidation purposes, architecture vtime accounting will continue to account the cputime but will make a break when the idle tick is stopped. The dyntick cputime accounting will then be relayed by the tick subsystem so that the idle cputime is still seen advancing coherently even when the tick isn't there to flush the idle vtime. Prepare for that and introduce three new APIs which will be used in subsequent patches: - vtime_dynticks_start() is deemed to be called when idle enters in dyntick mode. The idle cputime that elapsed so far is accumulated and accounted. Also idle time accounting is ignored. - vtime_dynticks_stop() is deemed to be called when idle exits from dyntick mode. The vtime entry clocks are fast-forward to current time so that idle accounting restarts elapsing from now. Also idle time accounting is resumed. - vtime_reset() is deemed to be called from dynticks idle IRQ entry to fast-forward the clock to current time so that the IRQ time is still accounted by vtime while nohz cputime is paused. Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid accounting twice the idle cputime, along with nohz accounting. Co-developed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-7-frederic@kernel.org
2026-06-02KVM: s390: Avoid potentially sleeping while atomic when zapping pagesClaudio Imbrenda
Factor out try_get_locked_pte(), which behaves similarly to get_locked_pte(), but does not attempt to allocate missing tables and performs a spin_trylock() instead of blocking. The new function is also exported, since it will be used in other patches. If intermediate entries are missing, there can be no pte swap entry to free, so it's safe to ignore them. This avoids potentially sleeping while atomic. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-4-imbrenda@linux.ibm.com>
2026-05-30s390/mm: Make PTC and UV call order consistentAlexander Gordeev
In various code paths, page_table_check_pte_clear() is called before converting a secure page, while in others it is called after. Make this consistent and always perform the conversion after the PTC hook has been called. Also make all conversion‑ eligibility condition checks look the same, and rework the one in ptep_get_and_clear_full() slightly. Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-30s390/string: Remove strlcat() implementationHeiko Carstens
strlcat() shouldn't be used anymore (see fortify-string.h), and will be deprecated / removed sooner or later [1]. Therefore remove the s390 implementation of strlcat() in favor of the generic variant. [1] https://lore.kernel.org/all/20260514160719.105084-3-manuelebner@mailbox.org/ Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-30s390: Implement _THIS_IP_ using inline asmMarco Elver
Both GCC [1] and Clang [2] consider the generic version of _THIS_IP_ to be broken: #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) In particular, the address of a label is only expected to be used with a computed goto. While the generic version more or less works today, it is known to be brittle and may break with current and future optimizations. For example, Clang -O2 always returns 1 when this function is inlined: static inline unsigned long get_ip(void) { return ({ __label__ __here; __here: (unsigned long)&&__here; }); } Fix it by overriding _THIS_IP_ in <asm/linkage.h> (which is included by <linux/instruction_pointer.h>) using an architecture-specific inline asm version. Additionally, avoiding taking the address of a label prevents compilers from emitting spurious indirect branch targets (e.g. ENDBR or BTI) under control-flow integrity schemes. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120071 [1] Link: https://github.com/llvm/llvm-project/issues/138272 [2] Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-28s390/bug: Always emit format word in __BUG_ENTRYJan Polensky
When CONFIG_DEBUG_BUGVERBOSE is disabled, the s390 __BUG_ENTRY() macro omits the format string pointer, so the generated __bug_table entry no longer matches struct bug_entry. With HAVE_ARCH_BUG_FORMAT enabled, the generic BUG infrastructure reads bug_entry::format via bug_get_format(). If the format word is missing, subsequent fields are read from the wrong offset, which may: - Misinterpret flags (BUG vs WARN classification errors) - Fault when dereferencing a misread format pointer The root cause is that __BUG_ENTRY() delegates format word emission to __BUG_ENTRY_VERBOSE(), which is conditional on CONFIG_DEBUG_BUGVERBOSE. Fix this by moving the format field emission directly into __BUG_ENTRY() so it is always emitted unconditionally. Remove the format parameter from __BUG_ENTRY_VERBOSE() and keep only file/line emission conditional on CONFIG_DEBUG_BUGVERBOSE. Fixes: 2b71b8ab9718 ("s390/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED") Signed-off-by: Jan Polensky <japo@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-26s390/tracing: Add s390-tod clockSven Schnelle
In order to allow comparing trace timestamps between different systems or virtual machines on s390, add a s390-tod trace clock. This clock just uses the returned TOD clock value from stcke directly. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-21ring-buffer: Flush and stop persistent ring buffer on panicMasami Hiramatsu (Google)
On real hardware, panic and machine reboot may not flush hardware cache to memory. This means the persistent ring buffer, which relies on a coherent state of memory, may not have its events written to the buffer and they may be lost. Moreover, there may be inconsistency with the counters which are used for validation of the integrity of the persistent ring buffer which may cause all data to be discarded. To avoid this issue, stop recording of the ring buffer on panic and flush the cache of the ring buffer's memory. Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-20s390/processor: Implement cpu_relax() with cpu serializationHeiko Carstens
There are many loops in the form of while (READ_ONCE(*somelocation)) cpu_relax(); Strictly speaking the architecture requires serialization instead of only a compiler barrier in the loop so the READ_ONCE() will see an updated value. However real hardware does not require this (see IBM z Systems Processor Optimization Primer - FAQ [1]), but it is still recommended to add serialization. Given that cpu_relax() is doing nothing useful, it does not hurt to add the single and fast instruction which makes sure that serialization happens, and such loops may be left a bit faster. [1] https://community.ibm.com/community/user/viewdocument/microprocessor-optimization-primer Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-20s390/processor: Remove duplicated cpu_relax() defineHeiko Carstens
cpu_relax() is defined identically at two different locations. Just like most other architectures remove the implementation at asm/processor.h and only use the one at asm/vdso/processor.h, avoiding code duplication. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-20s390/barrier: Use alternative instead of ifdef for bcr_serialize()Heiko Carstens
Use an alternative to implement bcr_serialize() and use alternative patching to select between serialization and fast-serialization depending on the corresponding facility bit. Reviewed-by: Jan Polensky <japo@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-11s390/debug: Add s390dbf kernel parameterPeter Oberparleiter
Problem determination using s390dbf logging sometimes requires changing the default logging level or log area size. While this is possible using sysfs interfaces, there is no easy way to adjust these parameters for early boot code that emits logs before userspace is available. Add an s390dbf kernel parameter to address this shortcoming. The parameter can be used to specify log level and area size (in units of pages). A level of '-' turns logging off for an area. Logs can be identified by name or a shell-style pattern. Parameter format: s390dbf=<name|pattern>:[<level>|-]:[<pages>][,...] Example: s390dbf=cio*:6:128,sclp_err::2 Specified parameters are applied immediately during debug area registration for regular log areas. For early, static debug areas, log levels are changed during early_param() parsing, while size changes are applied at arch_initcall-time. Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-05-06s390/sclp: Allow user-space to provide PCI reports for NVMe SMART dataNiklas Schnelle
The new SCLP action qualifier 4 is used by user-space code to provide NVMe SMART log data to the platform. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2026-04-22Merge tag 's390-7.1-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for CONFIG_PAGE_TABLE_CHECK and enable it in debug_defconfig. s390 can only tell user from kernel PTEs via the mm, so mm_struct is now passed into pxx_user_accessible_page() callbacks - Expose the PCI function UID as an arch-specific slot attribute in sysfs so a function can be identified by its user-defined id while still in standby. Introduces a generic ARCH_PCI_SLOT_GROUPS hook in drivers/pci/slot.c - Refresh s390 PCI documentation to reflect current behavior and cover previously undocumented sysfs attributes - zcrypt device driver cleanup series: consistent field types, clearer variable naming, a kernel-doc warning fix, and a comment explaining the intentional synchronize_rcu() in pkey_handler_register() - Provide an s390 arch_raw_cpu_ptr() that avoids the detour via get_lowcore() using alternatives, shrinking defconfig by ~27 kB - Guard identity-base randomization with kaslr_enabled() so nokaslr keeps the identity mapping at 0 even with RANDOMIZE_IDENTITY_BASE=y - Build S390_MODULES_SANITY_TEST as a module only by requiring KUNIT && m, since built-in would not exercise module loading - Remove the permanently commented-out HMCDRV_DEV_CLASS create_class() code in the hmcdrv driver - Drop stale ident_map_size extern conflicting with asm/page.h * tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Fix warning about wrong kernel doc comment PCI: s390: Expose the UID as an arch specific PCI slot attribute docs: s390/pci: Improve and update PCI documentation s390/pkey: Add comment about synchronize_rcu() to pkey base s390/hmcdrv: Remove commented out code s390/zcrypt: Slight rework on the agent_id field s390/zcrypt: Explicitly use a card variable in _zcrypt_send_cprb s390/zcrypt: Rework MKVP fields and handling s390/zcrypt: Make apfs a real unsigned int field s390/zcrypt: Rework domain processing within zcrypt device driver s390/zcrypt: Move inline function rng_type6cprb_msgx from header to code s390/percpu: Provide arch_raw_cpu_ptr() s390: Enable page table check for debug_defconfig s390/pgtable: Add s390 support for page table check s390/pgtable: Use set_pmd_bit() to invalidate PMD entry mm/page_table_check: Pass mm_struct to pxx_user_accessible_page() s390/boot: Respect kaslr_enabled() for identity randomization s390/Kconfig: Make modules sanity test a module-only option s390/setup: Drop stale ident_map_size declaration
2026-04-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups - A small set of patches fixing the Stage-2 MMU freeing in error cases - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls - The usual cleanups and other selftest churn LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel() - Add DMSINTC irqchip in kernel support RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code - Fix LPSW/E to update the bear (which of course is the breaking event address register) x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier") - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not *all* of VMX and SVM enabling) as it is needed for trusted I/O - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM - Exempt SMM from CPUID faulting on Intel, as per the spec - Misc hardening and cleanup changes x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl - Convert a pile of kvm->lock SEV code to guard() - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2 - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore - Add a variety of missing nSVM consistency checks - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions - Add support for save+restore of virtualized LBRs (on SVM) - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses) - Cache all used vmcb12 fields to further harden against TOCTOU bugs x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate - Code cleanups guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests x86 selftests: - Add support for Hygon CPUs in KVM selftests - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits) KVM: x86: use inlines instead of macros for is_sev_*guest x86/virt: Treat SVM as unsupported when running as an SEV+ guest KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper KVM: SEV: use mutex guard in snp_handle_guest_req() KVM: SEV: use mutex guard in sev_mem_enc_unregister_region() KVM: SEV: use mutex guard in sev_mem_enc_ioctl() KVM: SEV: use mutex guard in snp_launch_update() KVM: SEV: Assert that kvm->lock is held when querying SEV+ support KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe" KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y KVM: SEV: WARN on unhandled VM type when initializing VM KVM: LoongArch: selftests: Add PMU overflow interrupt test KVM: LoongArch: selftests: Add basic PMU event counting test KVM: LoongArch: selftests: Add cpucfg read/write helpers LoongArch: KVM: Add DMSINTC inject msi to vCPU LoongArch: KVM: Add DMSINTC device support LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch ...
2026-04-16Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "pid: make sub-init creation retryable" (Oleg Nesterov) Make creation of init in a new namespace more robust by clearing away some historical cruft which is no longer needed. Also some documentation fixups - "selftests/fchmodat2: Error handling and general" (Mark Brown) Fix and a cleanup for the fchmodat2() syscall selftest - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko) - "hung_task: Provide runtime reset interface for hung task detector" (Aaron Tomlin) Give administrators the ability to zero out /proc/sys/kernel/hung_task_detect_count - "tools/getdelays: use the static UAPI headers from tools/include/uapi" (Thomas Weißschuh) Teach getdelays to use the in-kernel UAPI headers rather than the system-provided ones - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta) Several cleanups and fixups to the hardlockup detector code and its documentation - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law) A couple of small/theoretical fixes in the bch code - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo) - "cleanup the RAID5 XOR library" (Christoph Hellwig) A quite far-reaching cleanup to this code. I can't do better than to quote Christoph: "The XOR library used for the RAID5 parity is a bit of a mess right now. The main file sits in crypto/ despite not being cryptography and not using the crypto API, with the generic implementations sitting in include/asm-generic and the arch implementations sitting in an asm/ header in theory. The latter doesn't work for many cases, so architectures often build the code directly into the core kernel, or create another module for the architecture code. Change this to a single module in lib/ that also contains the architecture optimizations, similar to the library work Eric Biggers has done for the CRC and crypto libraries later. After that it changes to better calling conventions that allow for smarter architecture implementations (although none is contained here yet), and uses static_call to avoid indirection function call overhead" - "lib/list_sort: Clean up list_sort() scheduling workarounds" (Kuan-Wei Chiu) Clean up this library code by removing a hacky thing which was added for UBIFS, which UBIFS doesn't actually need - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt) Fix a few bugs in the scatterlist code, add in-kernel tests for the now-fixed bugs and fix a leak in the test itself - "kdump: Enable LUKS-encrypted dump target support in ARM64 and PowerPC" (Coiby Xu) Enable support of the LUKS-encrypted device dump target on arm64 and powerpc - "ocfs2: consolidate extent list validation into block read callbacks" (Joseph Qi) Cleanup, simplify, and make more robust ocfs2's validation of extent list fields (Kernel test robot loves mounting corrupted fs images!) * tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits) ocfs2: validate group add input before caching ocfs2: validate bg_bits during freefrag scan ocfs2: fix listxattr handling when the buffer is full doc: watchdog: fix typos etc update Sean's email address ocfs2: use get_random_u32() where appropriate ocfs2: split transactions in dio completion to avoid credit exhaustion ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path() ocfs2: validate extent block list fields during block read ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec() ocfs2: validate dx_root extent list fields during block read ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY ocfs2: handle invalid dinode in ocfs2_group_extend .get_maintainer.ignore: add Askar ocfs2: validate bg_list extent bounds in discontig groups checkpatch: exclude forward declarations of const structs tools/accounting: handle truncated taskstats netlink messages taskstats: set version in TGID exit notifications ocfs2/heartbeat: fix slot mapping rollback leaks on error paths arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel ...
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-13Merge tag 'hardening-v7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - randomize_kstack: Improve implementation across arches (Ryan Roberts) - lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test - refcount: Remove unused __signed_wrap function annotations * tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test refcount: Remove unused __signed_wrap function annotations randomize_kstack: Unify random source across arches randomize_kstack: Maintain kstack_offset per task
2026-04-13Merge tag 'kvm-s390-next-7.1-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - ESA nesting support - 4k memslots - LPSW/E fix
2026-04-11PCI: s390: Expose the UID as an arch specific PCI slot attributeNiklas Schnelle
On s390, an individual PCI function can generally be identified by two identifiers, the FID and the UID. Which identifier is used depends on the scope and the platform configuration. The first identifier, the FID, is always available and identifies a PCI device uniquely within a machine. The FID may be virtualized by hypervisors, but on the LPAR level, the machine scope makes it impossible to create the same configuration based on FIDs on two different LPARs of the same machine, and difficult to reuse across machines. Such matching LPAR configurations are useful, though, allowing standardized setups and booting a Linux installation on different LPARs. To this end the UID, or user-defined identifier, was introduced. While it is only guaranteed to be unique within an LPAR and only if indicated by firmware, it allows users to replicate PCI device setups. On s390, which uses a machine hypervisor, a per PCI function hotplug model is used. The shortcoming with the UID then is, that it is not visible to the user without first attaching the PCI function and accessing the "uid" device attribute. The FID, on the other hand, is used as the slot name and is thus known even with the PCI function in standby. Remedy this shortcoming by providing the UID as an attribute on the slot allowing the user to identify a PCI function based on the UID without having to first attach it. Do this via a macro mechanism analogous to what was introduced by commit 265baca69a07 ("s390/pci: Stop usurping pdev->dev.groups") for the PCI device attributes. Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/slot.c Link: https://lore.kernel.org/r/20260407-uid_slot-v8-2-15ae4409d2ce@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-04-05mm: convert do_brk_flags() to use vma_flags_tLorenzo Stoakes (Oracle)
In order to be able to do this, we need to change VM_DATA_DEFAULT_FLAGS and friends and update the architecture-specific definitions also. We then have to update some KSM logic to handle VMA flags, and introduce VMA_STACK_FLAGS to define the vma_flags_t equivalent of VM_STACK_FLAGS. We also introduce two helper functions for use during the time we are converting legacy flags to vma_flags_t values - vma_flags_to_legacy() and legacy_to_vma_flags(). This enables us to iteratively make changes to break these changes up into separate parts. We use these explicitly here to keep VM_STACK_FLAGS around for certain users which need to maintain the legacy vm_flags_t values for the time being. We are no longer able to rely on the simple VM_xxx being set to zero if the feature is not enabled, so in the case of VM_DROPPABLE we introduce VMA_DROPPABLE as the vma_flags_t equivalent, which is set to EMPTY_VMA_FLAGS if the droppable flag is not available. While we're here, we make the description of do_brk_flags() into a kdoc comment, as it almost was already. We use vma_flags_to_legacy() to not need to update the vm_get_page_prot() logic as this time. Note that in create_init_stack_vma() we have to replace the BUILD_BUG_ON() with a VM_WARN_ON_ONCE() as the tested values are no longer build time available. We also update mprotect_fixup() to use VMA flags where possible, though we have to live with a little duplication between vm_flags_t and vma_flags_t values for the time being until further conversions are made. While we're here, update VM_SPECIAL to be defined in terms of VMA_SPECIAL_FLAGS now we have vma_flags_to_legacy(). Finally, we update the VMA tests to reflect these changes. Link: https://lkml.kernel.org/r/d02e3e45d9a33d7904b149f5604904089fd640ae.1774034900.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> [SELinux] Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: change to return bool for pmdp_clear_flush_young()Baolin Wang
The pmdp_clear_flush_young() is used to clear the young flag and flush the TLB, returning whether the young flag was set for this PMD entry. Change the return type to bool to make the intention clearer. Link: https://lkml.kernel.org/r/a668b9a974c0d675e7a41f6973bcbe3336e8b373.1774075004.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: change to return bool for pmdp_test_and_clear_young()Baolin Wang
Callers use pmdp_test_and_clear_young() to clear the young flag and check whether it was set for this PMD entry. Change the return type to bool to make the intention clearer. Link: https://lkml.kernel.org/r/f1d31307a13365d3d0fed5809727dcc2dd59631b.1774075004.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: change to return bool for ptep_clear_flush_young()/clear_flush_young_ptes()Baolin Wang
The ptep_clear_flush_young() and clear_flush_young_ptes() are used to clear the young flag and flush the TLB, returning whether the young flag was set. Change the return type to bool to make the intention clearer. Link: https://lkml.kernel.org/r/24af5144b96103631594501f77d4525f2475c1be.1774075004.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>