summaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)Author
2026-04-03Merge tag 's390-7.0-7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix a memory leak in the zcrypt driver where the AP message buffer for clear key RSA requests was allocated twice, once by the caller and again locally, causing the first allocation to never be freed - Fix the cpum_sf perf sampling rate overflow adjustment to clamp the recalculated rate to the hardware maximum, preventing exceptions on heavily loaded systems running with HZ=1000 * tag 's390-7.0-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Fix memory leak with CCA cards used as accelerator s390/cpum_sf: Cap sampling rate to prevent lsctl exception
2026-03-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "s390: - Lots of small and not-so-small fixes for the newly rewritten gmap, mostly affecting the handling of nested guests. x86: - Fix an issue with shadow paging, which causes KVM to install an MMIO PTE in the shadow page tables without first zapping a non-MMIO SPTE if KVM didn't see the write that modified the shadowed guest PTE. While commit a54aa15c6bda3 ("KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte()") was right about it being impossible to miss such a write if it was coming from the guest, it failed to account for writes to guest memory that are outside the scope of KVM: if userspace modifies the guest PTE, and then the guest hits a relevant page fault, KVM will get confused" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Only WARN in direct MMUs when overwriting shadow-present SPTE KVM: x86/mmu: Drop/zap existing present SPTE even when creating an MMIO SPTE KVM: s390: Fix KVM_S390_VCPU_FAULT ioctl KVM: s390: vsie: Fix guest page tables protection KVM: s390: vsie: Fix unshadowing while shadowing KVM: s390: vsie: Fix refcount overflow for shadow gmaps KVM: s390: vsie: Fix nested guest memory shadowing KVM: s390: Correctly handle guest mappings without struct page KVM: s390: Fix gmap_link() KVM: s390: vsie: Fix check for pre-existing shadow mapping KVM: s390: Remove non-atomic dat_crstep_xchg() KVM: s390: vsie: Fix dat_split_ste()
2026-03-28Merge tag 's390-7.0-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Add array_index_nospec() to syscall dispatch table lookup to prevent limited speculative out-of-bounds access with user-controlled syscall number - Mark array_index_mask_nospec() __always_inline since GCC may emit an out-of-line call instead of the inline data dependency sequence the mitigation relies on - Clear r12 on kernel entry to prevent potential speculative use of user value in system_call, ext/io/mcck interrupt handlers * tag 's390-7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/entry: Scrub r12 register on kernel entry s390/syscalls: Add spectre boundary for syscall dispatch table s390/barrier: Make array_index_mask_nospec() __always_inline
2026-03-28s390/entry: Scrub r12 register on kernel entryVasily Gorbik
Before commit f33f2d4c7c80 ("s390/bp: remove TIF_ISOLATE_BP"), all entry handlers loaded r12 with the current task pointer (lg %r12,__LC_CURRENT) for use by the BPENTER/BPEXIT macros. That commit removed TIF_ISOLATE_BP, dropping both the branch prediction macros and the r12 load, but did not add r12 to the register clearing sequence. Add the missing xgr %r12,%r12 to make the register scrub consistent across all entry points. Fixes: f33f2d4c7c80 ("s390/bp: remove TIF_ISOLATE_BP") Cc: stable@kernel.org Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-28s390/syscalls: Add spectre boundary for syscall dispatch tableGreg Kroah-Hartman
The s390 syscall number is directly controlled by userspace, but does not have an array_index_nospec() boundary to prevent access past the syscall function pointer tables. Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Fixes: 56e62a737028 ("s390: convert to generic entry") Cc: stable@kernel.org Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/2026032404-sterling-swoosh-43e6@gregkh Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-28s390/barrier: Make array_index_mask_nospec() __always_inlineVasily Gorbik
Mark array_index_mask_nospec() as __always_inline to guarantee the mitigation is emitted inline regardless of compiler inlining decisions. Fixes: e2dd833389cc ("s390: add optimized array_index_mask_nospec") Cc: stable@kernel.org Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-26KVM: s390: Fix KVM_S390_VCPU_FAULT ioctlClaudio Imbrenda
A previous commit changed the behaviour of the KVM_S390_VCPU_FAULT ioctl. The current (wrong) implementation will trigger a guest addressing exception if the requested address lies outside of a memslot, unless the VM is UCONTROL. Restore the previous behaviour by open coding the fault-in logic. Fixes: 3762e905ec2e ("KVM: s390: use __kvm_faultin_pfn()") Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix guest page tables protectionClaudio Imbrenda
When shadowing, the guest page tables are write-protected, in order to trap changes and properly unshadow the shadow mapping for the nested guest. Already shadowed levels are skipped, so that only the needed levels are write protected. Currently the levels that get write protected are exactly one level too deep: the last level (nested guest memory) gets protected in the wrong way, and will be protected again correctly a few lines afterwards; most importantly, the highest non-shadowed level does *not* get write protected. Moreover, if the nested guest is running in a real address space, there are no DAT tables to shadow. Write protect the correct levels, so that all the levels that need to be protected are protected, and avoid double protecting the last level; skip attempting to shadow the DAT tables when the nested guest is running in a real address space. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix unshadowing while shadowingClaudio Imbrenda
If shadowing causes the shadow gmap to get unshadowed, exit early to prevent an attempt to dereference the parent pointer, which at this point is NULL. Opportunistically add some more checks to prevent NULL parents. Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Fixes: e5f98a6899bd ("KVM: s390: Add some helper functions needed for vSIE") Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix refcount overflow for shadow gmapsClaudio Imbrenda
In most cases gmap_put() was not called when it should have. Add the missing gmap_put() in vsie_run(). Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix nested guest memory shadowingClaudio Imbrenda
Fix _do_shadow_pte() to use the correct pointer (guest pte instead of nested guest) to set up the new pte. Add a check to return -EOPNOTSUPP if the mapping for the nested guest is writeable but the same page in the guest is only read-only. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: Correctly handle guest mappings without struct pageClaudio Imbrenda
Introduce a new special softbit for large pages, like already presend for normal pages, and use it to mark guest mappings that do not have struct pages. Whenever a leaf DAT entry becomes dirty, check the special softbit and only call SetPageDirty() if there is an actual struct page. Move the logic to mark pages dirty inside _gmap_ptep_xchg() and _gmap_crstep_xchg_atomic(), to avoid needlessly duplicating the code. Fixes: 5a74e3d93417 ("KVM: s390: KVM-specific bitfields and helper functions") Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: Fix gmap_link()Claudio Imbrenda
The slow path of the fault handler ultimately called gmap_link(), which assumed the fault was a major fault, and blindly called dat_link(). In case of minor faults, things were not always handled properly; in particular the prefix and vsie marker bits were ignored. Move dat_link() into gmap.c, renaming it accordingly. Once moved, the new _gmap_link() function will be able to correctly honour the prefix and vsie markers. This will cause spurious unshadows in some uncommon cases. Fixes: 94fd9b16cc67 ("KVM: s390: KVM page table management functions: lifecycle management") Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix check for pre-existing shadow mappingClaudio Imbrenda
When shadowing a nested guest, a check is performed and no shadowing is attempted if the nested guest is already shadowed. The existing check was incomplete; fix it by also checking whether the leaf DAT table entry in the existing shadow gmap has the same protection as the one specified in the guest DAT entry. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: Remove non-atomic dat_crstep_xchg()Claudio Imbrenda
In practice dat_crstep_xchg() is racy and hard to use correctly. Simply remove it and replace its uses with dat_crstep_xchg_atomic(). This solves some actual races that lead to system hangs / crashes. Opportunistically fix an alignment issue in _gmap_crstep_xchg_atomic(). Fixes: 589071eaaa8f ("KVM: s390: KVM page table management functions: clear and replace") Fixes: 94fd9b16cc67 ("KVM: s390: KVM page table management functions: lifecycle management") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix dat_split_ste()Claudio Imbrenda
If the guest misbehaves and puts the page tables for its nested guest inside the memory of the nested guest itself, and the guest and nested guest are being mapped with large pages, the shadow mapping will lose synchronization with the actual mapping, since this will cause the large page with the vsie notification bit to be split, but the vsie notification bit will not be propagated to the resulting small pages. Fix this by propagating the vsie_notif bit from large pages to normal pages when splitting a large page. Fixes: 2db149a0a6c5 ("KVM: s390: KVM page table management functions: walks") Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-24s390/cpum_sf: Cap sampling rate to prevent lsctl exceptionThomas Richter
commit fcc43a7e294f ("s390/configs: Set HZ=1000") changed the interrupt frequency of the system. On machines with heavy load and many perf event overflows, this might lead to an exception. Dmesg displays these entries: [112.242542] cpum_sf: Loading sampling controls failed: op 1 err -22 One line per CPU online. The root cause is the CPU Measurement sampling facility overflow adjustment. Whenever an overflow (too much samples per tick) occurs, the sampling rate is adjusted and increased. This was done without observing the maximum sampling rate limit. When the current sampling interval is higher than the maximum sampling rate limit, the lsctl instruction raises an exception. The error messages is the result of such an exception. Observe the upper limit when the new sampling rate is recalculated. Cc: stable@vger.kernel.org Fixes: 39d4a501a9ef ("s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-24Merge tag 'kvm-s390-master-7.0-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for 7.0 - fix deadlock in new memory management - handle kernel faults on donated memory properly - fix bounds checking for irq routing + selftest - fix invalid machine checks + logging
2026-03-16KVM: s390: vsie: Avoid injecting machine check on signalChristian Borntraeger
The recent XFER_TO_GUEST_WORK change resulted in a situation, where the vsie code would interpret a signal during work as a machine check during SIE as both use the EINTR return code. The exit_reason of the sie64a function has nothing to do with the kvm_run exit_reason. Rename it and define a specific code for machine checks instead of abusing -EINTR. rename exit_reason into sie_return to avoid the naming conflict and change the code flow in vsie.c to have a separate variable for rc and sie_return. Fixes: 2bd1337a1295e ("KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-16KVM: s390: log machine checks more aggressivelyChristian Borntraeger
KVM will reinject machine checks that happen during guest activity. From a host perspective this machine check is no longer visible and even for the guest, the guest might decide to only kill a userspace program or even ignore the machine check. As this can be a disruptive event nevertheless, we should log this not only in the VM debug event (that gets lost after guest shutdown) but also on the global KVM event as well as syslog. Consolidate the logging and log with loglevel 2 and higher. Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
2026-03-16KVM: s390: Limit adapter indicator access to mapped pageJanosch Frank
While we check the address for errors, we don't seem to check the bit offsets and since they are 32 and 64 bits a lot of memory can be reached indirectly via those offsets. Fixes: 84223598778b ("KVM: s390: irq routing for adapter interrupts.") Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-03-16s390/mm: Add missing secure storage access fixups for donated memoryJanosch Frank
There are special cases where secure storage access exceptions happen in a kernel context for pages that don't have the PG_arch_1 bit set. That bit is set for non-exported guest secure storage (memory) but is absent on storage donated to the Ultravisor since the kernel isn't allowed to export donated pages. Prior to this patch we would try to export the page by calling arch_make_folio_accessible() which would instantly return since the arch bit is absent signifying that the page was already exported and no further action is necessary. This leads to secure storage access exception loops which can never be resolved. With this patch we unconditionally try to export and if that fails we fixup. Fixes: 084ea4d611a3 ("s390/mm: add (non)secure page access exceptions handlers") Reported-by: Heiko Carstens <hca@linux.ibm.com> Suggested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Quite a large pull request, partly due to skipping last week and therefore having material from ~all submaintainers in this one. About a fourth of it is a new selftest, and a couple more changes are large in number of files touched (fixing a -Wflex-array-member-not-at-end compiler warning) or lines changed (reformatting of a table in the API documentation, thanks rST). But who am I kidding---it's a lot of commits and there are a lot of bugs being fixed here, some of them on the nastier side like the RISC-V ones. ARM: - Correctly handle deactivation of interrupts that were activated from LRs. Since EOIcount only denotes deactivation of interrupts that are not present in an LR, start EOIcount deactivation walk *after* the last irq that made it into an LR - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM is already enabled -- not only thhis isn't possible (pKVM will reject the call), but it is also useless: this can only happen for a CPU that has already booted once, and the capability will not change - Fix a couple of low-severity bugs in our S2 fault handling path, affecting the recently introduced LS64 handling and the even more esoteric handling of hwpoison in a nested context - Address yet another syzkaller finding in the vgic initialisation, where we would end-up destroying an uninitialised vgic with nasty consequences - Address an annoying case of pKVM failing to boot when some of the memblock regions that the host is faulting in are not page-aligned - Inject some sanity in the NV stage-2 walker by checking the limits against the advertised PA size, and correctly report the resulting faults PPC: - Fix a PPC e500 build error due to a long-standing wart that was exposed by the recent conversion to kmalloc_obj(); rip out all the ugliness that led to the wart RISC-V: - Prevent speculative out-of-bounds access using array_index_nospec() in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access, float register access, and PMU counter access - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(), kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr() - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei() - Fix off-by-one array access in SBI PMU - Skip THP support check during dirty logging - Fix error code returned for Smstateen and Ssaia ONE_REG interface - Check host Ssaia extension when creating AIA irqchip x86: - Fix cases where CPUID mitigation features were incorrectly marked as available whenever the kernel used scattered feature words for them - Validate _all_ GVAs, rather than just the first GVA, when processing a range of GVAs for Hyper-V's TLB flush hypercalls - Fix a brown paper bug in add_atomic_switch_msr() - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list, to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local APIC (and AVIC is enabled at the module level) - Update CR8 write interception when AVIC is (de)activated, to fix a bug where the guest can run in perpetuity with the CR8 intercept enabled - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by default) an unintentional tightening of userspace ABI in 6.17, and provides some amount of backwards compatibility with hypervisors who want to freeze PMCs on VM-Entry - Validate the VMCS/VMCB on return to a nested guest from SMM, because either userspace or the guest could stash invalid values in memory and trigger the processor's consistency checks Generic: - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive Selftests: - Increase the maximum number of NUMA nodes in the guest_memfd selftest to 64 (from 8)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8 Documentation: kvm: fix formatting of the quirks table KVM: x86: clarify leave_smm() return value selftests: kvm: add a test that VMX validates controls on RSM selftests: kvm: extract common functionality out of smm_test.c KVM: SVM: check validity of VMCB controls when returning from SMM KVM: VMX: check validity of VMCS controls when returning from SMM KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers() KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr() KVM: x86: hyper-v: Validate all GVAs during PV TLB flush KVM: x86: synthesize CPUID bits only if CPU capability is set KVM: PPC: e500: Rip out "struct tlbe_ref" KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type KVM: selftests: Increase 'maxnode' for guest_memfd tests KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2() ...
2026-03-13Merge tag 's390-7.0-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Revert IRQ entry/exit path optimization that incorrectly cleared some PSW bits before irqentry_exit(), causing boot failures with linux-next and HRTIMER_REARM_DEFERRED (which only uncovered the problem) - Fix zcrypt code to show CCA card serial numbers even when the default crypto domain is offline by selecting any domain available, preventing empty sysfs entries * tag 's390-7.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Enable AUTOSEL_DOM for CCA serialnr sysfs attribute s390: Revert "s390/irq/idle: Remove psw bits early"
2026-03-11Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into ↵Paolo Bonzini
HEAD KVM generic changes for 7.0 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end. - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive.
2026-03-07s390: Revert "s390/irq/idle: Remove psw bits early"Heiko Carstens
This reverts commit d8b5cf9c63143fae54a734c41e3bb55cf3f365c7. Mikhail Zaslonko reported that linux-next doesn't boot anymore [2]. Reason for this is recent change [2] was supposed to slightly optimize the irq entry/exit path by removing some psw bits early in case of an idle exit. This however is incorrect since irqentry_exit() requires the correct old psw state at irq entry. Otherwise the embedded regs_irqs_disabled() will not provide the correct result. With linux-next and HRTIMER_REARM_DEFERRED this leads to the observed boot problems, however the commit is broken in any case. Revert the commit which introduced this. Thanks to Peter Zijlstra for pointing out that this is a bug in the s390 entry code. Fixes: d8b5cf9c6314 ("s390/irq/idle: Remove psw bits early") [1] Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Closes: https://lore.kernel.org/r/af549a19-db99-4b16-8511-bf315177a13e@linux.ibm.com/ [2] Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260306111919.362559-1-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-06Merge tag 'kbuild-fixes-7.0-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Split out .modinfo section from ELF_DETAILS macro, as that macro may be used in other areas that expect to discard .modinfo, breaking certain image layouts - Adjust genksyms parser to handle optional attributes in certain declarations, necessary after commit 07919126ecfc ("netfilter: annotate NAT helper hook pointers with __rcu") - Include resolve_btfids in external module build created by scripts/package/install-extmod-build when it may be run on external modules - Avoid removing objtool binary with 'make clean', as it is required for external module builds * tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Leave objtool binary around with 'make clean' kbuild: install-extmod-build: Package resolve_btfids if necessary genksyms: Fix parsing a declarator with a preceding attribute kbuild: Split .modinfo out from ELF_DETAILS
2026-03-06KVM: s390: Fix a deadlockClaudio Imbrenda
In some scenarios, a deadlock can happen, involving _do_shadow_pte(). Convert all usages of pgste_get_lock() to pgste_get_trylock() in _do_shadow_pte() and return -EAGAIN. All callers can already deal with -EAGAIN being returned. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-03s390/stackleak: Fix __stackleak_poison() inline assembly constraintHeiko Carstens
The __stackleak_poison() inline assembly comes with a "count" operand where the "d" constraint is used. "count" is used with the exrl instruction and "d" means that the compiler may allocate any register from 0 to 15. If the compiler would allocate register 0 then the exrl instruction would not or the value of "count" into the executed instruction - resulting in a stackframe which is only partially poisoned. Use the correct "a" constraint, which excludes register 0 from register allocation. Fixes: 2a405f6bb3a5 ("s390/stackleak: provide fast __stackleak_poison() implementation") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-4-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-03s390/xor: Improve inline assembly constraintsHeiko Carstens
The inline assembly constraint for the "bytes" operand is "d" for all xor() inline assemblies. "d" means that any register from 0 to 15 can be used. If the compiler would use register 0 then the exrl instruction would not or the value of "bytes" into the executed instruction - resulting in an incorrect result. However all the xor() inline assemblies make hard-coded use of register 0, and it is correctly listed in the clobber list, so that this cannot happen. Given that this is quite subtle use the better "a" constraint, which excludes register 0 from register allocation in any case. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-3-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-03s390/xor: Fix xor_xc_2() inline assembly constraintsHeiko Carstens
The inline assembly constraints for xor_xc_2() are incorrect. "bytes", "p1", and "p2" are input operands, while all three of them are modified within the inline assembly. Given that the function consists only of this inline assembly it seems unlikely that this may cause any problems, however fix this in any case. Fixes: 2cfc5f9ce7f5 ("s390/xor: optimized xor routing using the XC instruction") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-2-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-03s390/xor: Fix xor_xc_5() inline assemblyVasily Gorbik
xor_xc_5() contains a larl 1,2f that is not used by the asm and is not declared as a clobber. This can corrupt a compiler-allocated value in %r1 and lead to miscompilation. Remove the instruction. Fixes: 745600ed6965 ("s390/lib: Use exrl instead of ex in xor functions") Cc: stable@vger.kernel.org Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Arm: - Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host - Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set - Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent - Don't leak stage-2 mappings in protected mode - Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB - Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... [his words, not mine] - Remove duplication of code retrieving the ASID for the purpose of S1 PT handling - Fix slightly abusive const-ification in vgic_set_kvm_info() Generic: - Remove internal Kconfigs that are now set on all architectures - Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all architectures finally enable it in Linux 7.0" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: always define KVM_CAP_SYNC_MMU KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER KVM: arm64: Deduplicate ASID retrieval code irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs KVM: arm64: Fix protected mode handling of pages larger than 4kB KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state() KVM: arm64: Fix ID register initialization for non-protected pKVM guests KVM: arm64: Optimise away S1POE handling when not supported by host KVM: arm64: Hide S1POE from guests when not supported by the host
2026-02-28KVM: always define KVM_CAP_SYNC_MMUPaolo Bonzini
KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always available. Move the definition from individual architectures to common code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-28KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIERPaolo Bonzini
All architectures now use MMU notifier for KVM page table management. Remove the Kconfig symbol and the code that is used when it is disabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-26kbuild: Split .modinfo out from ELF_DETAILSNathan Chancellor
Commit 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and ELF_DETAILS was present in all architecture specific vmlinux linker scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and COMMON_DISCARDS may be used by other linker scripts, such as the s390 and x86 compressed boot images, which may not expect to have a .modinfo section. In certain circumstances, this could result in a bootloader failing to load the compressed kernel [1]. Commit ddc6cbef3ef1 ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with SecureBoot trailer") recently addressed this for the s390 bzImage but the same bug remains for arm, parisc, and x86. The presence of .modinfo in the x86 bzImage was the root cause of the issue worked around with commit d50f21091358 ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition. Split .modinfo out from ELF_DETAILS into its own macro and handle it in all vmlinux linker scripts. Discard .modinfo in the places where it was previously being discarded from being in COMMON_DISCARDS, as it has never been necessary in those uses. Cc: stable@vger.kernel.org Fixes: 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped") Reported-by: Ed W <lists@wildgooses.com> Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1] Tested-by: Ed W <lists@wildgooses.com> # x86_64 Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-02-25s390/pfault: Fix virtual vs physical address confusionAlexander Gordeev
When Linux is running as guest, runs a user space process and the user space process accesses a page that the host has paged out, the guest gets a pfault interrupt and schedules a different process. Without this mechanism the host would have to suspend the whole virtual CPU until the page has been paged in. To setup the pfault interrupt the real address of parameter list should be passed to DIAGNOSE 0x258, but a virtual address is passed instead. That has a performance impact, since the pfault setup never succeeds, the interrupt is never delivered to a guest and the whole virtual CPU is suspended as result. Cc: stable@vger.kernel.org Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/kexec: Disable stack protector in s390_reset_system()Vasily Gorbik
s390_reset_system() calls set_prefix(0), which switches back to the absolute lowcore. At that point the stack protector canary no longer matches the canary from the lowcore the function was entered with, so the stack check fails. Mark s390_reset_system() __no_stack_protector. This is safe here since its callers (__do_machine_kdump() and __do_machine_kexec()) are effectively no-return and fall back to disabled_wait() on failure. Fixes: f5730d44e05e ("s390: Add stackprotector support") Reported-by: Nikita Dubrovskii <nikita@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/idle: Remove psw_idle() prototypeHeiko Carstens
psw_idle() does not exist anymore. Remove its prototype. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON()Heiko Carstens
Use lockdep_assert_irqs_disabled() instead of BUG_ON(). This avoids crashing the kernel, and generates better code if CONFIG_PROVE_LOCKING is disabled. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE()Heiko Carstens
do_account_vtime() runs always with interrupts disabled, therefore use __this_cpu_read() instead of this_cpu_read() to get rid of a pointless preempt_disable() / preempt_enable() pair. Also there are no concurrent writers to the cpu time accounting fields in lowcore. Therefore get rid of READ_ONCE() usages. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/irq/idle: Remove psw bits earlyHeiko Carstens
Remove wait, io, external interrupt bits early in do_io_irq()/do_ext_irq() when previous context was idle. This saves one conditional branch and is closer to the original old assembly code. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/idle: Inline update_timer_idle()Heiko Carstens
Inline update_timer_idle() again to avoid an extra function call. This way the generated code is close to old assembler version again. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/idle: Slightly optimize idle time accountingHeiko Carstens
Slightly optimize account_idle_time_irq() and update_timer_idle(): - Use fast single instruction __atomic64() primitives to update per cpu idle_time and idle_count, instead of READ_ONCE() / WRITE_ONCE() pairs - stcctm() is an inline assembly with a full memory barrier. This leads to a not necessary extra dereference of smp_cpu_mtid in update_timer_idle(). Avoid this and read smp_cpu_mtid into a variable - Use __this_cpu_add() instead of this_cpu_add() to avoid disabling / enabling of preemption several times in a loop in update_timer_idle(). Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/idle: Add comment for non obvious codeHeiko Carstens
Add a comment to update_timer_idle() which describes why wall time (not steal time) is added to steal_timer. This is not obvious and was reported by Frederic Weisbecker. Reported-by: Frederic Weisbecker <frederic@kernel.org> Closes: https://lore.kernel.org/all/aXEVM-04lj0lntMr@localhost.localdomain/ Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/vtime: Fix virtual timer forwardingHeiko Carstens
Since delayed accounting of system time [1] the virtual timer is forwarded by do_account_vtime() but also vtime_account_kernel(), vtime_account_softirq(), and vtime_account_hardirq(). This leads to double accounting of system, guest, softirq, and hardirq time. Remove accounting from the vtime_account*() family to restore old behavior. There is only one user of the vtimer interface, which might explain why nobody noticed this so far. Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time") [1] Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-25s390/idle: Fix cpu idle exit cpu time accountingHeiko Carstens
With the conversion to generic entry [1] cpu idle exit cpu time accounting was converted from assembly to C. This introduced an reversed order of cpu time accounting. On cpu idle exit the current accounting happens with the following call chain: -> do_io_irq()/do_ext_irq() -> irq_enter_rcu() -> account_hardirq_enter() -> vtime_account_irq() -> vtime_account_kernel() vtime_account_kernel() accounts the passed cpu time since last_update_timer as system time, and updates last_update_timer to the current cpu timer value. However the subsequent call of -> account_idle_time_irq() will incorrectly subtract passed cpu time from timer_idle_enter to the updated last_update_timer value from system_timer. Then last_update_timer is updated to a sys_enter_timer, which means that last_update_timer goes back in time. Subsequently account_hardirq_exit() will account too much cpu time as hardirq time. The sum of all accounted cpu times is still correct, however some cpu time which was previously accounted as system time is now accounted as hardirq time, plus there is the oddity that last_update_timer goes back in time. Restore previous behavior by extracting cpu time accounting code from account_idle_time_irq() into a new update_timer_idle() function and call it before irq_enter_rcu(). Fixes: 56e62a737028 ("s390: convert to generic entry") [1] Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>