| Age | Commit message (Collapse) | Author |
|
Pull kvm updates from Paolo Bonzini:
"arm64:
This is a bit of an odd merge window on the KVM/arm64 front. There
is absolutely no new feature in the pull request. It is purely
fixes, because it is simply becoming too hard to review new stuff
when so many AI-fuelled fixes hit the list.
- Significant cleanup of the vgic-v5 PPI support which was merged in
7.1. This makes the code more maintainable, and squashes a couple
of bugs in the meantime
- Set of fixes for the handling of the MMU in an NV context,
particularly VNCR-triggered faults. S1POE support is fixed as well
- Large set of pKVM fixes, mostly addressing recurring issues around
hypervisor tracking of donated pages in obscure cases where the
donation could fail and leave things in a bizarre state
- Fixes for the so-called "lazy vgic init", which resulted in
sleeping operations in non-preemptible sections. This turned out to
be far more invasive than initially expected..
- Reduce the overhead of L1/L2 context switch by not touching the FP
registers
- Fix the way non-implemented page sizes are dealt with when a guest
insist on using them for S2 translation
- The usual set of low-impact fixes and cleanups all over the map
Loongarch:
- On a request for lazy FPU load, load all FPU state that the VM
supports instead of enabling only the part (FPU, LSX or LASX) that
caused the FPU load request
- Some enhancements about interrupt injection
- Some bug fixes and other small changes
RISC-V:
- Batch G-stage TLB flushes for GPA range based page table updates
- Convert HGEI line management to fully per-HART
- Fix missing CSR dirty marking when FWFT state updated via ONE_REG
- Fix stale FWFT feature exposure to Guest/VM
- Speed up dirty logging write faults using MMU rwlock and atomic PTE
updates using cmpxchg() for permission-only changes
- Use flexible array for APLIC IRQ state
- Use kvm_slot_dirty_track_enabled() for logging enable check on a
memslot
- Avoid skipping valid pages in kvm_riscv_gstage_wp_range()
- Avoid skipping valid pages in kvm_riscv_gstage_unmap_range()
- Use endian-specific __lelong for NACL shared memory
S390:
- KVM_PRE_FAULT_MEMORY support
- Support for 2G hugepages
- Support for the ASTFLEIE 2 facility
- Support for fast inject using kvm_arch_set_irq_inatomic
- Fix potential leak of uninitialized bytes
- A few more misc gmap fixes
x86:
- Generic support for the more granular permissions allowed by EPT,
namely "read" (which was previously usurping the U bit) and
separate execution bits for kernel and userspace
- Do not assume that all page tables start with U=1/W=1/NX=0 at the
root, as AMD GMET needs to have U=0 at the root
- Introduce common assembly macros for use within Intel and AMD
vendor-specific vmentry code. This touches the SPEC_CTRL handling,
which is now entirely done in assembly for Intel (by reusing the
AMD code that already existed), and register save/restore which
uses some macro magic to compute the offsets in the struct. Both of
these are preparatory changes for upcoming APX support
- Clean up KVM's register tracking and storage, primarily to prepare
for APX support, which expands the maximum number of GPRs from 16
to 32
- Keep a single copy of the PDPTRs rather than two, since
architecturally there is just one
- Handle EXIT_FASTPATH_EXIT_USERSPACE in vendor code to ensure vendor
code gets a chance to handle things like reaping the PML buffer
- Update KVM's view of PV async enabling if and only if the MSR write
fully succeeds
- Fix a variety of issues where the emulator doesn't honor
guest-debug state, and clean up related code along the way
- Synthesize EPT Violation and #NPF "error code" bits when injecting
faults into L1 that didn't originate in hardware (in which case the
VMCS/VMCB doesn't hold relevant information)
- Add support for virtualizing (well, emulating) AMD's flavor of
CPL>0 CPUID faulting
- Clean up the GPR APIs so that KVM's use of "raw" is consistent, and
fix a variety of minor bugs along the way
- Fix an OOB memory access due to not checking the VP ID when
handling a Hyper-V PV TLB flush for L2
- Fix a bug in the mediated PMU's handling of fixed counters that
allowed the guest to bypass the PMU event filter
- Allow userspace to return EAGAIN when handling SNP and TDX
hypercalls, so the KVM can forward a "retry" status code to the
guest, and reserve all unused error codes for future usage
- Overhaul the TDP MMU => S-EPT code to move as much S-EPT specific
logic as possible into the TDX code, and to funnel (almost) all
S-EPT updates into a single chokepoint. The motivation is largely
to prepare for upcoming Dynamic PAMT support, but the cleanups are
nice to have on their own
- Plug a hole in shadow page table handling, where KVM fails to
recursively zap nested EPT/NPT shadow page tables when the nested
hypervisor tears down its own EPT/NPT page tables from the bottom
up
x86 (Intel):
- Support for nested MBEC (Mode-Based Execute Control), see above in
the generic section; also run with MBEC enabled even for non-nested
mode
- Use the kernel's "enum pg_level" in the TDX APIs instead of the
TDX-Module's level definitions (which are 0-based)
- Rework the TDX memory APIs to not require/assume that guest memory
is backed by "struct page" (in prepartion for guest_memfd hugepage
support)
- Fix a largely benign bug where KVM TDX would incorrectly state it
could emulate several x2APIC MSRs
- Use the "safe" WRMSR API when proxying LBR MSR writes as the
to-be-written value is guest controlled and completely unvalidated
x86 (AMD):
- Support for nested GMET (Guest Mode Execution Trap), see above in
the generic section; also run with GMET enabled even for non-nested
mode
- Fixes and minor cleanups to GHCB handling, on top of the earlier
work already merged into 7.1-rc
- Ensure KVM's copy of CR0 and CR3 are up-to-date prior to invoking
fastpath handlers
- Add support for virtualizing gPAT (KVM previously just used L1's
PAT when running L2)
- Fix goofs where KVM mishandles side effects (e.g. single-step and
PMC updates) when emulating VMRUN
- Fix a variety of bugs in AVIC's handling of x2APIC MSR
interception, most notably where KVM didn't disable interception of
IRR, ISR, and TMR regs
- Add support for virtualizing Host-Only/Guest-Only bits in the
mediated PMU
- Don't advertise support for unusable VM types, and account for VM
types that are disabled by firmware, e.g. to mitigate security
vulnerabilities
- Rewrite the SEV {en,de}crypt debug ioctls as they were riddle with
bugs and unnecessarily complicated, and add comprehensive tests
- Clean up and deduplicate the SEV page pinning code
- Fix minor goofs related to writing back CPUID information after
firmware rejects a CPUID page for an SNP vCPU
Generic:
- Rename invalidate_begin() to invalidate_start() throughout KVM to
follow the kernel's nomenclature, e.g. for mmu_notifiers
- Use guard() to cleanup up various KVM+VFIO flows
- Minor cleanups
guest_memfd:
- Return -EEXIST instead of -EINVAL if userspace attempts to bind a
gmem range to multiple memslots, and fix the test that was supposed
to ensure KVM returns -EEXIST
- Treat memslot binding offsets and sizes as unsigned values to fix a
bug where KVM interprets a large "offset + size" as a negative
value and allows a nonsensical offset
- Use the inode number instead of the page offset for the NUMA
interleaving index to fix a bug where the effective index would
jump by two for consecutive pages (the caller also adds in the page
offset)
Selftests:
- Randomize the dirty log test's delay when reaping the bitmap on the
first pass, as always waiting only 1ms hid a KVM RISC-V bug as the
test reaped the bitmap before KVM could build up enough state to
hit the bug
- A pile of one-off fixes and cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (326 commits)
KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level
KVM: x86: Fix shadow paging use-after-free due to unexpected role
KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
KVM: s390: Enable adapter_indicators_set to use mapped pages
KVM: s390: Add map/unmap ioctl and clean mappings post-guest
riscv: kvm: Use endian-specific __lelong for NACL shared memory
KVM: selftests: access_tracking_perf_test: bump number of NUMA nodes to 32
KVM: s390: vsie: Implement ASTFLEIE facility 2
KVM: s390: vsie: Refactor handle_stfle
s390/sclp: Detect ASTFLEIE 2 facility
KVM: s390: Minor refactor of base/ext facility lists
KVM: x86/mmu: move pdptrs out of the MMU
KVM: x86: check that kvm_handle_invpcid is only invoked with shadow paging
KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions
KVM: nVMX: remove unnecessary code in prepare_vmcs02_rare
KVM: x86: remove nested_mmu from mmu_is_nested()
KVM: arm64: vgic-its: Make ABI commit helpers return void
KVM: s390: Initialize KVM_S390_GET_CMMA_BITS memory
LoongArch: KVM: Add missing slots_lock for device register/unregister
LoongArch: KVM: Validate irqchip index in irqfd routing
...
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Use CIO device online variable instead of the internal FSM state to
determine device availability during purge operations
- Remove extra check of task_stack_page() because try_get_task_stack()
already takes care of that when reading /proc/<pid>/wchan
- Allow user-space to use the new SCLP action qualifier 4 for to
provide NVMe SMART log data to the platform.
- Send AP CHANGE uevents on successful bind and successful association
to notify user-space about SE operations on AP queue devices
- Add an s390dbf kernel parameter to configure debug log levels and
area sizes during early boot
- On arm64 the empty zero page is going to be mapped read-only. Do the
same for s390 with an explicit set_memory_ro() call
- Improve s390-specific bcr_serialize() and cpu_relax() implementations
- Remove all unused variables to avoid allmodconfig W=1 build fails
with latest clang-23
- Cleanup default Kconfig values for s390 selftests
- Add a s390-tod trace clock to allow comparing trace timestamps
between different systems or virtual machines on s390
- Remove the s390 implementation of strlcat() in favor of the generic
variant
- Make consistent the calling order between
page_table_check_pte_clear() and secure page conversion across all
code paths
- Rearrange some fields within AP and zcrypt structs to reduce memory
consumption and unused holes
- Shorten GR_NUM and VX_NUM macros and move them to a separate header
- Replace __get_free_page() with kmalloc() in few sources
- Introduce an infrastructure for more efficient this_cpu operations.
Eliminate conditional branches when PREEMPT_NONE is removed
- Enable Rust support
- Use z10 as minimum architecture level, similar to the boot code, to
enforce a defined architecture level set
- Improve and convert various mem*() helper functions to C. For that
add .noinstr.text section to avoid orphaned warnings from the linker
- Fix the function pointer type in __ret_from_fork() to correct the
indirect call to match kernel thread return type of int
- Revert support for DCACHE_WORD_ACCESS to avoid an endless exception
loop on read from donated Ultravisor pages at unaligned addresses
* tag 's390-7.2-1' of gitolite.kernel.org:pub/scm/linux/kernel/git/s390/linux: (52 commits)
s390: Revert support for DCACHE_WORD_ACCESS
s390/process: Fix kernel thread function pointer type
s390/tishift: Convert __ashlti3(), __ashrti3(), __lshrti3() to C
s390/memmove: Optimize backward copy case
s390/string: Convert memset(16|32|64)() to C
s390/string: Convert memcpy() to C
s390/string: Convert memset() to C
s390/string: Convert memmove() to C
s390/string: Add -ffreestanding compile option to string.o
s390: Add .noinstr.text to boot and purgatory linker scripts
s390/purgatory: Enforce z10 minimum architecture level
s390: Enable Rust support
s390/cmpxchg: Fix KASAN stack-out-of-bounds in atomic helpers
rust: helpers: Add memchr wrapper for string operations
rust/bindgen_parameters: Mark s390 types as opaque to prevent repr conflicts
s390/jump_label: Implement ARCH_STATIC_BRANCH_JUMP_ASM and ARCH_STATIC_BRANCH_ASM macros
s390/bug: Provide ARCH_WARN_ASM for Rust WARN/BUG support
s390/ap: Fix locking issue in SE bind and associate sysfs functions
s390/percpu: Provide arch_this_cpu_write() implementation
s390/percpu: Provide arch_this_cpu_read() implementation
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: New features for 7.2
New features for 7.2 for KVM/s390:
* KVM_PRE_FAULT_MEMORY support
* Support for 2G hugepages
* Support for the ASTFLEIE 2 facility
* kvm_arch_set_irq_inatomic Fast Inject
* Fix potential leak of uninitialized bytes
|
|
s390 needs a fast path for irq injection, and along those lines we
introduce kvm_arch_set_irq_inatomic. Instead of placing all interrupts on
the global work queue as it does today, this patch provides a fast path for
irq injection.
The inatomic fast path cannot lose control since it is running with
interrupts disabled. This meant making the following changes that exist on
the slow path today. First, the adapter_indicators page needs to be mapped
since it is accessed with interrupts disabled, so we added map/unmap
functions. Second, access to shared resources between the fast and slow
paths needed to be changed from mutex and semaphores to spin_lock's.
Finally, the memory allocation on the slow path utilizes GFP_KERNEL_ACCOUNT
but we had to implement the fast path with GFP_ATOMIC allocation. Each of
these enhancements were required to prevent blocking on the fast inject
path.
Fencing of Fast Inject in Secure Execution environments is enabled in the
patch series by not mapping adapter indicator pages. In Secure Execution
environments the path of execution available before this patch is followed.
Statistical counters have been added to enable analysis of irq injection on
the fast path and slow path including io_390_inatomic, io_flic_inject_airq,
io_set_adapter_int and io_390_inatomic_no_inject. The no inject counter
captures adapter masked, coalesced and suppressed interrupts.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260604192755.203143-4-freimuth@linux.ibm.com>
|
|
s390 needs map/unmap ioctls, which map the adapter set
indicator pages, so the pages can be accessed when interrupts are
disabled. The mappings are cleaned up when the guest is removed.
pin_user_pages_remote is used for both the ioctl as well
as the pin-on-demand logic in adapter_indicators_set().
Map/Unmap ioctls are fenced in order to avoid the longterm pinning
in Secure Execution environments. In Secure Execution
environments the path of execution available before this patch is followed.
Statistical counters to count map/unmap functions for adapter indicator
pages are added. The counters can be used to analyze
map/unmap functions in non-Secure Execution environments and similarly
can be used to analyze Secure Execution environments where the counters
will not be incremented as the adapter indicator pages are not mapped.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260604192755.203143-2-freimuth@linux.ibm.com>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull NOHZ updates from Thomas Gleixner:
- Fix a long standing TOCTOU in get_cpu_sleep_time_us()
- Make the CPU offline NOHZ handling more robust by disabling NOHZ on
the outgoing CPU early instead of creating unneeded state which needs
to be undone.
- Unify idle CPU time accounting instead of having two different
accounting mechanisms. These two different mechanisms are not really
independent, but the different properties can in the worst case cause
that gloabl idle time can be observed going backwards.
- Consolidate the idle/iowait time retrieval interfaces instead of
converting back and forth between them.
- Make idle interrupt time accounting more robust. The original code
assumes that interrupt time accouting is enabled and therefore stops
elapsing idle time while an interrupt is handled in NOHZ dyntick
state. That assumption is not correct as interrupt time accounting
can be disabled at compile and runtime.
- Fix an accounting error between dyntick idle time and dyntick idle
steal time. The stolen time is not accounted and therefore idle time
becomes inaccurate. The stolen time is now accounted after the fact
as there is no way to predict the steal time upfront.
* tag 'timers-nohz-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Handle dyntick-idle steal time correctly
sched/cputime: Handle idle irqtime gracefully
sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case
tick/sched: Consolidate idle time fetching APIs
tick/sched: Account tickless idle cputime only when tick is stopped
tick/sched: Remove unused fields
tick/sched: Move dyntick-idle cputime accounting to cputime code
tick/sched: Remove nohz disabled special case in cputime fetch
tick/sched: Unify idle cputime accounting
s390/time: Prepare to stop elapsing in dynticks-idle
powerpc/time: Prepare to stop elapsing in dynticks-idle
sched/cputime: Correctly support generic vtime idle time
sched/cputime: Remove superfluous and error prone kcpustat_field() parameter
sched/idle: Handle offlining first in idle loop
tick/sched: Fix TOCTOU in nohz idle time fetch
|
|
Implement shadowing of format-2 facility list when running in VSIE.
ASTFLEIE2 is available since IBM z16.
To function G1 has to run this KVM code and G1 and G2 have to run QEMU
with ASTFLEIE2 support.
Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Co-developed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
[imbrenda@linux.ibm.com: Fix typo in comment]
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260612-vsie-alter-stfle-fac-v4-4-74f0e1559929@linux.ibm.com>
|
|
Use switch case in anticipation of handling format-1 and format-2
facility list designations in the future.
As the alternate STFLE facilities are not enabled, only case 0 is
possible.
No functional change intended.
Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Co-developed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260612-vsie-alter-stfle-fac-v4-3-74f0e1559929@linux.ibm.com>
|
|
Detect alternate STFLE interpretive execution facility 2.
Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260612-vsie-alter-stfle-fac-v4-2-74f0e1559929@linux.ibm.com>
|
|
load_unaligned_zeropad() reads eight bytes from unaligned addresses and may
cross page boundaries. It handles exceptions which may happen if reading
from the second page results in an exception.
For pages which are donated to the Ultravisor for secure execution purposes
the do_secure_storage_access() exception handler however does not handle
such exceptions correctly. Such an exception may result in an endless
exception loop which will never be resolved.
An attempt to fix this [1] turned out to be not sufficient. For now revert
load_unaligned_zeropad() until this problem has been resolved in a proper
way.
Note that the implementation of load_unaligned_zeropad() itself is
correct. The revert is just a temporary workaround until there is complete
fix for secure storage access exceptions.
[1] commit b00be77302d7 ("s390/mm: Add missing secure storage access fixups for donated memory")
Fixes: 802ba53eefc5 ("s390: add support for DCACHE_WORD_ACCESS")
Cc: stable@vger.kernel.org
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
There is no reason to have __ashlti3(), __ashrti3(), and __lshrti3()
implemented in assembler. Convert them all to C, which allows the
compiler to optimize the code if newer instructions allow that.
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Jan Polensky says:
===================
Rust support on s390 requires a small set of architecture-specific pieces
before the generic Rust kernel infrastructure can be used.
The series wires up s390 as a Rust-capable 64-bit architecture, adds the
missing assembly interfaces needed by Rust for WARN/BUG reporting and for
static branches, adjusts bindgen parameters to avoid repr layout conflicts
caused by packed and aligned s390 structures, and fixes issues discovered
during testing.
s390 currently requires rustc with support for -Zpacked-stack, and the
minimum tool version gating is adjusted accordingly.
Link: https://github.com/Rust-for-Linux/linux/issues/2
Tested against: rustc 1.96.0 (ac68faa20 2026-05-25)
===================
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The __arch_cmpxchg1, __arch_cmpxchg2, __arch_xchg1, and __arch_xchg2
functions emulate 1-byte and 2-byte atomic operations using 4-byte
cmpxchg instructions, since s390 lacks native 1/2-byte cmpxchg support.
When KASAN is enabled, the READ_ONCE() operations in these functions
trigger stack-out-of-bounds warnings because they perform 4-byte reads
when only 1 or 2 bytes should be accessed.
Mark these functions as __no_sanitize_or_inline to prevent KASAN
instrumentation while maintaining correct functionality.
This resolves the following KASAN error during rust_atomics KUnit tests:
BUG: KASAN: stack-out-of-bounds in rust_helper_atomic_i8_xchg+0xb2/0xc0
Read of size 4 at addr 001bff7ffdbefcf0 by task kunit_try_catch/142
Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Link: https://lore.kernel.org/rust-for-linux/CANiq72m4GVWFYqnxNtCHTPu7XcGewHB5LNwOoayTfnXs9pPbNg@mail.gmail.com/
Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/rust-for-linux/DITFTAVVHTNQ.380OHUHGTOI6M@garyguo.net/
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
ARCH_STATIC_BRANCH_ASM macros
Rust static branch support needs the s390 jump label instruction sequence
and __jump_table emission in a reusable form. The current implementation
embeds the sequence directly in the C asm goto blocks, which cannot be
shared with Rust.
Introduce ARCH_STATIC_BRANCH_ASM and ARCH_STATIC_BRANCH_JUMP_ASM to
describe the brcl sequences for the likely-false and likely-true cases
and to emit the same __jump_table entries as before. Switch the existing
C helpers to use the new macros to avoid duplication without changing
the generated code.
Acked-by: Gary Guo <gary@garyguo.net>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Rust WARN and BUG support relies on ARCH_WARN_ASM to emit __bug_table
entries. On s390 the macro is missing, so Rust code cannot generate
proper WARN/BUG metadata for the kernel's bug reporting infrastructure.
Define ARCH_WARN_ASM to produce the same assembly sequence and
__bug_table entry format as the existing s390 BUG handling, including
the monitor call. Define ARCH_WARN_REACHABLE as empty since s390 does
not provide reachability analysis for warning paths.
Acked-by: Gary Guo <gary@garyguo.net>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Pull kvm fixes from Paolo Bonzini:
"arm64:
- Correctly drop the ITS translation cache reference when it actually
gets invalidated
- Take the SRCU lock for SW page table walks
- Restore POR_EL0 access to host EL0, avoiding POR_EL0 becoming
inaccessible from EL0 after running a guest
- Reassign nested_mmus array behind mmu_lock, ensuring that vcpu init
and MMU notifiers are mutually exclusive
- Correctly handle FEAT_XNX at stage-2
s390:
- More fixes for the new page table management and nested
virtualization
x86:
- More fixes for GHCB issues:
- Read start/end indices of page size change requests exactly once
per vmexit
- Unmap and unpin the GHCB as needed on vCPU free"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
KVM: arm64: Correctly identify executable PTEs at stage-2
KVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNX
KVM: arm64: Reassign nested_mmus array behind mmu_lock
KVM: arm64: Restore POR_EL0 access to host EL0
KVM: arm64: Take the SRCU lock for page table walks in fault injection and AT emulation
KVM: arm64: vgic-its: Drop the translation cache reference only for the erased entry
KVM: SEV: Unmap and unpin the GHCB as needed on vCPU free
KVM: SEV: Decouple the need to sync the GHCB SA from the need to free the SA
KVM: SEV: Move sev_free_vcpu() down below sev_es_unmap_ghcb()
KVM: Don't WARN if memory is dirtied without a vCPU when the VM is dying
KVM: SEV: Read start/end indices of PSC requests exactly once per #VMGEXIT
KVM: SEV: Add an anonymous "psc" struct to track current PSC metadata
KVM: SEV: Make it more obvious when KVM is writing back the current PSC index
KVM: s390: Remove ptep_zap_softleaf_entry()
KVM: s390: Fix possible reference leak in fault-in code
KVM: s390: Prevent memslots outside the ASCE range
KVM: s390: Lock pte when making page secure
KVM: s390: Fix fault-in code
KVM: s390: vsie: Fix rmap handling in _do_shadow_crste()
KVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Enable IOMMUFD and VFIO cdev such that PCI pass-through to
QEMU/KVM can optionally utilize native IOMMUFD
- With HAVE_ARCH_BUG_FORMAT enabled the BUG infrastructure might
misinterpret flags or fault. Fix this by moving the "format"
field emission into __BUG_ENTRY()
- The generic version of _THIS_IP_ is known to be brittle and may
break with current and future GCC and Clang optimizations. Fix
it by overriding _THIS_IP_
* tag 's390-7.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Implement _THIS_IP_ using inline asm
s390/bug: Always emit format word in __BUG_ENTRY
s390/configs: Enable IOMMUFD and VFIO cdev in defconfigs
|
|
Provide an s390 specific implementation of arch_this_cpu_write()
instead of the generic variant. The generic variant uses a quite
expensive raw_local_irq_save() / raw_local_irq_restore() pair.
Get rid of this by providing an own variant which makes use of the new
percpu code section infrastructure.
With this the text size of the kernel image is reduced by ~1k (defconfig).
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Provide an s390 specific implementation of arch_this_cpu_read() instead
of the generic variant. The generic variant uses preempt_disable() /
preempt_enable() pair and READ_ONCE().
Get rid of the preempt_disable() / preempt_enable() pairs by providing an
own variant which makes use of the new percpu code section infrastructure.
With this the text size of the kernel image is reduced by ~1k
(defconfig). Also 87 generated preempt_schedule_notrace() function
calls within the kernel image (modules not counted) are removed.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Convert arch_this_cpu_[and|or]() to make use of the new percpu code
section infrastructure.
There is no user of this_cpu_and() and only one user of this_cpu_or()
within the kernel. Therefore this conversion has hardly any effect,
and also removes only preempt_schedule_notrace() function call.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Convert arch_this_cpu_add_return() to make use of the new percpu code
section infrastructure.
With this the text size of the kernel image is reduced by ~4k
(defconfig). Also 66 generated preempt_schedule_notrace() function
calls within the kernel image (modules not counted) are removed.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Convert arch_this_cpu_add() to make use of the new percpu code section
infrastructure.
With this the text size of the kernel image is reduced by ~76kb
(defconfig). Also more than 5300 generated preempt_schedule_notrace()
function calls within the kernel image (modules not counted) are removed.
With:
DEFINE_PER_CPU(long, foo);
void bar(long a) { this_cpu_add(foo, a); }
Old arch_this_cpu_add() looks like this:
00000000000000c0 <bar>:
c0: c0 04 00 00 00 00 jgnop c0 <bar>
c6: eb 01 03 a8 00 6a asi 936,1
cc: c4 18 00 00 00 00 lgrl %r1,cc <bar+0xc>
ce: R_390_GOTENT foo+0x2
d2: e3 10 03 b8 00 08 ag %r1,952
d8: eb 22 10 00 00 e8 laag %r2,%r2,0(%r1)
de: eb ff 03 a8 00 6e alsi 936,-1
e4: a7 a4 00 05 jhe ee <bar+0x2e>
e8: c0 f4 00 00 00 00 jg e8 <bar+0x28>
ea: R_390_PC32DBL __s390_indirect_jump_r14+0x2
ee: c0 f4 00 00 00 00 jg ee <bar+0x2e>
f0: R_390_PLT32DBL preempt_schedule_notrace+0x2
New arch_this_cpu_add() looks like this:
00000000000000c0 <bar>:
c0: c0 04 00 00 00 00 jgnop c0 <bar>
c6: c4 38 00 00 00 00 lgrl %r3,c6 <bar+0x6>
c8: R_390_GOTENT foo+0x2
cc: b9 04 00 43 lgr %r4,%r3
d0: eb 00 43 c0 00 52 mviy 960(%r0),4
d6: e3 40 03 b8 00 08 ag %r4,952
dc: eb 52 40 00 00 e8 laag %r5,%r2,0(%r4)
e2: eb 00 03 c0 00 52 mviy 960,0
e8: c0 f4 00 00 00 00 jg e8 <bar+0x28>
ea: R_390_PC32DBL __s390_indirect_jump_r14+0x2
Note that the conditional function call is removed.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Add missing do { } while (0) constructs in order to avoid potential
build failures.
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260319120503.4046659-1-hca%40linux.ibm.com
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
With the intended removal of PREEMPT_NONE this_cpu operations based on
atomic instructions, guarded with preempt_disable()/preempt_enable() pairs
become more expensive: the preempt_disable() / preempt_enable() pairs are
not optimized away anymore during compile time.
In particular the conditional call to preempt_schedule_notrace() after
preempt_enable() adds additional code and register pressure.
E.g. this simple C code sequence
DEFINE_PER_CPU(long, foo);
long bar(long a) { return this_cpu_add_return(foo, a); }
generates this code:
11a976: eb af f0 68 00 24 stmg %r10,%r15,104(%r15)
11a97c: b9 04 00 ef lgr %r14,%r15
11a980: b9 04 00 b2 lgr %r11,%r2
11a984: e3 f0 ff c8 ff 71 lay %r15,-56(%r15)
11a98a: e3 e0 f0 98 00 24 stg %r14,152(%r15)
11a990: eb 01 03 a8 00 6a asi 936,1 <- __preempt_count_add(1)
11a996: c0 10 00 d2 ac b5 larl %r1,1b70300 <- address of percpu var
11a9a0: e3 10 23 b8 00 08 ag %r1,952 <- add percpu offset
11a9a6: eb ab 10 00 00 e8 laag %r10,%r11,0(%r1) <- atomic op
11a9ac: eb ff 03 a8 00 6e alsi 936,-1 <- __preempt_count_dec_and_test()
11a9b2: a7 54 00 05 jnhe 11a9bc <bar+0x4c>
11a9b6: c0 e5 00 76 d1 bd brasl %r14,ff4d30 <preempt_schedule_notrace>
11a9bc: b9 e8 b0 2a agrk %r2,%r10,%r11
11a9c0: eb af f0 a0 00 04 lmg %r10,%r15,160(%r15)
11a9c6 07 fe br %r14
Even though the above example is more or less the worst case, since the
branch to preempt_schedule_notrace() requires a stackframe, which
otherwise wouldn't be necessary, there is also the conditional jnhe branch
instruction.
Get rid of the conditional branch with the following code sequence:
11a8e6: c0 30 00 d0 c5 0d larl %r3,1b33300
11a8ec: b9 04 00 43 lgr %r4,%r3
11a8f0: eb 00 43 c0 00 52 mviy 960,4
11a8f6: e3 40 03 b8 00 08 ag %r4,952
11a8fc: eb 52 40 00 00 e8 laag %r5,%r2,0(%r4)
11a902: eb 00 03 c0 00 52 mviy 960,0
11a908: b9 08 00 25 agr %r2,%r5
11a90c 07 fe br %r14
The general idea is that this_cpu operations based on atomic instructions
are guarded with mviy instructions:
- The first mviy instruction writes the register number, which contains
the percpu address variable to lowcore. This also indicates that a
percpu code section is executed.
- The first instruction following the mviy instruction must be the ag
instruction which adds the percpu offset to the percpu address register.
- Afterwards the atomic percpu operation follows.
- Then a second mviy instruction writes a zero to lowcore, which indicates
the end of the percpu code section.
- In case of an interrupt/exception/nmi the register number which was
written to lowcore is copied to the exception frame (pt_regs), and a zero
is written to lowcore.
- On return to the previous context it is checked if a percpu code section
was executed (saved register number not zero), and if the process was
migrated to a different cpu. If the percpu offset was already added to
the percpu address register (instruction address does _not_ point to the
ag instruction) the content of the percpu address register is adjusted so
it points to percpu variable of the new cpu.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Move GR_NUM / VX_NUM macros to separate insn-common-asm.h header file
so they can be reused for non-fpu insn constructs.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Use the ".irp" directive to get rid of all the repeated ".ifc" usages
in fpu-insn-asm.h.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Currently the tick subsystem stores the idle cputime accounting in private
fields, allowing cohabitation with architecture idle vtime accounting. The
former is fetched on online CPUs, the latter on offline CPUs.
For consolidation purposes, architecture vtime accounting will continue to
account the cputime but will make a break when the idle tick is
stopped. The dyntick cputime accounting will then be relayed by the tick
subsystem so that the idle cputime is still seen advancing coherently even
when the tick isn't there to flush the idle vtime.
Prepare for that and introduce three new APIs which will be used in
subsequent patches:
- vtime_dynticks_start() is deemed to be called when idle enters in
dyntick mode. The idle cputime that elapsed so far is accumulated
and accounted. Also idle time accounting is ignored.
- vtime_dynticks_stop() is deemed to be called when idle exits from
dyntick mode. The vtime entry clocks are fast-forward to current time
so that idle accounting restarts elapsing from now. Also idle time
accounting is resumed.
- vtime_reset() is deemed to be called from dynticks idle IRQ entry to
fast-forward the clock to current time so that the IRQ time is still
accounted by vtime while nohz cputime is paused.
Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid
accounting twice the idle cputime, along with nohz accounting.
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-7-frederic@kernel.org
|
|
Factor out try_get_locked_pte(), which behaves similarly to
get_locked_pte(), but does not attempt to allocate missing tables and
performs a spin_trylock() instead of blocking.
The new function is also exported, since it will be used in other
patches.
If intermediate entries are missing, there can be no pte swap entry to
free, so it's safe to ignore them.
This avoids potentially sleeping while atomic.
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260602142356.169458-4-imbrenda@linux.ibm.com>
|
|
In various code paths, page_table_check_pte_clear() is called
before converting a secure page, while in others it is called
after. Make this consistent and always perform the conversion
after the PTC hook has been called. Also make all conversion‑
eligibility condition checks look the same, and rework the one
in ptep_get_and_clear_full() slightly.
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
strlcat() shouldn't be used anymore (see fortify-string.h), and will be
deprecated / removed sooner or later [1].
Therefore remove the s390 implementation of strlcat() in favor of the
generic variant.
[1] https://lore.kernel.org/all/20260514160719.105084-3-manuelebner@mailbox.org/
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Both GCC [1] and Clang [2] consider the generic version of _THIS_IP_ to
be broken:
#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
In particular, the address of a label is only expected to be used with a
computed goto.
While the generic version more or less works today, it is known to be
brittle and may break with current and future optimizations. For
example, Clang -O2 always returns 1 when this function is inlined:
static inline unsigned long get_ip(void)
{ return ({ __label__ __here; __here: (unsigned long)&&__here; }); }
Fix it by overriding _THIS_IP_ in <asm/linkage.h> (which is included by
<linux/instruction_pointer.h>) using an architecture-specific inline asm
version. Additionally, avoiding taking the address of a label prevents
compilers from emitting spurious indirect branch targets (e.g. ENDBR or
BTI) under control-flow integrity schemes.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120071 [1]
Link: https://github.com/llvm/llvm-project/issues/138272 [2]
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
When CONFIG_DEBUG_BUGVERBOSE is disabled, the s390 __BUG_ENTRY() macro
omits the format string pointer, so the generated __bug_table entry no
longer matches struct bug_entry.
With HAVE_ARCH_BUG_FORMAT enabled, the generic BUG infrastructure reads
bug_entry::format via bug_get_format(). If the format word is missing,
subsequent fields are read from the wrong offset, which may:
- Misinterpret flags (BUG vs WARN classification errors)
- Fault when dereferencing a misread format pointer
The root cause is that __BUG_ENTRY() delegates format word emission to
__BUG_ENTRY_VERBOSE(), which is conditional on CONFIG_DEBUG_BUGVERBOSE.
Fix this by moving the format field emission directly into __BUG_ENTRY()
so it is always emitted unconditionally. Remove the format parameter from
__BUG_ENTRY_VERBOSE() and keep only file/line emission conditional on
CONFIG_DEBUG_BUGVERBOSE.
Fixes: 2b71b8ab9718 ("s390/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED")
Signed-off-by: Jan Polensky <japo@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
In order to allow comparing trace timestamps between different
systems or virtual machines on s390, add a s390-tod trace clock.
This clock just uses the returned TOD clock value from stcke
directly.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.
To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.
Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
There are many loops in the form of
while (READ_ONCE(*somelocation))
cpu_relax();
Strictly speaking the architecture requires serialization instead of only a
compiler barrier in the loop so the READ_ONCE() will see an updated value.
However real hardware does not require this (see IBM z Systems Processor
Optimization Primer - FAQ [1]), but it is still recommended to add
serialization. Given that cpu_relax() is doing nothing useful, it does
not hurt to add the single and fast instruction which makes sure that
serialization happens, and such loops may be left a bit faster.
[1] https://community.ibm.com/community/user/viewdocument/microprocessor-optimization-primer
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
cpu_relax() is defined identically at two different locations.
Just like most other architectures remove the implementation at
asm/processor.h and only use the one at asm/vdso/processor.h,
avoiding code duplication.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Use an alternative to implement bcr_serialize() and use alternative
patching to select between serialization and fast-serialization
depending on the corresponding facility bit.
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Problem determination using s390dbf logging sometimes requires changing
the default logging level or log area size. While this is possible
using sysfs interfaces, there is no easy way to adjust these parameters
for early boot code that emits logs before userspace is available.
Add an s390dbf kernel parameter to address this shortcoming. The
parameter can be used to specify log level and area size (in units of
pages). A level of '-' turns logging off for an area. Logs can be
identified by name or a shell-style pattern.
Parameter format:
s390dbf=<name|pattern>:[<level>|-]:[<pages>][,...]
Example:
s390dbf=cio*:6:128,sclp_err::2
Specified parameters are applied immediately during debug area
registration for regular log areas. For early, static debug areas,
log levels are changed during early_param() parsing, while size
changes are applied at arch_initcall-time.
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The new SCLP action qualifier 4 is used by user-space code to provide
NVMe SMART log data to the platform.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for CONFIG_PAGE_TABLE_CHECK and enable it in
debug_defconfig. s390 can only tell user from kernel PTEs via the mm,
so mm_struct is now passed into pxx_user_accessible_page() callbacks
- Expose the PCI function UID as an arch-specific slot attribute in
sysfs so a function can be identified by its user-defined id while
still in standby. Introduces a generic ARCH_PCI_SLOT_GROUPS hook in
drivers/pci/slot.c
- Refresh s390 PCI documentation to reflect current behavior and cover
previously undocumented sysfs attributes
- zcrypt device driver cleanup series: consistent field types, clearer
variable naming, a kernel-doc warning fix, and a comment explaining
the intentional synchronize_rcu() in pkey_handler_register()
- Provide an s390 arch_raw_cpu_ptr() that avoids the detour via
get_lowcore() using alternatives, shrinking defconfig by ~27 kB
- Guard identity-base randomization with kaslr_enabled() so nokaslr
keeps the identity mapping at 0 even with RANDOMIZE_IDENTITY_BASE=y
- Build S390_MODULES_SANITY_TEST as a module only by requiring KUNIT &&
m, since built-in would not exercise module loading
- Remove the permanently commented-out HMCDRV_DEV_CLASS create_class()
code in the hmcdrv driver
- Drop stale ident_map_size extern conflicting with asm/page.h
* tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Fix warning about wrong kernel doc comment
PCI: s390: Expose the UID as an arch specific PCI slot attribute
docs: s390/pci: Improve and update PCI documentation
s390/pkey: Add comment about synchronize_rcu() to pkey base
s390/hmcdrv: Remove commented out code
s390/zcrypt: Slight rework on the agent_id field
s390/zcrypt: Explicitly use a card variable in _zcrypt_send_cprb
s390/zcrypt: Rework MKVP fields and handling
s390/zcrypt: Make apfs a real unsigned int field
s390/zcrypt: Rework domain processing within zcrypt device driver
s390/zcrypt: Move inline function rng_type6cprb_msgx from header to code
s390/percpu: Provide arch_raw_cpu_ptr()
s390: Enable page table check for debug_defconfig
s390/pgtable: Add s390 support for page table check
s390/pgtable: Use set_pmd_bit() to invalidate PMD entry
mm/page_table_check: Pass mm_struct to pxx_user_accessible_page()
s390/boot: Respect kaslr_enabled() for identity randomization
s390/Kconfig: Make modules sanity test a module-only option
s390/setup: Drop stale ident_map_size declaration
|
|
Pull kvm updates from Paolo Bonzini:
"Arm:
- Add support for tracing in the standalone EL2 hypervisor code,
which should help both debugging and performance analysis. This
uses the new infrastructure for 'remote' trace buffers that can be
exposed by non-kernel entities such as firmware, and which came
through the tracing tree
- Add support for GICv5 Per Processor Interrupts (PPIs), as the
starting point for supporting the new GIC architecture in KVM
- Finally add support for pKVM protected guests, where pages are
unmapped from the host as they are faulted into the guest and can
be shared back from the guest using pKVM hypercalls. Protected
guests are created using a new machine type identifier. As the
elusive guestmem has not yet delivered on its promises, anonymous
memory is also supported
This is only a first step towards full isolation from the host; for
example, the CPU register state and DMA accesses are not yet
isolated. Because this does not really yet bring fully what it
promises, it is hidden behind CONFIG_ARM_PKVM_GUEST +
'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is
created. Caveat emptor
- Rework the dreaded user_mem_abort() function to make it more
maintainable, reducing the amount of state being exposed to the
various helpers and rendering a substantial amount of state
immutable
- Expand the Stage-2 page table dumper to support NV shadow page
tables on a per-VM basis
- Tidy up the pKVM PSCI proxy code to be slightly less hard to
follow
- Fix both SPE and TRBE in non-VHE configurations so that they do not
generate spurious, out of context table walks that ultimately lead
to very bad HW lockups
- A small set of patches fixing the Stage-2 MMU freeing in error
cases
- Tighten-up accepted SMC immediate value to be only #0 for host
SMCCC calls
- The usual cleanups and other selftest churn
LoongArch:
- Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel()
- Add DMSINTC irqchip in kernel support
RISC-V:
- Fix steal time shared memory alignment checks
- Fix vector context allocation leak
- Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()
- Fix double-free of sdata in kvm_pmu_clear_snapshot_area()
- Fix integer overflow in kvm_pmu_validate_counter_mask()
- Fix shift-out-of-bounds in make_xfence_request()
- Fix lost write protection on huge pages during dirty logging
- Split huge pages during fault handling for dirty logging
- Skip CSR restore if VCPU is reloaded on the same core
- Implement kvm_arch_has_default_irqchip() for KVM selftests
- Factored-out ISA checks into separate sources
- Added hideleg to struct kvm_vcpu_config
- Factored-out VCPU config into separate sources
- Support configuration of per-VM HGATP mode from KVM user space
s390:
- Support for ESA (31-bit) guests inside nested hypervisors
- Remove restriction on memslot alignment, which is not needed
anymore with the new gmap code
- Fix LPSW/E to update the bear (which of course is the breaking
event address register)
x86:
- Shut up various UBSAN warnings on reading module parameter before
they were initialized
- Don't zero-allocate page tables that are used for splitting
hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in
the page table and thus write all bytes
- As an optimization, bail early when trying to unsync 4KiB mappings
if the target gfn can just be mapped with a 2MiB hugepage
x86 generic:
- Copy single-chunk MMIO write values into struct kvm_vcpu (more
precisely struct kvm_mmio_fragment) to fix use-after-free stack
bugs where KVM would dereference stack pointer after an exit to
userspace
- Clean up and comment the emulated MMIO code to try to make it
easier to maintain (not necessarily "easy", but "easier")
- Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not *all* of
VMX and SVM enabling) as it is needed for trusted I/O
- Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions
- Immediately fail the build if a required #define is missing in one
of KVM's headers that is included multiple times
- Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected
exception, mostly to prevent syzkaller from abusing the uAPI to
trigger WARNs, but also because it can help prevent userspace from
unintentionally crashing the VM
- Exempt SMM from CPUID faulting on Intel, as per the spec
- Misc hardening and cleanup changes
x86 (AMD):
- Fix and optimize IRQ window inhibit handling for AVIC; make it
per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple
vCPUs have to-be-injected IRQs
- Clean up and optimize the OSVW handling, avoiding a bug in which
KVM would overwrite state when enabling virtualization on multiple
CPUs in parallel. This should not be a problem because OSVW should
usually be the same for all CPUs
- Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains
about a "too large" size based purely on user input
- Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION
- Disallow synchronizing a VMSA of an already-launched/encrypted
vCPU, as doing so for an SNP guest will crash the host due to an
RMP violation page fault
- Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped
queries are required to hold kvm->lock, and enforce it by lockdep.
Fix various bugs where sev_guest() was not ensured to be stable for
the whole duration of a function or ioctl
- Convert a pile of kvm->lock SEV code to guard()
- Play nicer with userspace that does not enable
KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6
as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the
payload would end up in EXITINFO2 rather than CR2, for example).
Only set CR2 and DR6 when consumption of the payload is imminent,
but on the other hand force delivery of the payload in all paths
where userspace retrieves CR2 or DR6
- Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT
instead of vmcb02->save.cr2. The value is out of sync after a
save/restore or after a #PF is injected into L2
- Fix a class of nSVM bugs where some fields written by the CPU are
not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so
are not up-to-date when saved by KVM_GET_NESTED_STATE
- Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE
and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly
initialized after save+restore
- Add a variety of missing nSVM consistency checks
- Fix several bugs where KVM failed to correctly update VMCB fields
on nested #VMEXIT
- Fix several bugs where KVM failed to correctly synthesize #UD or
#GP for SVM-related instructions
- Add support for save+restore of virtualized LBRs (on SVM)
- Refactor various helpers and macros to improve clarity and
(hopefully) make the code easier to maintain
- Aggressively sanitize fields when copying from vmcb12, to guard
against unintentionally allowing L1 to utilize yet-to-be-defined
features
- Fix several bugs where KVM botched rAX legality checks when
emulating SVM instructions. There are remaining issues in that KVM
doesn't handle size prefix overrides for 64-bit guests
- Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails
instead of somewhat arbitrarily synthesizing #GP (i.e. don't double
down on AMD's architectural but sketchy behavior of generating #GP
for "unsupported" addresses)
- Cache all used vmcb12 fields to further harden against TOCTOU bugs
x86 (Intel):
- Drop obsolete branch hint prefixes from the VMX instruction macros
- Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a
register input when appropriate
- Code cleanups
guest_memfd:
- Don't mark guest_memfd folios as accessed, as guest_memfd doesn't
support reclaim, the memory is unevictable, and there is no storage
to write back to
LoongArch selftests:
- Add KVM PMU test cases
s390 selftests:
- Enable more memory selftests
x86 selftests:
- Add support for Hygon CPUs in KVM selftests
- Fix a bug in the MSR test where it would get false failures on
AMD/Hygon CPUs with exactly one of RDPID or RDTSCP
- Add an MADV_COLLAPSE testcase for guest_memfd as a regression test
for a bug where the kernel would attempt to collapse guest_memfd
folios against KVM's will"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits)
KVM: x86: use inlines instead of macros for is_sev_*guest
x86/virt: Treat SVM as unsupported when running as an SEV+ guest
KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails
KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper
KVM: SEV: use mutex guard in snp_handle_guest_req()
KVM: SEV: use mutex guard in sev_mem_enc_unregister_region()
KVM: SEV: use mutex guard in sev_mem_enc_ioctl()
KVM: SEV: use mutex guard in snp_launch_update()
KVM: SEV: Assert that kvm->lock is held when querying SEV+ support
KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe"
KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y
KVM: SEV: WARN on unhandled VM type when initializing VM
KVM: LoongArch: selftests: Add PMU overflow interrupt test
KVM: LoongArch: selftests: Add basic PMU event counting test
KVM: LoongArch: selftests: Add cpucfg read/write helpers
LoongArch: KVM: Add DMSINTC inject msi to vCPU
LoongArch: KVM: Add DMSINTC device support
LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function
LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch
LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "pid: make sub-init creation retryable" (Oleg Nesterov)
Make creation of init in a new namespace more robust by clearing away
some historical cruft which is no longer needed. Also some
documentation fixups
- "selftests/fchmodat2: Error handling and general" (Mark Brown)
Fix and a cleanup for the fchmodat2() syscall selftest
- "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)
- "hung_task: Provide runtime reset interface for hung task detector"
(Aaron Tomlin)
Give administrators the ability to zero out
/proc/sys/kernel/hung_task_detect_count
- "tools/getdelays: use the static UAPI headers from
tools/include/uapi" (Thomas Weißschuh)
Teach getdelays to use the in-kernel UAPI headers rather than the
system-provided ones
- "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)
Several cleanups and fixups to the hardlockup detector code and its
documentation
- "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)
A couple of small/theoretical fixes in the bch code
- "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)
- "cleanup the RAID5 XOR library" (Christoph Hellwig)
A quite far-reaching cleanup to this code. I can't do better than to
quote Christoph:
"The XOR library used for the RAID5 parity is a bit of a mess right
now. The main file sits in crypto/ despite not being cryptography
and not using the crypto API, with the generic implementations
sitting in include/asm-generic and the arch implementations
sitting in an asm/ header in theory. The latter doesn't work for
many cases, so architectures often build the code directly into
the core kernel, or create another module for the architecture
code.
Change this to a single module in lib/ that also contains the
architecture optimizations, similar to the library work Eric
Biggers has done for the CRC and crypto libraries later. After
that it changes to better calling conventions that allow for
smarter architecture implementations (although none is contained
here yet), and uses static_call to avoid indirection function call
overhead"
- "lib/list_sort: Clean up list_sort() scheduling workarounds"
(Kuan-Wei Chiu)
Clean up this library code by removing a hacky thing which was added
for UBIFS, which UBIFS doesn't actually need
- "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)
Fix a few bugs in the scatterlist code, add in-kernel tests for the
now-fixed bugs and fix a leak in the test itself
- "kdump: Enable LUKS-encrypted dump target support in ARM64 and
PowerPC" (Coiby Xu)
Enable support of the LUKS-encrypted device dump target on arm64 and
powerpc
- "ocfs2: consolidate extent list validation into block read callbacks"
(Joseph Qi)
Cleanup, simplify, and make more robust ocfs2's validation of extent
list fields (Kernel test robot loves mounting corrupted fs images!)
* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
ocfs2: validate group add input before caching
ocfs2: validate bg_bits during freefrag scan
ocfs2: fix listxattr handling when the buffer is full
doc: watchdog: fix typos etc
update Sean's email address
ocfs2: use get_random_u32() where appropriate
ocfs2: split transactions in dio completion to avoid credit exhaustion
ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
ocfs2: validate extent block list fields during block read
ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
ocfs2: validate dx_root extent list fields during block read
ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
ocfs2: handle invalid dinode in ocfs2_group_extend
.get_maintainer.ignore: add Askar
ocfs2: validate bg_list extent bounds in discontig groups
checkpatch: exclude forward declarations of const structs
tools/accounting: handle truncated taskstats netlink messages
taskstats: set version in TGID exit notifications
ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett)
Mainly prepararatory work for ongoing development but it does reduce
stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
Offers memory savings by removing the static swap_map. It also yields
some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
File seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
Chen)
Additional userspace stats reportng to zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
Han)
A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang)
Improve khugepaged scan logic and reduce CPU consumption by
prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu)
Simplify Kexec Handover by transitioning KHO from an xarray-based
metadata tracking system with serialization to a radix tree data
structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
Ballasi and Steven Rostedt)
Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and
VM_NOHUGEPAGE" (Catalin Marinas)
Cleanup for the shadow stack code: remove per-arch code in favour of
a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
Fix a WARN() which can be emitted the KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman)
Several cleanups, mainly udpating references to "struct pagevec",
which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
Shutsemau)
Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
pages encode their relationship to the head page
- "mm/damon/core: improve DAMOS quota efficiency for core layer
filters" (SeongJae Park)
Improve two problematic behaviors of DAMOS that makes it less
efficient when core layer filters are used
- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
Improve DAMON usability by extending the treatment of the
min_nr_regions user-settable parameter
- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
The proper fix for a previously hotfixed SMP=n issue. Code
simplifications and cleanups ensued
- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
A bunch of cleanups around unmapping and zapping. Mostly
simplifications, code movements, documentation and renaming of
zapping functions
- "support batched checking of the young flag for MGLRU" (Baolin Wang)
Batched checking of the young flag for MGLRU. It's part cleanups; one
benchmark shows large performance benefits for arm64
- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
memcg cleanup and robustness improvements
- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
Enhance free page reporting - it is presently and undesirably order-0
pages when reporting free memory.
- "mm: vma flag tweaks" (Lorenzo Stoakes)
Cleanup work following from the recent conversion of the VMA flags to
a bitmap
- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
Park)
Add some more developer-facing debug checks into DAMON core
- "mm/damon: test and document power-of-2 min_region_sz requirement"
(SeongJae Park)
An additional DAMON kunit test and makes some adjustments to the
addr_unit parameter handling
- "mm/damon/core: make passed_sample_intervals comparisons
overflow-safe" (SeongJae Park)
Fix a hard-to-hit time overflow issue in DAMON core
- "mm/damon: improve/fixup/update ratio calculation, test and
documentation" (SeongJae Park)
A batch of misc/minor improvements and fixups for DAMON
- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
Hildenbrand)
Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
movement was required.
- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
A somewhat random mix of fixups, recompression cleanups and
improvements in the zram code
- "mm/damon: support multiple goal-based quota tuning algorithms"
(SeongJae Park)
Extend DAMOS quotas goal auto-tuning to support multiple tuning
algorithms that users can select
- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
Fix the khugpaged sysfs handling so we no longer spam the logs with
reams of junk when starting/stopping khugepaged
- "mm: improve map count checks" (Lorenzo Stoakes)
Provide some cleanups and slight fixes in the mremap, mmap and vma
code
- "mm/damon: support addr_unit on default monitoring targets for
modules" (SeongJae Park)
Extend the use of DAMON core's addr_unit tunable
- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
Cleanups to khugepaged and is a base for Nico's planned khugepaged
mTHP support
- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
Code movement and cleanups in the memhotplug and sparsemem code
- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
CONFIG_MIGRATION" (David Hildenbrand)
Rationalize some memhotplug Kconfig support
- "change young flag check functions to return bool" (Baolin Wang)
Cleanups to change all young flag check functions to return bool
- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
Law and SeongJae Park)
Fix a few potential DAMON bugs
- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
Stoakes)
Convert a lot of the existing use of the legacy vm_flags_t data type
to the new vma_flags_t type which replaces it. Mainly in the vma
code.
- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
Expand the mmap_prepare functionality, which is intended to replace
the deprecated f_op->mmap hook which has been the source of bugs and
security issues for some time. Cleanups, documentation, extension of
mmap_prepare into filesystem drivers
- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
Simplify and clean up zap_huge_pmd(). Additional cleanups around
vm_normal_folio_pmd() and the softleaf functionality are performed.
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm: fix deferred split queue races during migration
mm/khugepaged: fix issue with tracking lock
mm/huge_memory: add and use has_deposited_pgtable()
mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
mm/huge_memory: separate out the folio part of zap_huge_pmd()
mm/huge_memory: use mm instead of tlb->mm
mm/huge_memory: remove unnecessary sanity checks
mm/huge_memory: deduplicate zap deposited table call
mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
mm/huge_memory: add a common exit path to zap_huge_pmd()
mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
mm/huge: avoid big else branch in zap_huge_pmd()
mm/huge_memory: simplify vma_is_specal_huge()
mm: on remap assert that input range within the proposed VMA
mm: add mmap_action_map_kernel_pages[_full]()
uio: replace deprecated mmap hook with mmap_prepare in uio_info
drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
mm: allow handling of stacked mmap_prepare hooks in more drivers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- randomize_kstack: Improve implementation across arches (Ryan Roberts)
- lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
- refcount: Remove unused __signed_wrap function annotations
* tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
refcount: Remove unused __signed_wrap function annotations
randomize_kstack: Unify random source across arches
randomize_kstack: Maintain kstack_offset per task
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- ESA nesting support
- 4k memslots
- LPSW/E fix
|
|
On s390, an individual PCI function can generally be identified by two
identifiers, the FID and the UID. Which identifier is used depends on
the scope and the platform configuration.
The first identifier, the FID, is always available and identifies a PCI
device uniquely within a machine. The FID may be virtualized by
hypervisors, but on the LPAR level, the machine scope makes it
impossible to create the same configuration based on FIDs on two
different LPARs of the same machine, and difficult to reuse across
machines.
Such matching LPAR configurations are useful, though, allowing
standardized setups and booting a Linux installation on different LPARs.
To this end the UID, or user-defined identifier, was introduced. While
it is only guaranteed to be unique within an LPAR and only if indicated
by firmware, it allows users to replicate PCI device setups.
On s390, which uses a machine hypervisor, a per PCI function hotplug
model is used. The shortcoming with the UID then is, that it is not
visible to the user without first attaching the PCI function and
accessing the "uid" device attribute. The FID, on the other hand, is
used as the slot name and is thus known even with the PCI function in
standby.
Remedy this shortcoming by providing the UID as an attribute on the slot
allowing the user to identify a PCI function based on the UID without
having to first attach it. Do this via a macro mechanism analogous to
what was introduced by commit 265baca69a07 ("s390/pci: Stop usurping
pdev->dev.groups") for the PCI device attributes.
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/slot.c
Link: https://lore.kernel.org/r/20260407-uid_slot-v8-2-15ae4409d2ce@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In order to be able to do this, we need to change VM_DATA_DEFAULT_FLAGS
and friends and update the architecture-specific definitions also.
We then have to update some KSM logic to handle VMA flags, and introduce
VMA_STACK_FLAGS to define the vma_flags_t equivalent of VM_STACK_FLAGS.
We also introduce two helper functions for use during the time we are
converting legacy flags to vma_flags_t values - vma_flags_to_legacy() and
legacy_to_vma_flags().
This enables us to iteratively make changes to break these changes up into
separate parts.
We use these explicitly here to keep VM_STACK_FLAGS around for certain
users which need to maintain the legacy vm_flags_t values for the time
being.
We are no longer able to rely on the simple VM_xxx being set to zero if
the feature is not enabled, so in the case of VM_DROPPABLE we introduce
VMA_DROPPABLE as the vma_flags_t equivalent, which is set to
EMPTY_VMA_FLAGS if the droppable flag is not available.
While we're here, we make the description of do_brk_flags() into a kdoc
comment, as it almost was already.
We use vma_flags_to_legacy() to not need to update the vm_get_page_prot()
logic as this time.
Note that in create_init_stack_vma() we have to replace the BUILD_BUG_ON()
with a VM_WARN_ON_ONCE() as the tested values are no longer build time
available.
We also update mprotect_fixup() to use VMA flags where possible, though we
have to live with a little duplication between vm_flags_t and vma_flags_t
values for the time being until further conversions are made.
While we're here, update VM_SPECIAL to be defined in terms of
VMA_SPECIAL_FLAGS now we have vma_flags_to_legacy().
Finally, we update the VMA tests to reflect these changes.
Link: https://lkml.kernel.org/r/d02e3e45d9a33d7904b149f5604904089fd640ae.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> [SELinux]
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The pmdp_clear_flush_young() is used to clear the young flag and flush the
TLB, returning whether the young flag was set for this PMD entry. Change
the return type to bool to make the intention clearer.
Link: https://lkml.kernel.org/r/a668b9a974c0d675e7a41f6973bcbe3336e8b373.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Callers use pmdp_test_and_clear_young() to clear the young flag and check
whether it was set for this PMD entry. Change the return type to bool to
make the intention clearer.
Link: https://lkml.kernel.org/r/f1d31307a13365d3d0fed5809727dcc2dd59631b.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The ptep_clear_flush_young() and clear_flush_young_ptes() are used to
clear the young flag and flush the TLB, returning whether the young flag
was set. Change the return type to bool to make the intention clearer.
Link: https://lkml.kernel.org/r/24af5144b96103631594501f77d4525f2475c1be.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|