summaryrefslogtreecommitdiff
path: root/arch/loongarch/kvm/vcpu.c
AgeCommit message (Collapse)Author
2026-04-09LoongArch: KVM: Move host CSR_GSTAT save and restore in context switchBibo Mao
CSR register LOONGARCH_CSR_GSTAT stores guest VMID information. With existing implementation method, VMID is per vCPU, similar with ASID in kernel. LOONGARCH_CSR_GSTAT is written at VM entry even if VMID is not changed. Here move LOONGARCH_CSR_GSTAT save/restore in vCPU context switch, and update LOONGARCH_CSR_GSTAT only when VMID is updated at VM entry. At most time VM enter/exit is much more frequent than vCPU thread context switch. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-04-09LoongArch: KVM: Move host CSR_EENTRY save and restore in context switchBibo Mao
CSR register LOONGARCH_CSR_EENTRY is shared between host CPU and guest vCPU, KVM need save and restore LOONGARCH_CSR_EENTRY register. Here move LOONGARCH_CSR_EENTRY saving in to context switch function rather than VM entry. At most time VM enter/exit is much more frequent than vCPU thread context switch. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-04-09LoongArch: KVM: Check kvm_request_pending() in kvm_late_check_requests()Bibo Mao
Add kvm_request_pending() checking firstly in kvm_late_check_requests(), at most time there is no pending request, then the following pending bit checking can be skipped. Also embed function kvm_check_pmu() in to kvm_late_check_requests(), and put it after the kvm_request_pending() checking. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-04-09LoongArch: KVM: Use CSR_CRMD_PLV in kvm_arch_vcpu_in_kernel()Tao Cui
The function reads LOONGARCH_CSR_CRMD but uses CSR_PRMD_PPLV to extract the privilege level. While both masks have the same value (0x3), CSR_CRMD_PLV is the semantically correct constant for CRMD. Cc: stable@vger.kernel.org Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Tao Cui <cuitao@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-26LoongArch: KVM: Make kvm_get_vcpu_by_cpuid() more robustHuacai Chen
kvm_get_vcpu_by_cpuid() takes a cpuid parameter whose type is int, so cpuid can be negative. Let kvm_get_vcpu_by_cpuid() return NULL for this case so as to make it more robust. This fix an out-of-bounds access to kvm_arch::phyid_map::phys_map[]. Cc: <stable@vger.kernel.org> Fixes: 73516e9da512adc ("LoongArch: KVM: Add vcpu mapping from physical cpuid") Reported-by: Aurelien Jarno <aurel32@debian.org> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1131431 Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-11Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into ↵Paolo Bonzini
HEAD KVM generic changes for 7.0 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end. - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive.
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-06LoongArch: KVM: Add paravirt preempt feature in hypervisor sideBibo Mao
Feature KVM_FEATURE_PREEMPT is added to show whether vCPU is preempted or not. It is to help guest OS scheduling or lock checking etc. Here add KVM_FEATURE_PREEMPT feature and use one byte as preempted flag in the steal time structure. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-06LoongArch: KVM: Add FPU/LBT delay load supportBibo Mao
FPU/LBT are lazy enabled with KVM hypervisor. After FPU/LBT enabled and loaded, vCPU can be preempted and FPU/LBT will be lost again, there will be unnecessary FPU/LBT exceptions, load and store stuff. Here delay the FPU/LBT load until the guest entry. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-06LoongArch: KVM: Move LBT capability check in exception handlerBibo Mao
Like FPU exception handler, check LBT capability in the LBT exception handler rather than function kvm_own_lbt(). Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-06LoongArch: KVM: Move LASX capability check in exception handlerBibo Mao
Like FPU exception handler, check LASX capability in the LASX exception handler rather than function kvm_own_lasx(). Since LASX capability in the function kvm_guest_has_lasx() implies FPU and LSX capability, only checking kvm_guest_has_lasx() is OK here. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-06LoongArch: KVM: Move LSX capability check in exception handlerBibo Mao
Like FPU exception handler, check LSX capability in the LSX exception handler rather than function kvm_own_lsx(). Since LSX capability in the function kvm_guest_has_lsx() implies FPU capability, only checking kvm_guest_has_lsx() is OK here. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-06LoongArch: KVM: Handle LOONGARCH_CSR_IPR during vCPU context switchBibo Mao
Register LOONGARCH_CSR_IPR is interrupt priority setting for nested interrupt handling. Though LoongArch Linux AVEC driver does not use this register, KVM hypervisor needs to save and restore this it during vCPU context switch. Because Linux AVEC driver may use this register in future, or other OS may use it. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-06LoongArch: KVM: Check VM msgint feature during interrupt handlingBibo Mao
During message interrupt handling and relative CSR registers saving and restore, it is better to check VM msgint feature rather than host msgint feature, because VM may disable this feature even if host supports this. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-06LoongArch: KVM: Add more CPUCFG mask bitsBibo Mao
With new CPU cores there are more features supported which are indicated in CPUCFG2 bits 24:30 and CPUCFG3 bits 17:23. The KVM hypervisor cannot enable or disable (most of) these features and there is no KVM exception when instructions of these features are executed in guest mode. Here add more CPUCFG mask support with LA664 CPU type. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-01-08KVM: Remove subtle "struct kvm_stats_desc" pseudo-overlaySean Christopherson
Remove KVM's internal pseudo-overlay of kvm_stats_desc, which subtly aliases the flexible name[] in the uAPI definition with a fixed-size array of the same name. The unusual embedded structure results in compiler warnings due to -Wflex-array-member-not-at-end, and also necessitates an extra level of dereferencing in KVM. To avoid the "overlay", define the uAPI structure to have a fixed-size name when building for the kernel. Opportunistically clean up the indentation for the stats macros, and replace spaces with tabs. No functional change intended. Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org> Closes: https://lore.kernel.org/all/aPfNKRpLfhmhYqfP@kspp Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> [..] Acked-by: Anup Patel <anup@brainfault.org> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20251205232655.445294-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-12-13Merge tag 'loongarch-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Add basic LoongArch32 support Note: Build infrastructures of LoongArch32 are not enabled yet, because we need to adjust irqchip drivers and wait for GNU toolchain be upstream first. - Select HAVE_ARCH_BITREVERSE in Kconfig - Fix build and boot for CONFIG_RANDSTRUCT - Correct the calculation logic of thread_count - Some bug fixes and other small changes * tag 'loongarch-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (22 commits) LoongArch: Adjust default config files for 32BIT/64BIT LoongArch: Adjust VDSO/VSYSCALL for 32BIT/64BIT LoongArch: Adjust misc routines for 32BIT/64BIT LoongArch: Adjust user accessors for 32BIT/64BIT LoongArch: Adjust system call for 32BIT/64BIT LoongArch: Adjust module loader for 32BIT/64BIT LoongArch: Adjust time routines for 32BIT/64BIT LoongArch: Adjust process management for 32BIT/64BIT LoongArch: Adjust memory management for 32BIT/64BIT LoongArch: Adjust boot & setup for 32BIT/64BIT LoongArch: Adjust common macro definitions for 32BIT/64BIT LoongArch: Add adaptive CSR accessors for 32BIT/64BIT LoongArch: Add atomic operations for 32BIT/64BIT LoongArch: Add new PCI ID for pci_fixup_vgadev() LoongArch: Add and use some macros for AVEC LoongArch: Correct the calculation logic of thread_count LoongArch: Use unsigned long for _end and _text LoongArch: Use __pmd()/__pte() for swap entry conversions LoongArch: Fix arch_dup_task_struct() for CONFIG_RANDSTRUCT LoongArch: Fix build errors for CONFIG_RANDSTRUCT ...
2025-12-08LoongArch: Adjust time routines for 32BIT/64BITHuacai Chen
Adjust time routines for both 32BIT and 64BIT, including: rdtime_h() / rdtime_l() definitions for 32BIT and rdtime_d() definition for 64BIT, get_cycles() and get_cycles64() definitions for 32BIT/64BIT, show time frequency info ("CPU MHz" and "BogoMIPS") in /proc/cpuinfo, etc. Use do_div() for division which works on both 32BIT and 64BIT platforms. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-12-02Merge tag 'loongarch-kvm-6.19' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.19 1. Get VM PMU capability from HW GCFG register. 2. Add AVEC basic support. 3. Use 64-bit register definition for EIOINTC. 4. Add KVM timer test cases for tools/selftests.
2025-11-27LoongArch: KVM: Add AVEC basic supportSong Gao
Check whether the host CPU supported AVEC, and save/restore CSR_MSGIS0- CSR_MSGIS3 when necessary. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-26Merge tag 'kvm-x86-tdx-6.19' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM TDX changes for 6.19: - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module, which KVM was either working around in weird, ugly ways, or was simply oblivious to (as proven by Yan tripping several KVM_BUG_ON()s with clever selftests). - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a vCPU if creating said vCPU failed partway through. - Fix a few sparse warnings (bad annotation, 0 != NULL). - Use struct_size() to simplify copying capabilities to userspace.
2025-11-10LoongArch: KVM: Skip PMU checking on vCPU context switchBibo Mao
PMU hardware about VM is switched on VM exit to host rather than vCPU context sched off, PMU is checked and restored on return to VM. It is not necessary to check PMU on vCPU context sched on callback, since the request is made on the VM exit entry or VM PMU CSR access abort routine already. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-10LoongArch: KVM: Restore guest PMU if it is enabledBibo Mao
On LoongArch system, guest PMU hardware is shared by guest and host but PMU interrupt is separated. PMU is pass-through to VM, and there is PMU context switch when exit to host and return to guest. There is optimiation to check whether PMU is enabled by guest. If not, it is not necessary to return to guest. However, if it is enabled, PMU context for guest need switch on. Now KVM_REQ_PMU notification is set on vCPU context switch, but it is missing if there is no vCPU context switch while PMU is used by guest VM, so fix it. Cc: <stable@vger.kernel.org> Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-05KVM: Rename kvm_arch_vcpu_async_ioctl() to kvm_arch_vcpu_unlocked_ioctl()Sean Christopherson
Rename the "async" ioctl API to "unlocked" so that upcoming usage in x86's TDX code doesn't result in a massive misnomer. To avoid having to retry SEAMCALLs, TDX needs to acquire kvm->lock *and* all vcpu->mutex locks, and acquiring all of those locks after/inside the current vCPU's mutex is a non-starter. However, TDX also needs to acquire the vCPU's mutex and load the vCPU, i.e. the handling is very much not async to the vCPU. No functional change intended. Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-07Merge tag 'hyperv-next-signed-20251006' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Unify guest entry code for KVM and MSHV (Sean Christopherson) - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain() (Nam Cao) - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV (Mukesh Rathor) - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov) - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna Kumar T S M) - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari, Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley) * tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv: Remove the spurious null directive line MAINTAINERS: Mark hyperv_fb driver Obsolete fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver Drivers: hv: Make CONFIG_HYPERV bool Drivers: hv: Add CONFIG_HYPERV_VMBUS option Drivers: hv: vmbus: Fix typos in vmbus_drv.c Drivers: hv: vmbus: Fix sysfs output format for ring buffer index Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store() x86/hyperv: Switch to msi_create_parent_irq_domain() mshv: Use common "entry virt" APIs to do work in root before running guest entry: Rename "kvm" entry code assets to "virt" to genericize APIs entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper mshv: Handle NEED_RESCHED_LAZY before transferring to guest x86/hyperv: Add kexec/kdump support on Azure CVMs Drivers: hv: Simplify data structures for VMBus channel close message Drivers: hv: util: Cosmetic changes for hv_utils_transport.c mshv: Add support for a new parent partition configuration clocksource: hyper-v: Skip unnecessary checks for the root partition hyperv: Add missing field to hv_output_map_device_interrupt
2025-09-30entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM properSean Christopherson
Move KVM's morphing of pending signals into userspace exits into KVM proper, and drop the @vcpu param from xfer_to_guest_mode_handle_work(). How KVM responds to -EINTR is a detail that really belongs in KVM itself, and invoking kvm_handle_signal_exit() from kernel code creates an inverted module dependency. E.g. attempting to move kvm_handle_signal_exit() into kvm_main.c would generate an linker error when building kvm.ko as a module. Dropping KVM details will also converting the KVM "entry" code into a more generic virtualization framework so that it can be used when running as a Hyper-V root partition. Lastly, eliminating usage of "struct kvm_vcpu" outside of KVM is also nice to have for KVM x86 developers, as keeping the details of kvm_vcpu purely within KVM allows changing the layout of the structure without having to boot into a new kernel, e.g. allows rebuilding and reloading kvm.ko with a modified kvm_vcpu structure as part of debug/development. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-09-23LoongArch: KVM: Add PTW feature detection on new hardwareBibo Mao
With new Loongson-3A6000/3C6000 hardware platforms (or later), hardware page table walking (PTW) feature is supported on host. So here add this feature detection on KVM host. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-20LoongArch: KVM: Make function kvm_own_lbt() robustBibo Mao
Add the flag KVM_LARCH_LBT checking in function kvm_own_lbt(), so that it can be called safely rather than duplicated enabling again. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-07-21LoongArch: KVM: Add stat information with kernel irqchipBibo Mao
Move stat information about kernel irqchip from VM to vCPU, since all vm exiting events should be vCPU relative. And also add entry with structure kvm_vcpu_stats_desc[], so that it can display with directory /sys/kernel/debug/kvm. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finallyBibo Mao
In function kvm_pre_enter_guest(), it prepares to enter guest and check whether there are pending signals or events. And it will not enter guest if there are, PMU pass-through preparation for guest should be cancelled and host should own PMU hardware. Cc: stable@vger.kernel.org Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: KVM: Fully clear some CSRs when VM rebootBibo Mao
Some registers such as LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC are partly cleared with function _kvm_setcsr(). This comes from the hardware specification, some bits are read only in VM mode, and however they can be written in host mode. So they are partly cleared in VM mode, and can be fully cleared in host mode. These read only bits show pending interrupt or exception status. When VM reset, the read-only bits should be cleared, otherwise vCPU will receive unknown interrupts in boot stage. Here registers LOONGARCH_CSR_ESTAT/LOONGARCH_CSR_GINTC are fully cleared in ioctl KVM_REG_LOONGARCH_VCPU_RESET vCPU reset path. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events LoongArch: - Remove unnecessary header include path - Assume constant PGD during VM context switch - Add perf events support for guest VM RISC-V: - Disable the kernel perf counter during configure - KVM selftests improvements for PMU - Fix warning at the time of KVM module removal x86: - Add support for aging of SPTEs without holding mmu_lock. Not taking mmu_lock allows multiple aging actions to run in parallel, and more importantly avoids stalling vCPUs. This includes an implementation of per-rmap-entry locking; aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas locking an rmap for write requires taking both the per-rmap spinlock and the mmu_lock. Note that this decreases slightly the accuracy of accessed-page information, because changes to the SPTE outside aging might not use atomic operations even if they could race against a clear of the Accessed bit. This is deliberate because KVM and mm/ tolerate false positives/negatives for accessed information, and testing has shown that reducing the latency of aging is far more beneficial to overall system performance than providing "perfect" young/old information. - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to coalesce updates when multiple pieces of vCPU state are changing, e.g. as part of a nested transition - Fix a variety of nested emulation bugs, and add VMX support for synthesizing nested VM-Exit on interception (instead of injecting #UD into L2) - Drop "support" for async page faults for protected guests that do not set SEND_ALWAYS (i.e. that only want async page faults at CPL3) - Bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. Particularly, destroy vCPUs before the MMU, despite the latter being a VM-wide operation - Add common secure TSC infrastructure for use within SNP and in the future TDX - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not make sense to use the capability if the relevant registers are not available for reading or writing - Don't take kvm->lock when iterating over vCPUs in the suspend notifier to fix a largely theoretical deadlock - Use the vCPU's actual Xen PV clock information when starting the Xen timer, as the cached state in arch.hv_clock can be stale/bogus - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend notifier only accounts for kvmclock, and there's no evidence that the flag is actually supported by Xen guests - Clean up the per-vCPU "cache" of its reference pvclock, and instead only track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately expensive to compute, and rarely changes for modern setups) - Don't write to the Xen hypercall page on MSR writes that are initiated by the host (userspace or KVM) to fix a class of bugs where KVM can write to guest memory at unexpected times, e.g. during vCPU creation if userspace has set the Xen hypercall MSR index to collide with an MSR that KVM emulates - Restrict the Xen hypercall MSR index to the unofficial synthetic range to reduce the set of possible collisions with MSRs that are emulated by KVM (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside in the synthetic range) - Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config - Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID entries when updating PV clocks; there is no guarantee PV clocks will be updated between TSC frequency changes and CPUID emulation, and guest reads of the TSC leaves should be rare, i.e. are not a hot path x86 (Intel): - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1 - Pass XFD_ERR as the payload when injecting #NM, as a preparatory step for upcoming FRED virtualization support - Decouple the EPT entry RWX protection bit macros from the EPT Violation bits, both as a general cleanup and in anticipation of adding support for emulating Mode-Based Execution Control (MBEC) - Reject KVM_RUN if userspace manages to gain control and stuff invalid guest state while KVM is in the middle of emulating nested VM-Enter - Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs in anticipation of adding sanity checks for secondary exit controls (the primary field is out of bits) x86 (AMD): - Ensure the PSP driver is initialized when both the PSP and KVM modules are built-in (the initcall framework doesn't handle dependencies) - Use long-term pins when registering encrypted memory regions, so that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to excessive fragmentation - Add macros and helpers for setting GHCB return/error codes - Add support for Idle HLT interception, which elides interception if the vCPU has a pending, unmasked virtual IRQ when HLT is executed - Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical address - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g. because the vCPU was "destroyed" via SNP's AP Creation hypercall - Reject SNP AP Creation if the requested SEV features for the vCPU don't match the VM's configured set of features Selftests: - Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data instead of executing code. The theory is that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that an event is counting correctly without actually knowing what the event counts on the underlying hardware - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and improve its coverage by collecting all dirty entries on each iteration - Fix a few minor bugs related to handling of stats FDs - Add infrastructure to make vCPU and VM stats FDs available to tests by default (open the FDs during VM/vCPU creation) - Relax an assertion on the number of HLT exits in the xAPIC IPI test when running on a CPU that supports AMD's Idle HLT (which elides interception of HLT if a virtual IRQ is pending and unmasked)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits) RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed RISC-V: KVM: Teardown riscv specific bits after kvm_exit LoongArch: KVM: Register perf callbacks for guest LoongArch: KVM: Implement arch-specific functions for guest perf LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel() LoongArch: KVM: Remove PGD saving during VM context switch LoongArch: KVM: Remove unnecessary header include path KVM: arm64: Tear down vGIC on failed vCPU creation KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected KVM: x86: Add infrastructure for secure TSC KVM: x86: Push down setting vcpu.arch.user_set_tsc ...
2025-03-25Merge tag 'timers-cleanups-2025-03-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ...
2025-03-18LoongArch: KVM: Implement arch-specific functions for guest perfBibo Mao
Three architecture specific functions are added for the guest perf feature, they are kvm_arch_vcpu_in_kernel(), kvm_arch_vcpu_get_ip() and kvm_arch_pmi_in_guest(). Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-18LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()Bibo Mao
Pause-Loop Exiting is not supported by LoongArch hardware, nor is pv spinlock feature. So function kvm_vcpu_on_spin() is not used. Function kvm_arch_vcpu_preempted_in_kernel() is defined as a stub function here since it is only called by unused function kvm_vcpu_on_spin(). Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-18LoongArch: KVM: Remove PGD saving during VM context switchBibo Mao
PGD table for primary mmu keeps unchanged once VM is created, it is not necessary to save PGD table pointer during VM context switch. And it can be acquired when VM is created. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: KVM: Add interrupt checking for AVECBibo Mao
There is a newly added macro INT_AVEC with CSR ESTAT register, which is bit 14 used for LoongArch AVEC support. AVEC interrupt status bit 14 is supported with macro CSR_ESTAT_IS, so here replace the hard-coded value 0x1fff with macro CSR_ESTAT_IS so that the AVEC interrupt status is also supported by KVM. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-02-18LoongArch: KVM: Switch to use hrtimer_setup()Nam Cao
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/a5b1b53813a15a73afdfff6fbb4c9064fa582be1.1738746821.git.namcao@linutronix.de
2025-02-13LoongArch: KVM: Remove duplicated cache attribute settingBibo Mao
Cache attribute comes from GPA->HPA secondary mmu page table and is configured when kvm is enabled. It is the same for all VMs, so remove duplicated cache attribute setting on vCPU context switch. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-13LoongArch: KVM: Add hypercall service support for usermode VMMBibo Mao
Some VMMs provides special hypercall service in usermode, KVM should not handle the usermode hypercall service, thus pass it to usermode, let the usermode VMM handle it. Here a new code KVM_HCALL_CODE_USER_SERVICE is added for the user-mode hypercall service, KVM lets all six registers visible to usermode VMM. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-12-02LoongArch: KVM: Protect kvm_check_requests() with SRCUHuacai Chen
When we enable lockdep we get such a warning: ============================= WARNING: suspicious RCU usage 6.12.0-rc7+ #1891 Tainted: G W ----------------------------- include/linux/kvm_host.h:1043 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-loo/948: #0: 90000001184a00a8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0xf4/0xe20 [kvm] stack backtrace: CPU: 0 UID: 0 PID: 948 Comm: qemu-system-loo Tainted: G W 6.12.0-rc7+ #1891 Tainted: [W]=WARN Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 900000012c578000 900000012c57b920 0000000000000000 900000012c57b928 9000000007e53788 900000000815bcc8 900000000815bcc0 900000012c57b790 0000000000000001 0000000000000001 4b031894b9d6b725 0000000004dec000 90000001003299c0 0000000000000414 0000000000000001 000000000000002d 0000000000000003 0000000000000030 00000000000003b4 0000000004dec000 90000001184a0000 900000000806d000 9000000007e53788 00000000000000b4 0000000000000004 0000000000000004 0000000000000000 0000000000000000 9000000107baf600 9000000008916000 9000000007e53788 9000000005924778 0000000010000044 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000005924778>] show_stack+0x38/0x180 [<90000000071519c4>] dump_stack_lvl+0x94/0xe4 [<90000000059eb754>] lockdep_rcu_suspicious+0x194/0x240 [<ffff8000022143bc>] kvm_gfn_to_hva_cache_init+0xfc/0x120 [kvm] [<ffff80000222ade4>] kvm_pre_enter_guest+0x3a4/0x520 [kvm] [<ffff80000222b3dc>] kvm_handle_exit+0x23c/0x480 [kvm] Fix it by protecting kvm_check_requests() with SRCU. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13LoongArch: KVM: Add IPI device supportXianglai Li
Add device model for IPI interrupt controller, implement basic create & destroy interfaces, and register device model to kvm device table. Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn> Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-10-23LoongArch: KVM: Mark hrtimer to expire in hard interrupt contextHuacai Chen
Like commit 2c0d278f3293f ("KVM: LAPIC: Mark hrtimer to expire in hard interrupt context") and commit 9090825fa9974 ("KVM: arm/arm64: Let the timer expire in hardirq context on RT"), On PREEMPT_RT enabled kernels unmarked hrtimers are moved into soft interrupt expiry mode by default. Then the timers are canceled from an preempt-notifier which is invoked with disabled preemption which is not allowed on PREEMPT_RT. The timer callback is short so in could be invoked in hard-IRQ context. So let the timer expire on hard-IRQ context even on -RT. This fix a "scheduling while atomic" bug for PREEMPT_RT enabled kernels: BUG: scheduling while atomic: qemu-system-loo/1011/0x00000002 Modules linked in: amdgpu rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ns CPU: 1 UID: 0 PID: 1011 Comm: qemu-system-loo Tainted: G W 6.12.0-rc2+ #1774 Tainted: [W]=WARN Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 Stack : ffffffffffffffff 0000000000000000 9000000004e3ea38 9000000116744000 90000001167475a0 0000000000000000 90000001167475a8 9000000005644830 90000000058dc000 90000000058dbff8 9000000116747420 0000000000000001 0000000000000001 6a613fc938313980 000000000790c000 90000001001c1140 00000000000003fe 0000000000000001 000000000000000d 0000000000000003 0000000000000030 00000000000003f3 000000000790c000 9000000116747830 90000000057ef000 0000000000000000 9000000005644830 0000000000000004 0000000000000000 90000000057f4b58 0000000000000001 9000000116747868 900000000451b600 9000000005644830 9000000003a13998 0000000010000020 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000003a13998>] show_stack+0x38/0x180 [<9000000004e3ea34>] dump_stack_lvl+0x84/0xc0 [<9000000003a71708>] __schedule_bug+0x48/0x60 [<9000000004e45734>] __schedule+0x1114/0x1660 [<9000000004e46040>] schedule_rtlock+0x20/0x60 [<9000000004e4e330>] rtlock_slowlock_locked+0x3f0/0x10a0 [<9000000004e4f038>] rt_spin_lock+0x58/0x80 [<9000000003b02d68>] hrtimer_cancel_wait_running+0x68/0xc0 [<9000000003b02e30>] hrtimer_cancel+0x70/0x80 [<ffff80000235eb70>] kvm_restore_timer+0x50/0x1a0 [kvm] [<ffff8000023616c8>] kvm_arch_vcpu_load+0x68/0x2a0 [kvm] [<ffff80000234c2d4>] kvm_sched_in+0x34/0x60 [kvm] [<9000000003a749a0>] finish_task_switch.isra.0+0x140/0x2e0 [<9000000004e44a70>] __schedule+0x450/0x1660 [<9000000004e45cb0>] schedule+0x30/0x180 [<ffff800002354c70>] kvm_vcpu_block+0x70/0x120 [kvm] [<ffff800002354d80>] kvm_vcpu_halt+0x60/0x3e0 [kvm] [<ffff80000235b194>] kvm_handle_gspr+0x3f4/0x4e0 [kvm] [<ffff80000235f548>] kvm_handle_exit+0x1c8/0x260 [kvm] Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-12LoongArch: KVM: Enable paravirt feature control from VMMBibo Mao
Export kernel paravirt features to user space, so that VMM can control each single paravirt feature. By default paravirt features will be the same with kvm supported features if VMM does not set it. Also a new feature KVM_FEATURE_VIRT_EXTIOI is added which can be set from user space. This feature indicates that the virt EIOINTC can route interrupts to 256 vCPUs, rather than 4 vCPUs like with real HW. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-12LoongArch: KVM: Add PMU support for guestSong Gao
On LoongArch, the host and guest have their own PMU CSRs registers and they share PMU hardware resources. A set of PMU CSRs consists of a CTRL register and a CNTR register. We can set which PMU CSRs are used by the guest by writing to the GCFG register [24:26] bits. On KVM side: - Save the host PMU CSRs into structure kvm_context. - If the host supports the PMU feature. - When entering guest mode, save the host PMU CSRs and restore the guest PMU CSRs. - When exiting guest mode, save the guest PMU CSRs and restore the host PMU CSRs. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: KVM: Add vm migration support for LBT registersBibo Mao
Every vcpu has separate LBT registers. And there are four scr registers, one flags and ftop register for LBT extension. When VM migrates, VMM needs to get LBT registers for every vcpu. Here macro KVM_REG_LOONGARCH_LBT is added for new vcpu lbt register type, the following macro is added to get/put LBT registers. KVM_REG_LOONGARCH_LBT_SCR0 KVM_REG_LOONGARCH_LBT_SCR1 KVM_REG_LOONGARCH_LBT_SCR2 KVM_REG_LOONGARCH_LBT_SCR3 KVM_REG_LOONGARCH_LBT_EFLAGS KVM_REG_LOONGARCH_LBT_FTOP Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: KVM: Add Binary Translation extension supportBibo Mao
Loongson Binary Translation (LBT) is used to accelerate binary translation, which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags) and x87 fpu stack pointer (ftop). Like FPU extension, here a lazy enabling method is used for LBT. the LBT context is saved/restored on the vcpu context switch path. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: KVM: Add VM feature detection functionBibo Mao
Loongson SIMD Extension (LSX), Loongson Advanced SIMD Extension (LASX) and Loongson Binary Translation (LBT) features are defined in register CPUCFG2. Two kinds of LSX/LASX/LBT feature detection are added here, one is VCPU feature, and the other is VM feature. VCPU feature dection can only work with VCPU thread itself, and requires VCPU thread is created already. So LSX/LASX/LBT feature detection for VM is added also, it can be done even if VM is not created, and also can be done by any threads besides VCPU threads. Here ioctl command KVM_HAS_DEVICE_ATTR is added for VM, and macro KVM_LOONGARCH_VM_FEAT_CTRL is added to check supported feature. And five sub-features relative with LSX/LASX/LBT are added as following: KVM_LOONGARCH_VM_FEAT_LSX KVM_LOONGARCH_VM_FEAT_LASX KVM_LOONGARCH_VM_FEAT_X86BT KVM_LOONGARCH_VM_FEAT_ARMBT KVM_LOONGARCH_VM_FEAT_MIPSBT Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-26LoongArch: KVM: Invalidate guest steal time address on vCPU resetBibo Mao
If ParaVirt steal time feature is enabled, there is a percpu gpa address passed from guest vCPU and host modifies guest memory space with this gpa address. When vCPU is reset normally, it will notify host and invalidate gpa address. However if VM is crashed and VMM reboots VM forcely, the vCPU reboot notification callback will not be called in VM. Host needs invalidate the gpa address, else host will modify guest memory during VM reboots. Here it is invalidated from the vCPU KVM_REG_LOONGARCH_VCPU_RESET ioctl interface. Also funciton kvm_reset_timer() is removed at vCPU reset stage, since SW emulated timer is only used in vCPU block state. When a vCPU is removed from the block waiting queue, kvm_restore_timer() is called and SW timer is cancelled. And the timer register is also cleared at VMM when a vCPU is reset. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>