lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2024-11-13	x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest	Tao Su
	Latest Intel platform Clearwater Forest has introduced new instructions enumerated by CPUIDs of SHA512, SM3, SM4 and AVX-VNNI-INT16. Advertise these CPUIDs to userspace so that guests can query them directly. SHA512, SM3 and SM4 are on an expected-dense CPUID leaf and some other bits on this leaf have kernel usages. Considering they have not truly kernel usages, hide them in /proc/cpuinfo. These new instructions only operate in xmm, ymm registers and have no new VMX controls, so there is no additional host enabling required for guests to use these instructions, i.e. advertising these CPUIDs to userspace is safe. Tested-by: Jiaan Lu <jiaan.lu@intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> Message-ID: <20241105054825.870939-1-tao1.su@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-01	KVM: x86: AMD's IBPB is not equivalent to Intel's IBPB	Jim Mattson
	From Intel's documentation [1], "CPUID.(EAX=07H,ECX=0):EDX[26] enumerates support for indirect branch restricted speculation (IBRS) and the indirect branch predictor barrier (IBPB)." Further, from [2], "Software that executed before the IBPB command cannot control the predicted targets of indirect branches (4) executed after the command on the same logical processor," where footnote 4 reads, "Note that indirect branches include near call indirect, near jump indirect and near return instructions. Because it includes near returns, it follows that RSB entries created before an IBPB command cannot control the predicted targets of returns executed after the command on the same logical processor." [emphasis mine] On the other hand, AMD's IBPB "may not prevent return branch predictions from being specified by pre-IBPB branch targets" [3]. However, some AMD processors have an "enhanced IBPB" [terminology mine] which does clear the return address predictor. This feature is enumerated by CPUID.80000008:EDX.IBPB_RET[bit 30] [4]. Adjust the cross-vendor features enumerated by KVM_GET_SUPPORTED_CPUID accordingly. [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/cpuid-enumeration-and-architectural-msrs.html [2] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html#Footnotes [3] https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1040.html [4] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf Fixes: 0c54914d0c52 ("KVM: x86: use Intel speculation bugs and features as derived in generic x86 code") Suggested-by: Venkatesh Srinivas <venkateshs@chromium.org> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241011214353.1625057-5-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Advertise AMD_IBPB_RET to userspace	Jim Mattson
	This is an inherent feature of IA32_PRED_CMD[0], so it is trivially virtualizable (as long as IA32_PRED_CMD[0] is virtualized). Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241011214353.1625057-4-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-22	KVM: x86: Advertise AVX10.1 CPUID to userspace	Tao Su
	Advertise AVX10.1 related CPUIDs, i.e. report AVX10 support bit via CPUID.(EAX=07H, ECX=01H):EDX[bit 19] and new CPUID leaf 0x24H so that guest OS and applications can query the AVX10.1 CPUIDs directly. Intel AVX10 represents the first major new vector ISA since the introduction of Intel AVX512, which will establish a common, converged vector instruction set across all Intel architectures[1]. AVX10.1 is an early version of AVX10, that enumerates the Intel AVX512 instruction set at 128, 256, and 512 bits which is enabled on Granite Rapids. I.e., AVX10.1 is only a new CPUID enumeration with no new functionality. New features, e.g. Embedded Rounding and Suppress All Exceptions (SAE) will be introduced in AVX10.2. Advertising AVX10.1 is safe because there is nothing to enable for AVX10.1, i.e. it's purely a new way to enumerate support, thus there will never be anything for the kernel to enable. Note just the CPUID checking is changed when using AVX512 related instructions, e.g. if using one AVX512 instruction needs to check (AVX512 AND AVX512DQ), it can check ((AVX512 AND AVX512DQ) OR AVX10.1) after checking XCR0[7:5]. The versions of AVX10 are expected to be inclusive, e.g. version N+1 is a superset of version N. Per the spec, the version can never be 0, just advertise AVX10.1 if it's supported in hardware. Moreover, advertising AVX10_{128,256,512} needs to land in the same commit as advertising basic AVX10.1 support, otherwise KVM would advertise an impossible CPU model. E.g. a CPU with AVX512 but not AVX10.1/512 is impossible per the SDM. As more and more AVX related CPUIDs are added (it would have resulted in around 40-50 CPUID flags when developing AVX10), the versioning approach is introduced. But incrementing version numbers are bad for virtualization. E.g. if AVX10.2 has a feature that shouldn't be enumerated to guests for whatever reason, then KVM can't enumerate any "later" features either, because the only way to hide the problematic AVX10.2 feature is to set the version to AVX10.1 or lower[2]. But most AVX features are just passed through and don't have virtualization controls, so AVX10 should not be problematic in practice, so long as Intel honors their promise that future versions will be supersets of past versions. [1] https://cdrdv2.intel.com/v1/dl/getContent/784267 [2] https://lore.kernel.org/all/Zkz5Ak0PQlAN8DxK@google.com/ Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20240819062327.3269720-1-tao1.su@linux.intel.com [sean: minor changelog tweaks] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-07-16	KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops	Wei Wang
	Introduces kvm_x86_call(), to streamline the usage of static calls of kvm_x86_ops. The current implementation of these calls is verbose and could lead to alignment challenges. This makes the code susceptible to exceeding the "80 columns per single line of code" limit as defined in the coding-style document. Another issue with the existing implementation is that the addition of kvm_x86_ prefix to hooks at the static_call sites hinders code readability and navigation. kvm_x86_call() is added to improve code readability and maintainability, while adhering to the coding style guidelines. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Link: https://lore.kernel.org/r/20240507133103.15052-3-wei.w.wang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-10	KVM: x86: Bury guest_cpuid_is_amd_or_hygon() in cpuid.c	Sean Christopherson
	Move guest_cpuid_is_amd_or_hygon() into cpuid.c now that, except for one Intel quirk in the emulator, KVM checks for AMD vs. Intel compatible vCPUs, not exact vendors, i.e. now that there should not be any reason for KVM at-large to care about the exact vendor. Opportunistically refactor the guts of the helper to use "entry" instead of "best", and short circuit the !entry path to make the common case more readable. Link: https://lore.kernel.org/r/20240405235603.1173076-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-12	Merge tag 'kvm-x86-misc-6.10' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.10: - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which is unused by hardware, so that KVM can communicate its inability to map GPAs that set bits 51:48 due to lack of 5-level paging. Guest firmware is expected to use the information to safely remap BARs in the uppermost GPA space, i.e to avoid placing a BAR at a legal, but unmappable, GPA. - Use vfree() instead of kvfree() for allocations that always use vcalloc() or __vcalloc(). - Don't completely ignore same-value writes to immutable feature MSRs, as doing so results in KVM failing to reject accesses to MSR that aren't supposed to exist given the vCPU model and/or KVM configuration. - Don't mark APICv as being inhibited due to ABSENT if APICv is disabled KVM-wide to avoid confusing debuggers (KVM will never bother clearing the ABSENT inhibit, even if userspace enables in-kernel local APIC).
2024-05-10	Merge tag 'loongarch-kvm-6.10' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.10 1. Add ParaVirt IPI support. 2. Add software breakpoint support. 3. Add mmio trace events support.
2024-04-11	KVM: SVM: Invert handling of SEV and SEV_ES feature flags	Sean Christopherson
	Leave SEV and SEV_ES '0' in kvm_cpu_caps by default, and instead set them in sev_set_cpu_caps() if SEV and SEV-ES support are fully enabled. Aside from the fact that sev_set_cpu_caps() is wildly misleading when it clears capabilities, this will allow compiling out sev.c without falsely advertising SEV/SEV-ES support in KVM_GET_SUPPORTED_CPUID. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11	KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible	Sean Christopherson
	Add kvm_vcpu_arch.is_amd_compatible to cache if a vCPU's vendor model is compatible with AMD, i.e. if the vCPU vendor is AMD or Hygon, along with helpers to check if a vCPU is compatible AMD vs. Intel. To handle Intel vs. AMD behavior related to masking the LVTPC entry, KVM will need to check for vendor compatibility on every PMI injection, i.e. querying for AMD will soon be a moderately hot path. Note! This subtly (or maybe not-so-subtly) makes "Intel compatible" KVM's default behavior, both if userspace omits (or never sets) CPUID 0x0 and if userspace sets a completely unknown vendor. One could argue that KVM should treat such vCPUs as not being compatible with Intel or AMD, but that would add useless complexity to KVM. KVM needs to do something in the face of vendor specific behavior, and so unless KVM conjured up a magic third option, choosing to treat unknown vendors as neither Intel nor AMD means that checks on AMD compatibility would yield Intel behavior, and checks for Intel compatibility would yield AMD behavior. And that's far worse as it would effectively yield random behavior depending on whether KVM checked for AMD vs. Intel vs. !AMD vs. !Intel. And practically speaking, all x86 CPUs follow either Intel or AMD architecture, i.e. "supporting" an unknown third architecture adds no value. Deliberately don't convert any of the existing guest_cpuid_is_intel() checks, as the Intel side of things is messier due to some flows explicitly checking for exactly vendor==Intel, versus some flows assuming anything that isn't "AMD compatible" gets Intel behavior. The Intel code will be cleaned up in the future. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240405235603.1173076-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-09	KVM: x86: Advertise max mappable GPA in CPUID.0x80000008.GuestPhysBits	Gerd Hoffmann
	Use the GuestPhysBits field in CPUID.0x80000008 to communicate the max mappable GPA to userspace, i.e. the max GPA that is addressable by the CPU itself. Typically this is identical to the max effective GPA, except in the case where the CPU supports MAXPHYADDR > 48 but does not support 5-level TDP (the CPU consults bits 51:48 of the GPA only when walking the fifth level TDP page table entry). Enumerating the max mappable GPA via CPUID will allow guest firmware to map resources like PCI bars in the highest possible address space, while ensuring that the GPA is addressable by the CPU. Without precise knowledge about the max mappable GPA, the guest must assume that 5-level paging is unsupported and thus restrict its mappings to the lower 48 bits. Advertise the max mappable GPA via KVM_GET_SUPPORTED_CPUID as userspace doesn't have easy access to whether or not 5-level paging is supported, and to play nice with userspace VMMs that reflect the supported CPUID directly into the guest. AMD's APM (3.35) defines GuestPhysBits (EAX[23:16]) as: Maximum guest physical address size in bits. This number applies only to guests using nested paging. When this field is zero, refer to the PhysAddrSize field for the maximum guest physical address size. Tom Lendacky confirmed that the purpose of GuestPhysBits is software use and KVM can use it as described above. Real hardware always returns zero. Leave GuestPhysBits as '0' when TDP is disabled in order to comply with the APM's statement that GuestPhysBits "applies only to guest using nested paging". As above, guest firmware will likely create suboptimal mappings, but that is a very minor issue and not a functional concern. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20240313125844.912415-3-kraxel@redhat.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09	KVM: x86: Don't advertise guest.MAXPHYADDR as host.MAXPHYADDR in CPUID	Gerd Hoffmann
	Drop KVM's propagation of GuestPhysBits (CPUID leaf 80000008, EAX[23:16]) to HostPhysBits (same leaf, EAX[7:0]) when advertising the address widths to userspace via KVM_GET_SUPPORTED_CPUID. Per AMD, GuestPhysBits is intended for software use, and physical CPUs do not set that field. I.e. GuestPhysBits will be non-zero if and only if KVM is running as a nested hypervisor, and in that case, GuestPhysBits is NOT guaranteed to capture the CPU's effective MAXPHYADDR when running with TDP enabled. E.g. KVM will soon use GuestPhysBits to communicate the CPU's maximum addressable guest physical address, which would result in KVM under- reporting PhysBits when running as an L1 on a CPU with MAXPHYADDR=52, but without 5-level paging. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20240313125844.912415-2-kraxel@redhat.com [sean: rewrite changelog with --verbose, Cc stable@] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-03-06	KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT	Vitaly Kuznetsov
	Commit ee3a5f9e3d9b ("KVM: x86: Do runtime CPUID update before updating vcpu->arch.cpuid_entries") moved tweaking of the supplied CPUID data earlier in kvm_set_cpuid() but __kvm_update_cpuid_runtime() actually uses 'vcpu->arch.kvm_cpuid' (though __kvm_find_kvm_cpuid_features()) which gets set later in kvm_set_cpuid(). In some cases, e.g. when kvm_set_cpuid() is called for the first time and 'vcpu->arch.kvm_cpuid' is clear, __kvm_find_kvm_cpuid_features() fails to find KVM PV feature entry and the logic which clears KVM_FEATURE_PV_UNHALT after enabling KVM_X86_DISABLE_EXITS_HLT does not work. The logic, introduced by the commit ee3a5f9e3d9b ("KVM: x86: Do runtime CPUID update before updating vcpu->arch.cpuid_entries") must stay: the supplied CPUID data is tweaked by KVM first (__kvm_update_cpuid_runtime()) and checked later (kvm_check_cpuid()) and the actual data (vcpu->arch.cpuid_*, vcpu->arch.kvm_cpuid, vcpu->arch.xen.cpuid,..) is only updated on success. Switch to searching for KVM_SIGNATURE in the supplied CPUID data to discover KVM PV feature entry instead of using stale 'vcpu->arch.kvm_cpuid'. While on it, drop pointless "&& (best->eax & (1 << KVM_FEATURE_PV_UNHALT)" check when clearing KVM_FEATURE_PV_UNHALT bit. Fixes: ee3a5f9e3d9b ("KVM: x86: Do runtime CPUID update before updating vcpu->arch.cpuid_entries") Reported-and-tested-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20240228101837.93642-3-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-03-06	KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper	Vitaly Kuznetsov
	Similar to kvm_find_kvm_cpuid_features()/__kvm_find_kvm_cpuid_features(), introduce a helper to search for the specific hypervisor signature in any struct kvm_cpuid_entry2 array, not only in vcpu->arch.cpuid_entries. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20240228101837.93642-2-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-17	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm updates from Paolo Bonzini: "Generic: - Use memdup_array_user() to harden against overflow. - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures. - Clean up Kconfigs that all KVM architectures were selecting - New functionality around "guest_memfd", a new userspace API that creates an anonymous file and returns a file descriptor that refers to it. guest_memfd files are bound to their owning virtual machine, cannot be mapped, read, or written by userspace, and cannot be resized. guest_memfd files do however support PUNCH_HOLE, which can be used to switch a memory area between guest_memfd and regular anonymous memory. - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify per-page attributes for a given page of guest memory; right now the only attribute is whether the guest expects to access memory via guest_memfd or not, which in Confidential SVMs backed by SEV-SNP, TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM). x86: - Support for "software-protected VMs" that can use the new guest_memfd and page attributes infrastructure. This is mostly useful for testing, since there is no pKVM-like infrastructure to provide a meaningfully reduced TCB. - Fix a relatively benign off-by-one error when splitting huge pages during CLEAR_DIRTY_LOG. - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE. - Use more generic lockdep assertions in paths that don't actually care about whether the caller is a reader or a writer. - let Xen guests opt out of having PV clock reported as "based on a stable TSC", because some of them don't expect the "TSC stable" bit (added to the pvclock ABI by KVM, but never set by Xen) to be set. - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL. - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always flushes on nested transitions, i.e. always satisfies flush requests. This allows running bleeding edge versions of VMware Workstation on top of KVM. - Sanity check that the CPU supports flush-by-ASID when enabling SEV support. - On AMD machines with vNMI, always rely on hardware instead of intercepting IRET in some cases to detect unmasking of NMIs - Support for virtualizing Linear Address Masking (LAM) - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state prior to refreshing the vPMU model. - Fix a double-overflow PMU bug by tracking emulated counter events using a dedicated field instead of snapshotting the "previous" counter. If the hardware PMC count triggers overflow that is recognized in the same VM-Exit that KVM manually bumps an event count, KVM would pend PMIs for both the hardware-triggered overflow and for KVM-triggered overflow. - Turn off KVM_WERROR by default for all configs so that it's not inadvertantly enabled by non-KVM developers, which can be problematic for subsystems that require no regressions for W=1 builds. - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL "features". - Don't force a masterclock update when a vCPU synchronizes to the current TSC generation, as updating the masterclock can cause kvmclock's time to "jump" unexpectedly, e.g. when userspace hotplugs a pre-created vCPU. - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths, partly as a super minor optimization, but mostly to make KVM play nice with position independent executable builds. - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on CONFIG_HYPERV as a minor optimization, and to self-document the code. - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation" at build time. ARM64: - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups. Loongarch: - Optimization for memslot hugepage checking - Cleanup and fix some HW/SW timer issues - Add LSX/LASX (128bit/256bit SIMD) support RISC-V: - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Support for reporting steal time along with selftest s390: - Bugfixes Selftests: - Fix an annoying goof where the NX hugepage test prints out garbage instead of the magic token needed to run the test. - Fix build errors when a header is delete/moved due to a missing flag in the Makefile. - Detect if KVM bugged/killed a selftest's VM and print out a helpful message instead of complaining that a random ioctl() failed. - Annotate the guest printf/assert helpers with __printf(), and fix the various bugs that were lurking due to lack of said annotation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits) x86/kvm: Do not try to disable kvmclock if it was not enabled KVM: x86: add missing "depends on KVM" KVM: fix direction of dependency on MMU notifiers KVM: introduce CONFIG_KVM_COMMON KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache RISC-V: KVM: selftests: Add get-reg-list test for STA registers RISC-V: KVM: selftests: Add steal_time test support RISC-V: KVM: selftests: Add guest_sbi_probe_extension RISC-V: KVM: selftests: Move sbi_ecall to processor.c RISC-V: KVM: Implement SBI STA extension RISC-V: KVM: Add support for SBI STA registers RISC-V: KVM: Add support for SBI extension registers RISC-V: KVM: Add SBI STA info to vcpu_arch RISC-V: KVM: Add steal-update vcpu request RISC-V: KVM: Add SBI STA extension skeleton RISC-V: paravirt: Implement steal-time support RISC-V: Add SBI STA extension definitions RISC-V: paravirt: Add skeleton for pv-time support RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr() ...
2024-01-08	Merge tag 'kvm-x86-lam-6.8' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 support for virtualizing Linear Address Masking (LAM) Add KVM support for Linear Address Masking (LAM). LAM tweaks the canonicality checks for most virtual address usage in 64-bit mode, such that only the most significant bit of the untranslated address bits must match the polarity of the last translated address bit. This allows software to use ignored, untranslated address bits for metadata, e.g. to efficiently tag pointers for address sanitization. LAM can be enabled separately for user pointers and supervisor pointers, and for userspace LAM can be select between 48-bit and 57-bit masking - 48-bit LAM: metadata bits 62:48, i.e. LAM width of 15. - 57-bit LAM: metadata bits 62:57, i.e. LAM width of 6. For user pointers, LAM enabling utilizes two previously-reserved high bits from CR3 (similar to how PCID_NOFLUSH uses bit 63): LAM_U48 and LAM_U57, bits 62 and 61 respectively. Note, if LAM_57 is set, LAM_U48 is ignored, i.e.: - CR3.LAM_U48=0 && CR3.LAM_U57=0 == LAM disabled for user pointers - CR3.LAM_U48=1 && CR3.LAM_U57=0 == LAM-48 enabled for user pointers - CR3.LAM_U48=x && CR3.LAM_U57=1 == LAM-57 enabled for user pointers For supervisor pointers, LAM is controlled by a single bit, CR4.LAM_SUP, with the 48-bit versus 57-bit LAM behavior following the current paging mode, i.e.: - CR4.LAM_SUP=0 && CR4.LA57=x == LAM disabled for supervisor pointers - CR4.LAM_SUP=1 && CR4.LA57=0 == LAM-48 enabled for supervisor pointers - CR4.LAM_SUP=1 && CR4.LA57=1 == LAM-57 enabled for supervisor pointers The modified LAM canonicality checks: - LAM_S48 : [ 1 ][ metadata ][ 1 ] 63 47 - LAM_U48 : [ 0 ][ metadata ][ 0 ] 63 47 - LAM_S57 : [ 1 ][ metadata ][ 1 ] 63 56 - LAM_U57 + 5-lvl paging : [ 0 ][ metadata ][ 0 ] 63 56 - LAM_U57 + 4-lvl paging : [ 0 ][ metadata ][ 0...0 ] 63 56..47 The bulk of KVM support for LAM is to emulate LAM's modified canonicality checks. The approach taken by KVM is to "fill" the metadata bits using the highest bit of the translated address, e.g. for LAM-48, bit 47 is sign-extended to bits 62:48. The most significant bit, 63, is not modified, i.e. its value from the raw, untagged virtual address is kept for the canonicality check. This untagging allows Aside from emulating LAM's canonical checks behavior, LAM has the usual KVM touchpoints for selectable features: enumeration (CPUID.7.1:EAX.LAM[bit 26], enabling via CR3 and CR4 bits, etc.
2024-01-08	Merge tag 'kvm-x86-misc-6.8' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.8: - Turn off KVM_WERROR by default for all configs so that it's not inadvertantly enabled by non-KVM developers, which can be problematic for subsystems that require no regressions for W=1 builds. - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL "features". - Don't force a masterclock update when a vCPU synchronizes to the current TSC generation, as updating the masterclock can cause kvmclock's time to "jump" unexpectedly, e.g. when userspace hotplugs a pre-created vCPU. - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths, partly as a super minor optimization, but mostly to make KVM play nice with position independent executable builds.
2024-01-08	Merge tag 'kvm-x86-hyperv-6.8' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 Hyper-V changes for 6.8: - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on CONFIG_HYPERV as a minor optimization, and to self-document the code. - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation" at build time.
2024-01-03	arch/x86: Fix typos	Bjorn Helgaas
	Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2023-12-07	KVM: x86: Make Hyper-V emulation optional	Vitaly Kuznetsov
	Hyper-V emulation in KVM is a fairly big chunk and in some cases it may be desirable to not compile it in to reduce module sizes as well as the attack surface. Introduce CONFIG_KVM_HYPERV option to make it possible. Note, there's room for further nVMX/nSVM code optimizations when !CONFIG_KVM_HYPERV, this will be done in follow-up patches. Reorganize Makefile a bit so all CONFIG_HYPERV and CONFIG_KVM_HYPERV files are grouped together. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Link: https://lore.kernel.org/r/20231205103630.1391318-13-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30	KVM: x86: Harden copying of userspace-array against overflow	Philipp Stanner
	cpuid.c utilizes vmemdup_user() and array_size() to copy two userspace arrays. This, currently, does not check for an overflow. Use the new wrapper vmemdup_array_user() to copy the arrays more safely, as vmemdup_user() doesn't check for overflow. Note, KVM explicitly checks the number of entries before duplicating the array, i.e. adding the overflow check should be a glorified nop. Suggested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://lore.kernel.org/r/20231102181526.43279-2-pstanner@redhat.com [sean: call out that KVM pre-checks the number of entries] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30	KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace	Jim Mattson
	The low five bits {INTEL_PSFD, IPRED_CTRL, RRSBA_CTRL, DDPD_U, BHI_CTRL} advertise the availability of specific bits in IA32_SPEC_CTRL. Since KVM dynamically determines the legal IA32_SPEC_CTRL bits for the underlying hardware, the hard work has already been done. Just let userspace know that a guest can use these IA32_SPEC_CTRL bits. The sixth bit (MCDT_NO) states that the processor does not exhibit MXCSR Configuration Dependent Timing (MCDT) behavior. This is an inherent property of the physical processor that is inherited by the virtual CPU. Pass that information on to userspace. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20231024001636.890236-1-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28	KVM: x86: Advertise and enable LAM (user and supervisor)	Robert Hoo
	LAM is enumerated by CPUID.7.1:EAX.LAM[bit 26]. Advertise the feature to userspace and enable it as the final step after the LAM virtualization support for supervisor and user pointers. SGX LAM support is not advertised yet. SGX LAM support is enumerated in SGX's own CPUID and there's no hard requirement that it must be supported when LAM is reported in CPUID leaf 0x7. Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Link: https://lore.kernel.org/r/20230913124227.12574-13-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-31	Merge tag 'kvm-x86-xen-6.7' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 Xen changes for 6.7: - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events. Avoid the problematic races with using the fast path by ensuring the hrtimer isn't running when (re)starting the timer or saving the timer information (for userspace). - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
2023-10-31	Merge tag 'kvm-x86-misc-6.7' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.7: - Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add IBPB and SBPB virtualization support. - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Use octal notation for file permissions through KVM x86. - Fix a handful of typo fixes and warts.
2023-10-18	x86: KVM: Add feature flag for CPUID.80000021H:EAX[bit 1]	Jim Mattson
	Define an X86_FEATURE_* flag for CPUID.80000021H:EAX.[bit 1], and advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. Per AMD's "Processor Programming Reference (PPR) for AMD Family 19h Model 61h, Revision B1 Processors (56713-B1-PUB)," this CPUID bit indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or MSR_KERNEL_GS_BASE is non-serializing. This is a change in previously architected behavior. Effectively, this CPUID bit is a "defeature" bit, or a reverse polarity feature bit. When this CPUID bit is clear, the feature (serialization on WRMSR to any of these three MSRs) is available. When this CPUID bit is set, the feature is not available. KVM_GET_SUPPORTED_CPUID must pass this bit through from the underlying hardware, if it is set. Leaving the bit clear claims that WRMSR to these three MSRs will be serializing in a guest running under KVM. That isn't true. Though KVM could emulate the feature by intercepting writes to the specified MSRs, it does not do so today. The guest is allowed direct read/write access to these MSRs without interception, so the innate hardware behavior is preserved under KVM. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231005031237.1652871-1-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-12	KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}	Sean Christopherson
	Mask off xfeatures that aren't exposed to the guest only when saving guest state via KVM_GET_XSAVE{2} instead of modifying user_xfeatures directly. Preserving the maximal set of xfeatures in user_xfeatures restores KVM's ABI for KVM_SET_XSAVE, which prior to commit ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") allowed userspace to load xfeatures that are supported by the host, irrespective of what xfeatures are exposed to the guest. There is no known use case where userspace intentionally loads xfeatures that aren't exposed to the guest, but the bug fixed by commit ad856280ddea was specifically that KVM_GET_SAVE{2} would save xfeatures that weren't exposed to the guest, e.g. would lead to userspace unintentionally loading guest-unsupported xfeatures when live migrating a VM. Restricting KVM_SET_XSAVE to guest-supported xfeatures is especially problematic for QEMU-based setups, as QEMU has a bug where instead of terminating the VM if KVM_SET_XSAVE fails, QEMU instead simply stops loading guest state, i.e. resumes the guest after live migration with incomplete guest state, and ultimately results in guest data corruption. Note, letting userspace restore all host-supported xfeatures does not fix setups where a VM is migrated from a host without commit ad856280ddea, to a target with a subset of host-supported xfeatures. However there is no way to safely address that scenario, e.g. KVM could silently drop the unsupported features, but that would be a clear violation of KVM's ABI and so would require userspace to opt-in, at which point userspace could simply be updated to sanitize the to-be-loaded XSAVE state. Reported-by: Tyler Stachecki <stachecki.tyler@gmail.com> Closes: https://lore.kernel.org/all/20230914010003.358162-1-tstachecki@bloomberg.net Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") Cc: stable@vger.kernel.org Cc: Leonardo Bras <leobras@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Message-Id: <20230928001956.924301-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-04	KVM: x86: Add SBPB support	Josh Poimboeuf
	Add support for the AMD Selective Branch Predictor Barrier (SBPB) by advertising the CPUID bit and handling PRED_CMD writes accordingly. Note, like SRSO_NO and IBPB_BRTYPE before it, advertise support for SBPB even if it's not enumerated by in the raw CPUID. Some CPUs that gained support via a uCode patch don't report SBPB via CPUID (the kernel forces the flag). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/a4ab1e7fe50096d50fde33e739ed2da40b41ea6a.1692919072.git.jpoimboe@kernel.org Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-04	KVM: x86: Add IBPB_BRTYPE support	Josh Poimboeuf
	Add support for the IBPB_BRTYPE CPUID flag, which indicates that IBPB includes branch type prediction flushing. Note, like SRSO_NO, advertise support for IBPB_BRTYPE even if it's not enumerated by in the raw CPUID, i.e. bypass the cpuid_count() in __kvm_cpu_cap_mask(). Some CPUs that gained support via a uCode patch don't report IBPB_BRTYPE via CPUID (the kernel forces the flag). Opportunistically use kvm_cpu_cap_check_and_set() for SRSO_NO instead of manually querying host support (cpu_feature_enabled() and boot_cpu_has() yield the same end result in this case). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/79d5f5914fb42c2c62418ffbcd78f138645ded21.1692919072.git.jpoimboe@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-04	KVM: X86: Reduce size of kvm_vcpu_arch structure when CONFIG_KVM_XEN=n	Peng Hao
	When CONFIG_KVM_XEN=n, the size of kvm_vcpu_arch can be reduced from 5100+ to 4400+ by adding macro control. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/all/CAPm50aKwbZGeXPK5uig18Br8CF1hOS71CE2j_dLX+ub7oJdpGg@mail.gmail.com [sean: fix whitespace damage] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-31	Merge tag 'kvm-x86-misc-6.6' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 changes for 6.6: - Misc cleanups - Retry APIC optimized recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR can diverge from the default iff TSC scaling is enabled, and clean up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID
2023-08-17	KVM: x86: Disallow guest CPUID lookups when IRQs are disabled	Sean Christopherson
	Now that KVM has a framework for caching guest CPUID feature flags, add a "rule" that IRQs must be enabled when doing guest CPUID lookups, and enforce the rule via a lockdep assertion. CPUID lookups are slow, and within KVM, IRQs are only ever disabled in hot paths, e.g. the core run loop, fast page fault handling, etc. I.e. querying guest CPUID with IRQs disabled, especially in the run loop, should be avoided. Link: https://lore.kernel.org/r/20230815203653.519297-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17	KVM: x86/mmu: Use KVM-governed feature framework to track "GBPAGES enabled"	Sean Christopherson
	Use the governed feature framework to track whether or not the guest can use 1GiB pages, and drop the one-off helper that wraps the surprisingly non-trivial logic surrounding 1GiB page usage in the guest. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17	KVM: x86: Add a framework for enabling KVM-governed x86 features	Sean Christopherson
	Introduce yet another X86_FEATURE flag framework to manage and cache KVM governed features (for lack of a better name). "Governed" in this case means that KVM has some level of involvement and/or vested interest in whether or not an X86_FEATURE can be used by the guest. The intent of the framework is twofold: to simplify caching of guest CPUID flags that KVM needs to frequently query, and to add clarity to such caching, e.g. it isn't immediately obvious that SVM's bundle of flags for "optional nested SVM features" track whether or not a flag is exposed to L1. Begrudgingly define KVM_MAX_NR_GOVERNED_FEATURES for the size of the bitmap to avoid exposing governed_features.h in arch/x86/include/asm/, but add a FIXME to call out that it can and should be cleaned up once "struct kvm_vcpu_arch" is no longer expose to the kernel at large. Cc: Zeng Guang <guang.zeng@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03	KVM: x86: Advertise AMX-COMPLEX CPUID to userspace	Tao Su
	Latest Intel platform GraniteRapids-D introduces AMX-COMPLEX, which adds two instructions to perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single precision tile. AMX-COMPLEX is enumerated via CPUID.(EAX=7,ECX=1):EDX[bit 8] Advertise AMX_COMPLEX if it's supported in hardware. There are no VMX controls for the feature, i.e. the instructions can't be interecepted, and KVM advertises base AMX in CPUID if AMX is supported in hardware, even if KVM doesn't advertise AMX as being supported in XCR0, e.g. because the process didn't opt-in to allocating tile data. Signed-off-by: Tao Su <tao1.su@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230802022954.193843-1-tao1.su@linux.intel.com [sean: tweak last paragraph of changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02	KVM: x86: Advertise host CPUID 0x80000005 in KVM_GET_SUPPORTED_CPUID	Takahiro Itazuri
	Advertise CPUID 0x80000005 (L1 cache and TLB info) to userspace so that VMMs that reflect KVM_GET_SUPPORTED_CPUID into KVM_SET_CPUID2 will enumerate sane cache/TLB information to the guest. CPUID 0x80000006 (L2 cache and TLB and L3 cache info) has been returned since commit 43d05de2bee7 ("KVM: pass through CPUID(0x80000006)"). Enumerating both 0x80000005 and 0x80000006 with KVM_GET_SUPPORTED_CPUID is better than reporting one or the other, and 0x80000005 could be helpful for VMM to pass it to KVM_SET_CPUID{,2} for the same reason with 0x80000006. Signed-off-by: Takahiro Itazuri <itazur@amazon.com> Link: https://lore.kernel.org/all/ZK7NmfKI9xur%2FMop@google.com Link: https://lore.kernel.org/r/20230712183136.85561-1-itazur@amazon.com [sean: add link, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-27	x86/srso: Add SRSO_NO support	Borislav Petkov (AMD)
	Add support for the CPUID flag which denotes that the CPU is not affected by SRSO. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-01	Merge tag 'kvm-x86-pmu-6.5' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86/pmu changes for 6.5: - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes included along the way
2023-06-06	KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022	Like Xu
	CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance monitoring features for AMD processors. Bit 0 of EAX indicates support for Performance Monitoring Version 2 (PerfMonV2) features. If found to be set during PMU initialization, the EBX bits of the same CPUID function can be used to determine the number of available PMCs for different PMU types. Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so that guests can make use of the PerfMonV2 features. Co-developed-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230603011058.1038821-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06	KVM: x86: Explicitly zero cpuid "0xa" leaf when PMU is disabled	Like Xu
	Add an explicit !enable_pmu check as relying on kvm_pmu_cap to be zeroed isn't obvious. Although when !enable_pmu, KVM will have zero-padded kvm_pmu_cap to do subsequent CPUID leaf assignments. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230603011058.1038821-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-01	KVM: x86: Update number of entries for KVM_GET_CPUID2 on success, not failure	Sean Christopherson
	Update cpuid->nent if and only if kvm_vcpu_ioctl_get_cpuid2() succeeds. The sole caller copies @cpuid to userspace only on success, i.e. the existing code effectively does nothing. Arguably, KVM should report the number of entries when returning -E2BIG so that userspace doesn't have to guess the size, but all other similar KVM ioctls() don't report the size either, i.e. userspace is conditioned to guess. Suggested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://lore.kernel.org/all/20230410141820.57328-1-itazur@amazon.com Link: https://lore.kernel.org/r/20230526210340.2799158-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-05-21	KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)	Sean Christopherson
	Drop KVM's manipulation of guest's CPUID.0x12.1 ECX and EDX, i.e. the allowed XFRM of SGX enclaves, now that KVM explicitly checks the guest's allowed XCR0 when emulating ECREATE. Note, this could theoretically break a setup where userspace advertises a "bad" XFRM and relies on KVM to provide a sane CPUID model, but QEMU is the only known user of KVM SGX, and QEMU explicitly sets the SGX CPUID XFRM subleaf based on the guest's XCR0. Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230503160838.3412617-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-26	Merge tag 'kvm-x86-selftests-6.4' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM selftests, and an AMX/XCR0 bugfix, for 6.4: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Overhaul the AMX selftests to improve coverage and cleanup the test - Misc cleanups
2023-04-26	Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 PMU changes for 6.4: - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - Misc cleanups and fixes
2023-04-11	KVM: x86: Add a helper to handle filtering of unpermitted XCR0 features	Aaron Lewis
	Add a helper, kvm_get_filtered_xcr0(), to dedup code that needs to account for XCR0 features that require explicit opt-in on a per-process basis. In addition to documenting when KVM should/shouldn't consult xstate_get_guest_group_perm(), the helper will also allow sanitizing the filtered XCR0 to avoid enumerating architecturally illegal XCR0 values, e.g. XTILE_CFG without XTILE_DATA. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> [sean: rename helper, move to x86.h, massage changelog] Reviewed-by: Aaron Lewis <aaronlewis@google.com> Tested-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230405004520.421768-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06	KVM: x86: Add a helper to query whether or not a vCPU has ever run	Sean Christopherson
	Add a helper to query if a vCPU has run so that KVM doesn't have to open code the check on last_vmentry_cpu being set to a magic value. No functional change intended. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Like Xu <like.xu.linux@gmail.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-23	x86: KVM: Add common feature flag for AMD's PSFD	Sean Christopherson
	Use a common X86_FEATURE_* flag for AMD's PSFD, and suppress it from /proc/cpuinfo via the standard method of an empty string instead of hacking in a one-off "private" #define in KVM. The request that led to KVM defining its own flag was really just that the feature not show up in /proc/cpuinfo, and additional patches+discussions in the interim have clarified that defining flags in cpufeatures.h purely so that KVM can advertise features to userspace is ok so long as the kernel already uses a word to track the associated CPUID leaf. No functional change intended. Link: https://lore.kernel.org/all/d1b1e0da-29f0-c443-6c86-9549bbe1c79d@redhat.como Link: https://lore.kernel.org/all/YxGZH7aOXQF7Pu5q@nazgul.tnic Link: https://lore.kernel.org/all/Y3O7UYWfOLfJkwM%2F@zn.tnic Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230124194519.2893234-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22	KVM: x86: Add helpers to query individual CR0/CR4 bits	Binbin Wu
	Add helpers to check if a specific CR0/CR4 bit is set to avoid a plethora of implicit casts from the "unsigned long" return of kvm_read_cr*_bits(), and to make each caller's intent more obvious. Defer converting helpers that do truly ugly casts from "unsigned long" to "int", e.g. is_pse(), to a future commit so that their conversion is more isolated. Opportunistically drop the superfluous pcid_enabled from kvm_set_cr3(); the local variable is used only once, immediately after its declaration. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20230322045824.22970-2-binbin.wu@linux.intel.com [sean: move "obvious" conversions to this commit, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16	kvm: x86: Advertise FLUSH_L1D to user space	Emanuele Giuseppe Esposito
	FLUSH_L1D was already added in 11e34e64e4103, but the feature is not visible to userspace yet. The bit definition: CPUID.(EAX=7,ECX=0):EDX[bit 28] If the feature is supported by the host, kvm should support it too so that userspace can choose whether to expose it to the guest or not. One disadvantage of not exposing it is that the guest will report a non existing vulnerability in /sys/devices/system/cpu/vulnerabilities/mmio_stale_data because the mitigation is present only if the guest supports (FLUSH_L1D and MD_CLEAR) or FB_CLEAR. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230201132905.549148-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-25	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send\|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...