summaryrefslogtreecommitdiff
path: root/arch
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2026-03-15 12:22:10 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2026-03-15 12:22:10 -0700
commit11e8c7e9471cf8e6ae6ec7324a3174191cd965e3 (patch)
tree4e18e18bf7c1ff66b5f5a0d609766215ecc56472 /arch
parent4f3df2e5ea69f5717d2721922aff263c31957548 (diff)
parentd2ea4ff1ce50787a98a3900b3fb1636f3620b7cf (diff)
downloadlwn-11e8c7e9471cf8e6ae6ec7324a3174191cd965e3.tar.gz
lwn-11e8c7e9471cf8e6ae6ec7324a3174191cd965e3.zip
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini: "Quite a large pull request, partly due to skipping last week and therefore having material from ~all submaintainers in this one. About a fourth of it is a new selftest, and a couple more changes are large in number of files touched (fixing a -Wflex-array-member-not-at-end compiler warning) or lines changed (reformatting of a table in the API documentation, thanks rST). But who am I kidding---it's a lot of commits and there are a lot of bugs being fixed here, some of them on the nastier side like the RISC-V ones. ARM: - Correctly handle deactivation of interrupts that were activated from LRs. Since EOIcount only denotes deactivation of interrupts that are not present in an LR, start EOIcount deactivation walk *after* the last irq that made it into an LR - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM is already enabled -- not only thhis isn't possible (pKVM will reject the call), but it is also useless: this can only happen for a CPU that has already booted once, and the capability will not change - Fix a couple of low-severity bugs in our S2 fault handling path, affecting the recently introduced LS64 handling and the even more esoteric handling of hwpoison in a nested context - Address yet another syzkaller finding in the vgic initialisation, where we would end-up destroying an uninitialised vgic with nasty consequences - Address an annoying case of pKVM failing to boot when some of the memblock regions that the host is faulting in are not page-aligned - Inject some sanity in the NV stage-2 walker by checking the limits against the advertised PA size, and correctly report the resulting faults PPC: - Fix a PPC e500 build error due to a long-standing wart that was exposed by the recent conversion to kmalloc_obj(); rip out all the ugliness that led to the wart RISC-V: - Prevent speculative out-of-bounds access using array_index_nospec() in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access, float register access, and PMU counter access - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(), kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr() - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei() - Fix off-by-one array access in SBI PMU - Skip THP support check during dirty logging - Fix error code returned for Smstateen and Ssaia ONE_REG interface - Check host Ssaia extension when creating AIA irqchip x86: - Fix cases where CPUID mitigation features were incorrectly marked as available whenever the kernel used scattered feature words for them - Validate _all_ GVAs, rather than just the first GVA, when processing a range of GVAs for Hyper-V's TLB flush hypercalls - Fix a brown paper bug in add_atomic_switch_msr() - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list, to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local APIC (and AVIC is enabled at the module level) - Update CR8 write interception when AVIC is (de)activated, to fix a bug where the guest can run in perpetuity with the CR8 intercept enabled - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by default) an unintentional tightening of userspace ABI in 6.17, and provides some amount of backwards compatibility with hypervisors who want to freeze PMCs on VM-Entry - Validate the VMCS/VMCB on return to a nested guest from SMM, because either userspace or the guest could stash invalid values in memory and trigger the processor's consistency checks Generic: - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive Selftests: - Increase the maximum number of NUMA nodes in the guest_memfd selftest to 64 (from 8)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8 Documentation: kvm: fix formatting of the quirks table KVM: x86: clarify leave_smm() return value selftests: kvm: add a test that VMX validates controls on RSM selftests: kvm: extract common functionality out of smm_test.c KVM: SVM: check validity of VMCB controls when returning from SMM KVM: VMX: check validity of VMCS controls when returning from SMM KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers() KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr() KVM: x86: hyper-v: Validate all GVAs during PV TLB flush KVM: x86: synthesize CPUID bits only if CPU capability is set KVM: PPC: e500: Rip out "struct tlbe_ref" KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type KVM: selftests: Increase 'maxnode' for guest_memfd tests KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2() ...
Diffstat (limited to 'arch')
-rw-r--r--arch/arm64/include/asm/kvm_host.h3
-rw-r--r--arch/arm64/kernel/cpufeature.c9
-rw-r--r--arch/arm64/kvm/at.c2
-rw-r--r--arch/arm64/kvm/guest.c4
-rw-r--r--arch/arm64/kvm/hyp/nvhe/mem_protect.c2
-rw-r--r--arch/arm64/kvm/mmu.c14
-rw-r--r--arch/arm64/kvm/nested.c27
-rw-r--r--arch/arm64/kvm/vgic/vgic-init.c32
-rw-r--r--arch/arm64/kvm/vgic/vgic-v2.c4
-rw-r--r--arch/arm64/kvm/vgic/vgic-v3.c12
-rw-r--r--arch/arm64/kvm/vgic/vgic.c6
-rw-r--r--arch/loongarch/kvm/vcpu.c2
-rw-r--r--arch/loongarch/kvm/vm.c2
-rw-r--r--arch/mips/kvm/mips.c4
-rw-r--r--arch/powerpc/kvm/book3s.c4
-rw-r--r--arch/powerpc/kvm/booke.c4
-rw-r--r--arch/powerpc/kvm/e500.h6
-rw-r--r--arch/powerpc/kvm/e500_mmu.c4
-rw-r--r--arch/powerpc/kvm/e500_mmu_host.c91
-rw-r--r--arch/riscv/kvm/aia.c15
-rw-r--r--arch/riscv/kvm/aia_aplic.c23
-rw-r--r--arch/riscv/kvm/aia_device.c18
-rw-r--r--arch/riscv/kvm/aia_imsic.c4
-rw-r--r--arch/riscv/kvm/mmu.c6
-rw-r--r--arch/riscv/kvm/vcpu.c2
-rw-r--r--arch/riscv/kvm/vcpu_fp.c17
-rw-r--r--arch/riscv/kvm/vcpu_onereg.c54
-rw-r--r--arch/riscv/kvm/vcpu_pmu.c16
-rw-r--r--arch/riscv/kvm/vm.c2
-rw-r--r--arch/s390/kvm/kvm-s390.c4
-rw-r--r--arch/x86/include/asm/kvm_host.h3
-rw-r--r--arch/x86/include/uapi/asm/kvm.h1
-rw-r--r--arch/x86/kvm/cpuid.c5
-rw-r--r--arch/x86/kvm/hyperv.c9
-rw-r--r--arch/x86/kvm/ioapic.c3
-rw-r--r--arch/x86/kvm/svm/avic.c9
-rw-r--r--arch/x86/kvm/svm/nested.c12
-rw-r--r--arch/x86/kvm/svm/svm.c17
-rw-r--r--arch/x86/kvm/svm/svm.h1
-rw-r--r--arch/x86/kvm/vmx/nested.c61
-rw-r--r--arch/x86/kvm/vmx/nested.h1
-rw-r--r--arch/x86/kvm/vmx/vmx.c10
-rw-r--r--arch/x86/kvm/x86.c4
43 files changed, 335 insertions, 194 deletions
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2ca264b3db5f..70cb9cfd760a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -784,6 +784,9 @@ struct kvm_host_data {
/* Number of debug breakpoints/watchpoints for this CPU (minus 1) */
unsigned int debug_brps;
unsigned int debug_wrps;
+
+ /* Last vgic_irq part of the AP list recorded in an LR */
+ struct vgic_irq *last_lr_irq;
};
struct kvm_host_psci_config {
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c31f8e17732a..32c2dbcc0c64 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2345,6 +2345,15 @@ static bool can_trap_icv_dir_el1(const struct arm64_cpu_capabilities *entry,
!is_midr_in_range_list(has_vgic_v3))
return false;
+ /*
+ * pKVM prevents late onlining of CPUs. This means that whatever
+ * state the capability is in after deprivilege cannot be affected
+ * by a new CPU booting -- this is garanteed to be a CPU we have
+ * already seen, and the cap is therefore unchanged.
+ */
+ if (system_capabilities_finalized() && is_protected_kvm_enabled())
+ return cpus_have_final_cap(ARM64_HAS_ICH_HCR_EL2_TDIR);
+
if (is_kernel_in_hyp_mode())
res.a1 = read_sysreg_s(SYS_ICH_VTR_EL2);
else
diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
index 6588ea251ed7..c5c5644b1878 100644
--- a/arch/arm64/kvm/at.c
+++ b/arch/arm64/kvm/at.c
@@ -1504,8 +1504,6 @@ int __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
fail = true;
}
- isb();
-
if (!fail)
par = read_sysreg_par();
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 1c87699fd886..332c453b87cf 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,7 +29,7 @@
#include "trace.h"
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS()
};
@@ -42,7 +42,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, hvc_exit_stat),
STATS_DESC_COUNTER(VCPU, wfe_exit_stat),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 38f66a56a766..d815265bd374 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -518,7 +518,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
granule = kvm_granule_size(level);
cur.start = ALIGN_DOWN(addr, granule);
cur.end = cur.start + granule;
- if (!range_included(&cur, range))
+ if (!range_included(&cur, range) && level < KVM_PGTABLE_LAST_LEVEL)
continue;
*range = cur;
return 0;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ec2eee857208..17d64a1e11e5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1751,6 +1751,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
force_pte = (max_map_size == PAGE_SIZE);
vma_pagesize = min_t(long, vma_pagesize, max_map_size);
+ vma_shift = __ffs(vma_pagesize);
}
/*
@@ -1837,10 +1838,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (exec_fault && s2_force_noncacheable)
ret = -ENOEXEC;
- if (ret) {
- kvm_release_page_unused(page);
- return ret;
- }
+ if (ret)
+ goto out_put_page;
/*
* Guest performs atomic/exclusive operations on memory with unsupported
@@ -1850,7 +1849,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
*/
if (esr_fsc_is_excl_atomic_fault(kvm_vcpu_get_esr(vcpu))) {
kvm_inject_dabt_excl_atomic(vcpu, kvm_vcpu_get_hfar(vcpu));
- return 1;
+ ret = 1;
+ goto out_put_page;
}
if (nested)
@@ -1936,6 +1936,10 @@ out_unlock:
mark_page_dirty_in_slot(kvm, memslot, gfn);
return ret != -EAGAIN ? ret : 0;
+
+out_put_page:
+ kvm_release_page_unused(page);
+ return ret;
}
/* Resolve the access fault by making the page young again. */
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 12c9f6e8dfda..2c43097248b2 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -152,31 +152,31 @@ static int get_ia_size(struct s2_walk_info *wi)
return 64 - wi->t0sz;
}
-static int check_base_s2_limits(struct s2_walk_info *wi,
+static int check_base_s2_limits(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
int level, int input_size, int stride)
{
- int start_size, ia_size;
+ int start_size, pa_max;
- ia_size = get_ia_size(wi);
+ pa_max = kvm_get_pa_bits(vcpu->kvm);
/* Check translation limits */
switch (BIT(wi->pgshift)) {
case SZ_64K:
- if (level == 0 || (level == 1 && ia_size <= 42))
+ if (level == 0 || (level == 1 && pa_max <= 42))
return -EFAULT;
break;
case SZ_16K:
- if (level == 0 || (level == 1 && ia_size <= 40))
+ if (level == 0 || (level == 1 && pa_max <= 40))
return -EFAULT;
break;
case SZ_4K:
- if (level < 0 || (level == 0 && ia_size <= 42))
+ if (level < 0 || (level == 0 && pa_max <= 42))
return -EFAULT;
break;
}
/* Check input size limits */
- if (input_size > ia_size)
+ if (input_size > pa_max)
return -EFAULT;
/* Check number of entries in starting level table */
@@ -269,16 +269,19 @@ static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
if (input_size > 48 || input_size < 25)
return -EFAULT;
- ret = check_base_s2_limits(wi, level, input_size, stride);
- if (WARN_ON(ret))
+ ret = check_base_s2_limits(vcpu, wi, level, input_size, stride);
+ if (WARN_ON(ret)) {
+ out->esr = compute_fsc(0, ESR_ELx_FSC_FAULT);
return ret;
+ }
base_lower_bound = 3 + input_size - ((3 - level) * stride +
wi->pgshift);
base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
if (check_output_size(wi, base_addr)) {
- out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+ /* R_BFHQH */
+ out->esr = compute_fsc(0, ESR_ELx_FSC_ADDRSZ);
return 1;
}
@@ -293,8 +296,10 @@ static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
paddr = base_addr | index;
ret = read_guest_s2_desc(vcpu, paddr, &desc, wi);
- if (ret < 0)
+ if (ret < 0) {
+ out->esr = ESR_ELx_FSC_SEA_TTW(level);
return ret;
+ }
new_desc = desc;
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 9b3091ad868c..e9b8b5fc480c 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -143,6 +143,21 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
kvm->arch.vgic.in_kernel = true;
kvm->arch.vgic.vgic_model = type;
kvm->arch.vgic.implementation_rev = KVM_VGIC_IMP_REV_LATEST;
+ kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
+
+ aa64pfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;
+ pfr1 = kvm_read_vm_id_reg(kvm, SYS_ID_PFR1_EL1) & ~ID_PFR1_EL1_GIC;
+
+ if (type == KVM_DEV_TYPE_ARM_VGIC_V2) {
+ kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
+ } else {
+ INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
+ aa64pfr0 |= SYS_FIELD_PREP_ENUM(ID_AA64PFR0_EL1, GIC, IMP);
+ pfr1 |= SYS_FIELD_PREP_ENUM(ID_PFR1_EL1, GIC, GICv3);
+ }
+
+ kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1, aa64pfr0);
+ kvm_set_vm_id_reg(kvm, SYS_ID_PFR1_EL1, pfr1);
kvm_for_each_vcpu(i, vcpu, kvm) {
ret = vgic_allocate_private_irqs_locked(vcpu, type);
@@ -157,25 +172,10 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
vgic_cpu->private_irqs = NULL;
}
+ kvm->arch.vgic.vgic_model = 0;
goto out_unlock;
}
- kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-
- aa64pfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;
- pfr1 = kvm_read_vm_id_reg(kvm, SYS_ID_PFR1_EL1) & ~ID_PFR1_EL1_GIC;
-
- if (type == KVM_DEV_TYPE_ARM_VGIC_V2) {
- kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
- } else {
- INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
- aa64pfr0 |= SYS_FIELD_PREP_ENUM(ID_AA64PFR0_EL1, GIC, IMP);
- pfr1 |= SYS_FIELD_PREP_ENUM(ID_PFR1_EL1, GIC, GICv3);
- }
-
- kvm_set_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1, aa64pfr0);
- kvm_set_vm_id_reg(kvm, SYS_ID_PFR1_EL1, pfr1);
-
if (type == KVM_DEV_TYPE_ARM_VGIC_V3)
kvm->arch.vgic.nassgicap = system_supports_direct_sgis();
diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c
index 585491fbda80..cafa3cb32bda 100644
--- a/arch/arm64/kvm/vgic/vgic-v2.c
+++ b/arch/arm64/kvm/vgic/vgic-v2.c
@@ -115,7 +115,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
u32 eoicount = FIELD_GET(GICH_HCR_EOICOUNT, cpuif->vgic_hcr);
- struct vgic_irq *irq;
+ struct vgic_irq *irq = *host_data_ptr(last_lr_irq);
DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
@@ -123,7 +123,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
vgic_v2_fold_lr(vcpu, cpuif->vgic_lr[lr]);
/* See the GICv3 equivalent for the EOIcount handling rationale */
- list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
+ list_for_each_entry_continue(irq, &vgic_cpu->ap_list_head, ap_list) {
u32 lr;
if (!eoicount) {
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 386ddf69a9c5..6a355eca1934 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -148,7 +148,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
u32 eoicount = FIELD_GET(ICH_HCR_EL2_EOIcount, cpuif->vgic_hcr);
- struct vgic_irq *irq;
+ struct vgic_irq *irq = *host_data_ptr(last_lr_irq);
DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
@@ -158,12 +158,12 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
/*
* EOIMode=0: use EOIcount to emulate deactivation. We are
* guaranteed to deactivate in reverse order of the activation, so
- * just pick one active interrupt after the other in the ap_list,
- * and replay the deactivation as if the CPU was doing it. We also
- * rely on priority drop to have taken place, and the list to be
- * sorted by priority.
+ * just pick one active interrupt after the other in the tail part
+ * of the ap_list, past the LRs, and replay the deactivation as if
+ * the CPU was doing it. We also rely on priority drop to have taken
+ * place, and the list to be sorted by priority.
*/
- list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
+ list_for_each_entry_continue(irq, &vgic_cpu->ap_list_head, ap_list) {
u64 lr;
/*
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 430aa98888fd..e22b79cfff96 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -814,6 +814,9 @@ retry:
static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
{
+ if (!*host_data_ptr(last_lr_irq))
+ return;
+
if (kvm_vgic_global_state.type == VGIC_V2)
vgic_v2_fold_lr_state(vcpu);
else
@@ -960,10 +963,13 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
if (irqs_outside_lrs(&als))
vgic_sort_ap_list(vcpu);
+ *host_data_ptr(last_lr_irq) = NULL;
+
list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
scoped_guard(raw_spinlock, &irq->irq_lock) {
if (likely(vgic_target_oracle(irq) == vcpu)) {
vgic_populate_lr(vcpu, irq, count++);
+ *host_data_ptr(last_lr_irq) = irq;
}
}
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 09e137f2f841..8ffd50a470e6 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -14,7 +14,7 @@
#define CREATE_TRACE_POINTS
#include "trace.h"
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, int_exits),
STATS_DESC_COUNTER(VCPU, idle_exits),
diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
index 5282158b8122..17b3d5b36cfc 100644
--- a/arch/loongarch/kvm/vm.c
+++ b/arch/loongarch/kvm/vm.c
@@ -10,7 +10,7 @@
#include <asm/kvm_eiointc.h>
#include <asm/kvm_pch_pic.h>
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_ICOUNTER(VM, pages),
STATS_DESC_ICOUNTER(VM, hugepages),
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 29d9f630edfb..a53abbba43ea 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -38,7 +38,7 @@
#define VECTORSPACING 0x100 /* for EI/VI mode */
#endif
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS()
};
@@ -51,7 +51,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, wait_exits),
STATS_DESC_COUNTER(VCPU, cache_exits),
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index d79c5d1098c0..2efbe05caed7 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -38,7 +38,7 @@
/* #define EXIT_DEBUG */
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_ICOUNTER(VM, num_2M_pages),
STATS_DESC_ICOUNTER(VM, num_1G_pages)
@@ -53,7 +53,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, sum_exits),
STATS_DESC_COUNTER(VCPU, mmio_exits),
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3401b96be475..f3ddb24ece74 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -36,7 +36,7 @@
unsigned long kvmppc_booke_handlers;
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_ICOUNTER(VM, num_2M_pages),
STATS_DESC_ICOUNTER(VM, num_1G_pages)
@@ -51,7 +51,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, sum_exits),
STATS_DESC_COUNTER(VCPU, mmio_exits),
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index f9acf866c709..e4469ad73a2e 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -39,15 +39,11 @@ enum vcpu_ftr {
/* bits [6-5] MAS2_X1 and MAS2_X0 and [4-0] bits for WIMGE */
#define E500_TLB_MAS2_ATTR (0x7f)
-struct tlbe_ref {
+struct tlbe_priv {
kvm_pfn_t pfn; /* valid only for TLB0, except briefly */
unsigned int flags; /* E500_TLB_* */
};
-struct tlbe_priv {
- struct tlbe_ref ref;
-};
-
#ifdef CONFIG_KVM_E500V2
struct vcpu_id_table;
#endif
diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index 48580c85f23b..75ed1496ead5 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -920,12 +920,12 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
vcpu_e500->gtlb_offset[0] = 0;
vcpu_e500->gtlb_offset[1] = KVM_E500_TLB0_SIZE;
- vcpu_e500->gtlb_priv[0] = kzalloc_objs(struct tlbe_ref,
+ vcpu_e500->gtlb_priv[0] = kzalloc_objs(struct tlbe_priv,
vcpu_e500->gtlb_params[0].entries);
if (!vcpu_e500->gtlb_priv[0])
goto free_vcpu;
- vcpu_e500->gtlb_priv[1] = kzalloc_objs(struct tlbe_ref,
+ vcpu_e500->gtlb_priv[1] = kzalloc_objs(struct tlbe_priv,
vcpu_e500->gtlb_params[1].entries);
if (!vcpu_e500->gtlb_priv[1])
goto free_vcpu;
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 06caf8bbbe2b..37e0d3d9e244 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -189,16 +189,16 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
{
struct kvm_book3e_206_tlb_entry *gtlbe =
get_entry(vcpu_e500, tlbsel, esel);
- struct tlbe_ref *ref = &vcpu_e500->gtlb_priv[tlbsel][esel].ref;
+ struct tlbe_priv *tlbe = &vcpu_e500->gtlb_priv[tlbsel][esel];
/* Don't bother with unmapped entries */
- if (!(ref->flags & E500_TLB_VALID)) {
- WARN(ref->flags & (E500_TLB_BITMAP | E500_TLB_TLB0),
- "%s: flags %x\n", __func__, ref->flags);
+ if (!(tlbe->flags & E500_TLB_VALID)) {
+ WARN(tlbe->flags & (E500_TLB_BITMAP | E500_TLB_TLB0),
+ "%s: flags %x\n", __func__, tlbe->flags);
WARN_ON(tlbsel == 1 && vcpu_e500->g2h_tlb1_map[esel]);
}
- if (tlbsel == 1 && ref->flags & E500_TLB_BITMAP) {
+ if (tlbsel == 1 && tlbe->flags & E500_TLB_BITMAP) {
u64 tmp = vcpu_e500->g2h_tlb1_map[esel];
int hw_tlb_indx;
unsigned long flags;
@@ -216,28 +216,28 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
}
mb();
vcpu_e500->g2h_tlb1_map[esel] = 0;
- ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
+ tlbe->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
local_irq_restore(flags);
}
- if (tlbsel == 1 && ref->flags & E500_TLB_TLB0) {
+ if (tlbsel == 1 && tlbe->flags & E500_TLB_TLB0) {
/*
* TLB1 entry is backed by 4k pages. This should happen
* rarely and is not worth optimizing. Invalidate everything.
*/
kvmppc_e500_tlbil_all(vcpu_e500);
- ref->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
+ tlbe->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
}
/*
* If TLB entry is still valid then it's a TLB0 entry, and thus
* backed by at most one host tlbe per shadow pid
*/
- if (ref->flags & E500_TLB_VALID)
+ if (tlbe->flags & E500_TLB_VALID)
kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
/* Mark the TLB as not backed by the host anymore */
- ref->flags = 0;
+ tlbe->flags = 0;
}
static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
@@ -245,26 +245,26 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
return tlbe->mas7_3 & (MAS3_SW|MAS3_UW);
}
-static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
- struct kvm_book3e_206_tlb_entry *gtlbe,
- kvm_pfn_t pfn, unsigned int wimg,
- bool writable)
+static inline void kvmppc_e500_tlbe_setup(struct tlbe_priv *tlbe,
+ struct kvm_book3e_206_tlb_entry *gtlbe,
+ kvm_pfn_t pfn, unsigned int wimg,
+ bool writable)
{
- ref->pfn = pfn;
- ref->flags = E500_TLB_VALID;
+ tlbe->pfn = pfn;
+ tlbe->flags = E500_TLB_VALID;
if (writable)
- ref->flags |= E500_TLB_WRITABLE;
+ tlbe->flags |= E500_TLB_WRITABLE;
/* Use guest supplied MAS2_G and MAS2_E */
- ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
+ tlbe->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
}
-static inline void kvmppc_e500_ref_release(struct tlbe_ref *ref)
+static inline void kvmppc_e500_tlbe_release(struct tlbe_priv *tlbe)
{
- if (ref->flags & E500_TLB_VALID) {
+ if (tlbe->flags & E500_TLB_VALID) {
/* FIXME: don't log bogus pfn for TLB1 */
- trace_kvm_booke206_ref_release(ref->pfn, ref->flags);
- ref->flags = 0;
+ trace_kvm_booke206_ref_release(tlbe->pfn, tlbe->flags);
+ tlbe->flags = 0;
}
}
@@ -284,11 +284,8 @@ static void clear_tlb_privs(struct kvmppc_vcpu_e500 *vcpu_e500)
int i;
for (tlbsel = 0; tlbsel <= 1; tlbsel++) {
- for (i = 0; i < vcpu_e500->gtlb_params[tlbsel].entries; i++) {
- struct tlbe_ref *ref =
- &vcpu_e500->gtlb_priv[tlbsel][i].ref;
- kvmppc_e500_ref_release(ref);
- }
+ for (i = 0; i < vcpu_e500->gtlb_params[tlbsel].entries; i++)
+ kvmppc_e500_tlbe_release(&vcpu_e500->gtlb_priv[tlbsel][i]);
}
}
@@ -304,18 +301,18 @@ void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu)
static void kvmppc_e500_setup_stlbe(
struct kvm_vcpu *vcpu,
struct kvm_book3e_206_tlb_entry *gtlbe,
- int tsize, struct tlbe_ref *ref, u64 gvaddr,
+ int tsize, struct tlbe_priv *tlbe, u64 gvaddr,
struct kvm_book3e_206_tlb_entry *stlbe)
{
- kvm_pfn_t pfn = ref->pfn;
+ kvm_pfn_t pfn = tlbe->pfn;
u32 pr = vcpu->arch.shared->msr & MSR_PR;
- bool writable = !!(ref->flags & E500_TLB_WRITABLE);
+ bool writable = !!(tlbe->flags & E500_TLB_WRITABLE);
- BUG_ON(!(ref->flags & E500_TLB_VALID));
+ BUG_ON(!(tlbe->flags & E500_TLB_VALID));
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
- stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_MAS2_ATTR);
+ stlbe->mas2 = (gvaddr & MAS2_EPN) | (tlbe->flags & E500_TLB_MAS2_ATTR);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, writable, pr);
}
@@ -323,7 +320,7 @@ static void kvmppc_e500_setup_stlbe(
static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
int tlbsel, struct kvm_book3e_206_tlb_entry *stlbe,
- struct tlbe_ref *ref)
+ struct tlbe_priv *tlbe)
{
struct kvm_memory_slot *slot;
unsigned int psize;
@@ -455,9 +452,9 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
}
}
- kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg, writable);
+ kvmppc_e500_tlbe_setup(tlbe, gtlbe, pfn, wimg, writable);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
- ref, gvaddr, stlbe);
+ tlbe, gvaddr, stlbe);
writable = tlbe_is_writable(stlbe);
/* Clear i-cache for new pages */
@@ -474,17 +471,17 @@ static int kvmppc_e500_tlb0_map(struct kvmppc_vcpu_e500 *vcpu_e500, int esel,
struct kvm_book3e_206_tlb_entry *stlbe)
{
struct kvm_book3e_206_tlb_entry *gtlbe;
- struct tlbe_ref *ref;
+ struct tlbe_priv *tlbe;
int stlbsel = 0;
int sesel = 0;
int r;
gtlbe = get_entry(vcpu_e500, 0, esel);
- ref = &vcpu_e500->gtlb_priv[0][esel].ref;
+ tlbe = &vcpu_e500->gtlb_priv[0][esel];
r = kvmppc_e500_shadow_map(vcpu_e500, get_tlb_eaddr(gtlbe),
get_tlb_raddr(gtlbe) >> PAGE_SHIFT,
- gtlbe, 0, stlbe, ref);
+ gtlbe, 0, stlbe, tlbe);
if (r)
return r;
@@ -494,7 +491,7 @@ static int kvmppc_e500_tlb0_map(struct kvmppc_vcpu_e500 *vcpu_e500, int esel,
}
static int kvmppc_e500_tlb1_map_tlb1(struct kvmppc_vcpu_e500 *vcpu_e500,
- struct tlbe_ref *ref,
+ struct tlbe_priv *tlbe,
int esel)
{
unsigned int sesel = vcpu_e500->host_tlb1_nv++;
@@ -507,10 +504,10 @@ static int kvmppc_e500_tlb1_map_tlb1(struct kvmppc_vcpu_e500 *vcpu_e500,
vcpu_e500->g2h_tlb1_map[idx] &= ~(1ULL << sesel);
}
- vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_BITMAP;
+ vcpu_e500->gtlb_priv[1][esel].flags |= E500_TLB_BITMAP;
vcpu_e500->g2h_tlb1_map[esel] |= (u64)1 << sesel;
vcpu_e500->h2g_tlb1_rmap[sesel] = esel + 1;
- WARN_ON(!(ref->flags & E500_TLB_VALID));
+ WARN_ON(!(tlbe->flags & E500_TLB_VALID));
return sesel;
}
@@ -522,24 +519,24 @@ static int kvmppc_e500_tlb1_map(struct kvmppc_vcpu_e500 *vcpu_e500,
u64 gvaddr, gfn_t gfn, struct kvm_book3e_206_tlb_entry *gtlbe,
struct kvm_book3e_206_tlb_entry *stlbe, int esel)
{
- struct tlbe_ref *ref = &vcpu_e500->gtlb_priv[1][esel].ref;
+ struct tlbe_priv *tlbe = &vcpu_e500->gtlb_priv[1][esel];
int sesel;
int r;
r = kvmppc_e500_shadow_map(vcpu_e500, gvaddr, gfn, gtlbe, 1, stlbe,
- ref);
+ tlbe);
if (r)
return r;
/* Use TLB0 when we can only map a page with 4k */
if (get_tlb_tsize(stlbe) == BOOK3E_PAGESZ_4K) {
- vcpu_e500->gtlb_priv[1][esel].ref.flags |= E500_TLB_TLB0;
+ vcpu_e500->gtlb_priv[1][esel].flags |= E500_TLB_TLB0;
write_stlbe(vcpu_e500, gtlbe, stlbe, 0, 0);
return 0;
}
/* Otherwise map into TLB1 */
- sesel = kvmppc_e500_tlb1_map_tlb1(vcpu_e500, ref, esel);
+ sesel = kvmppc_e500_tlb1_map_tlb1(vcpu_e500, tlbe, esel);
write_stlbe(vcpu_e500, gtlbe, stlbe, 1, sesel);
return 0;
@@ -561,11 +558,11 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
priv = &vcpu_e500->gtlb_priv[tlbsel][esel];
/* Triggers after clear_tlb_privs or on initial mapping */
- if (!(priv->ref.flags & E500_TLB_VALID)) {
+ if (!(priv->flags & E500_TLB_VALID)) {
kvmppc_e500_tlb0_map(vcpu_e500, esel, &stlbe);
} else {
kvmppc_e500_setup_stlbe(vcpu, gtlbe, BOOK3E_PAGESZ_4K,
- &priv->ref, eaddr, &stlbe);
+ priv, eaddr, &stlbe);
write_stlbe(vcpu_e500, gtlbe, &stlbe, 0, 0);
}
break;
diff --git a/arch/riscv/kvm/aia.c b/arch/riscv/kvm/aia.c
index cac3c2b51d72..5ec503288555 100644
--- a/arch/riscv/kvm/aia.c
+++ b/arch/riscv/kvm/aia.c
@@ -13,6 +13,7 @@
#include <linux/irqchip/riscv-imsic.h>
#include <linux/irqdomain.h>
#include <linux/kvm_host.h>
+#include <linux/nospec.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <asm/cpufeature.h>
@@ -182,9 +183,14 @@ int kvm_riscv_vcpu_aia_get_csr(struct kvm_vcpu *vcpu,
unsigned long *out_val)
{
struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
+ unsigned long regs_max = sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long);
- if (reg_num >= sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long))
+ if (!riscv_isa_extension_available(vcpu->arch.isa, SSAIA))
return -ENOENT;
+ if (reg_num >= regs_max)
+ return -ENOENT;
+
+ reg_num = array_index_nospec(reg_num, regs_max);
*out_val = 0;
if (kvm_riscv_aia_available())
@@ -198,9 +204,14 @@ int kvm_riscv_vcpu_aia_set_csr(struct kvm_vcpu *vcpu,
unsigned long val)
{
struct kvm_vcpu_aia_csr *csr = &vcpu->arch.aia_context.guest_csr;
+ unsigned long regs_max = sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long);
- if (reg_num >= sizeof(struct kvm_riscv_aia_csr) / sizeof(unsigned long))
+ if (!riscv_isa_extension_available(vcpu->arch.isa, SSAIA))
return -ENOENT;
+ if (reg_num >= regs_max)
+ return -ENOENT;
+
+ reg_num = array_index_nospec(reg_num, regs_max);
if (kvm_riscv_aia_available()) {
((unsigned long *)csr)[reg_num] = val;
diff --git a/arch/riscv/kvm/aia_aplic.c b/arch/riscv/kvm/aia_aplic.c
index d1e50bf5c351..3464f3351df7 100644
--- a/arch/riscv/kvm/aia_aplic.c
+++ b/arch/riscv/kvm/aia_aplic.c
@@ -10,6 +10,7 @@
#include <linux/irqchip/riscv-aplic.h>
#include <linux/kvm_host.h>
#include <linux/math.h>
+#include <linux/nospec.h>
#include <linux/spinlock.h>
#include <linux/swab.h>
#include <kvm/iodev.h>
@@ -45,7 +46,7 @@ static u32 aplic_read_sourcecfg(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return 0;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
ret = irqd->sourcecfg;
@@ -61,7 +62,7 @@ static void aplic_write_sourcecfg(struct aplic *aplic, u32 irq, u32 val)
if (!irq || aplic->nr_irqs <= irq)
return;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
if (val & APLIC_SOURCECFG_D)
val = 0;
@@ -81,7 +82,7 @@ static u32 aplic_read_target(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return 0;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
ret = irqd->target;
@@ -97,7 +98,7 @@ static void aplic_write_target(struct aplic *aplic, u32 irq, u32 val)
if (!irq || aplic->nr_irqs <= irq)
return;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
val &= APLIC_TARGET_EIID_MASK |
(APLIC_TARGET_HART_IDX_MASK << APLIC_TARGET_HART_IDX_SHIFT) |
@@ -116,7 +117,7 @@ static bool aplic_read_pending(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return false;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
ret = (irqd->state & APLIC_IRQ_STATE_PENDING) ? true : false;
@@ -132,7 +133,7 @@ static void aplic_write_pending(struct aplic *aplic, u32 irq, bool pending)
if (!irq || aplic->nr_irqs <= irq)
return;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
@@ -170,7 +171,7 @@ static bool aplic_read_enabled(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return false;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
ret = (irqd->state & APLIC_IRQ_STATE_ENABLED) ? true : false;
@@ -186,7 +187,7 @@ static void aplic_write_enabled(struct aplic *aplic, u32 irq, bool enabled)
if (!irq || aplic->nr_irqs <= irq)
return;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
if (enabled)
@@ -205,7 +206,7 @@ static bool aplic_read_input(struct aplic *aplic, u32 irq)
if (!irq || aplic->nr_irqs <= irq)
return false;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
@@ -254,7 +255,7 @@ static void aplic_update_irq_range(struct kvm *kvm, u32 first, u32 last)
for (irq = first; irq <= last; irq++) {
if (!irq || aplic->nr_irqs <= irq)
continue;
- irqd = &aplic->irqs[irq];
+ irqd = &aplic->irqs[array_index_nospec(irq, aplic->nr_irqs)];
raw_spin_lock_irqsave(&irqd->lock, flags);
@@ -283,7 +284,7 @@ int kvm_riscv_aia_aplic_inject(struct kvm *kvm, u32 source, bool level)
if (!aplic || !source || (aplic->nr_irqs <= source))
return -ENODEV;
- irqd = &aplic->irqs[source];
+ irqd = &aplic->irqs[array_index_nospec(source, aplic->nr_irqs)];
ie = (aplic->domaincfg & APLIC_DOMAINCFG_IE) ? true : false;
raw_spin_lock_irqsave(&irqd->lock, flags);
diff --git a/arch/riscv/kvm/aia_device.c b/arch/riscv/kvm/aia_device.c
index b195a93add1c..49c71d3cdb00 100644
--- a/arch/riscv/kvm/aia_device.c
+++ b/arch/riscv/kvm/aia_device.c
@@ -11,6 +11,7 @@
#include <linux/irqchip/riscv-imsic.h>
#include <linux/kvm_host.h>
#include <linux/uaccess.h>
+#include <linux/cpufeature.h>
static int aia_create(struct kvm_device *dev, u32 type)
{
@@ -22,6 +23,9 @@ static int aia_create(struct kvm_device *dev, u32 type)
if (irqchip_in_kernel(kvm))
return -EEXIST;
+ if (!riscv_isa_extension_available(NULL, SSAIA))
+ return -ENODEV;
+
ret = -EBUSY;
if (kvm_trylock_all_vcpus(kvm))
return ret;
@@ -437,7 +441,7 @@ static int aia_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
static int aia_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
{
- int nr_vcpus;
+ int nr_vcpus, r = -ENXIO;
switch (attr->group) {
case KVM_DEV_RISCV_AIA_GRP_CONFIG:
@@ -466,12 +470,18 @@ static int aia_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
}
break;
case KVM_DEV_RISCV_AIA_GRP_APLIC:
- return kvm_riscv_aia_aplic_has_attr(dev->kvm, attr->attr);
+ mutex_lock(&dev->kvm->lock);
+ r = kvm_riscv_aia_aplic_has_attr(dev->kvm, attr->attr);
+ mutex_unlock(&dev->kvm->lock);
+ break;
case KVM_DEV_RISCV_AIA_GRP_IMSIC:
- return kvm_riscv_aia_imsic_has_attr(dev->kvm, attr->attr);
+ mutex_lock(&dev->kvm->lock);
+ r = kvm_riscv_aia_imsic_has_attr(dev->kvm, attr->attr);
+ mutex_unlock(&dev->kvm->lock);
+ break;
}
- return -ENXIO;
+ return r;
}
struct kvm_device_ops kvm_riscv_aia_device_ops = {
diff --git a/arch/riscv/kvm/aia_imsic.c b/arch/riscv/kvm/aia_imsic.c
index 06752fa24798..8786f52cf65a 100644
--- a/arch/riscv/kvm/aia_imsic.c
+++ b/arch/riscv/kvm/aia_imsic.c
@@ -908,6 +908,10 @@ int kvm_riscv_vcpu_aia_imsic_rmw(struct kvm_vcpu *vcpu, unsigned long isel,
int r, rc = KVM_INSN_CONTINUE_NEXT_SEPC;
struct imsic *imsic = vcpu->arch.aia_context.imsic_state;
+ /* If IMSIC vCPU state not initialized then forward to user space */
+ if (!imsic)
+ return KVM_INSN_EXIT_TO_USER_SPACE;
+
if (isel == KVM_RISCV_AIA_IMSIC_TOPEI) {
/* Read pending and enabled interrupt with highest priority */
topei = imsic_mrif_topei(imsic->swfile, imsic->nr_eix,
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 0b75eb2a1820..088d33ba90ed 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -245,6 +245,7 @@ out:
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
struct kvm_gstage gstage;
+ bool mmu_locked;
if (!kvm->arch.pgd)
return false;
@@ -253,9 +254,12 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
gstage.flags = 0;
gstage.vmid = READ_ONCE(kvm->arch.vmid.vmid);
gstage.pgd = kvm->arch.pgd;
+ mmu_locked = spin_trylock(&kvm->mmu_lock);
kvm_riscv_gstage_unmap_range(&gstage, range->start << PAGE_SHIFT,
(range->end - range->start) << PAGE_SHIFT,
range->may_block);
+ if (mmu_locked)
+ spin_unlock(&kvm->mmu_lock);
return false;
}
@@ -535,7 +539,7 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
goto out_unlock;
/* Check if we are backed by a THP and thus use block mapping if possible */
- if (vma_pagesize == PAGE_SIZE)
+ if (!logging && (vma_pagesize == PAGE_SIZE))
vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva, &hfn, &gpa);
if (writable) {
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index a55a95da54d0..fdd99ac1e714 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -24,7 +24,7 @@
#define CREATE_TRACE_POINTS
#include "trace.h"
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, ecall_exit_stat),
STATS_DESC_COUNTER(VCPU, wfi_exit_stat),
diff --git a/arch/riscv/kvm/vcpu_fp.c b/arch/riscv/kvm/vcpu_fp.c
index 030904d82b58..bd5a9e7e7165 100644
--- a/arch/riscv/kvm/vcpu_fp.c
+++ b/arch/riscv/kvm/vcpu_fp.c
@@ -10,6 +10,7 @@
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
+#include <linux/nospec.h>
#include <linux/uaccess.h>
#include <asm/cpufeature.h>
@@ -93,9 +94,11 @@ int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
reg_val = &cntx->fp.f.fcsr;
else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
- reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+ reg_num <= KVM_REG_RISCV_FP_F_REG(f[31])) {
+ reg_num = array_index_nospec(reg_num,
+ ARRAY_SIZE(cntx->fp.f.f));
reg_val = &cntx->fp.f.f[reg_num];
- else
+ } else
return -ENOENT;
} else if ((rtype == KVM_REG_RISCV_FP_D) &&
riscv_isa_extension_available(vcpu->arch.isa, d)) {
@@ -107,6 +110,8 @@ int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
if (KVM_REG_SIZE(reg->id) != sizeof(u64))
return -EINVAL;
+ reg_num = array_index_nospec(reg_num,
+ ARRAY_SIZE(cntx->fp.d.f));
reg_val = &cntx->fp.d.f[reg_num];
} else
return -ENOENT;
@@ -138,9 +143,11 @@ int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
reg_val = &cntx->fp.f.fcsr;
else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
- reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+ reg_num <= KVM_REG_RISCV_FP_F_REG(f[31])) {
+ reg_num = array_index_nospec(reg_num,
+ ARRAY_SIZE(cntx->fp.f.f));
reg_val = &cntx->fp.f.f[reg_num];
- else
+ } else
return -ENOENT;
} else if ((rtype == KVM_REG_RISCV_FP_D) &&
riscv_isa_extension_available(vcpu->arch.isa, d)) {
@@ -152,6 +159,8 @@ int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
if (KVM_REG_SIZE(reg->id) != sizeof(u64))
return -EINVAL;
+ reg_num = array_index_nospec(reg_num,
+ ARRAY_SIZE(cntx->fp.d.f));
reg_val = &cntx->fp.d.f[reg_num];
} else
return -ENOENT;
diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c
index e7ab6cb00646..45ecc0082e90 100644
--- a/arch/riscv/kvm/vcpu_onereg.c
+++ b/arch/riscv/kvm/vcpu_onereg.c
@@ -10,6 +10,7 @@
#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/err.h>
+#include <linux/nospec.h>
#include <linux/uaccess.h>
#include <linux/kvm_host.h>
#include <asm/cacheflush.h>
@@ -127,6 +128,7 @@ static int kvm_riscv_vcpu_isa_check_host(unsigned long kvm_ext, unsigned long *g
kvm_ext >= ARRAY_SIZE(kvm_isa_ext_arr))
return -ENOENT;
+ kvm_ext = array_index_nospec(kvm_ext, ARRAY_SIZE(kvm_isa_ext_arr));
*guest_ext = kvm_isa_ext_arr[kvm_ext];
switch (*guest_ext) {
case RISCV_ISA_EXT_SMNPM:
@@ -443,13 +445,16 @@ static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu,
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
KVM_REG_SIZE_MASK |
KVM_REG_RISCV_CORE);
+ unsigned long regs_max = sizeof(struct kvm_riscv_core) / sizeof(unsigned long);
unsigned long reg_val;
if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
return -EINVAL;
- if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long))
+ if (reg_num >= regs_max)
return -ENOENT;
+ reg_num = array_index_nospec(reg_num, regs_max);
+
if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
reg_val = cntx->sepc;
else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
@@ -476,13 +481,16 @@ static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu,
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
KVM_REG_SIZE_MASK |
KVM_REG_RISCV_CORE);
+ unsigned long regs_max = sizeof(struct kvm_riscv_core) / sizeof(unsigned long);
unsigned long reg_val;
if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
return -EINVAL;
- if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long))
+ if (reg_num >= regs_max)
return -ENOENT;
+ reg_num = array_index_nospec(reg_num, regs_max);
+
if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
@@ -507,10 +515,13 @@ static int kvm_riscv_vcpu_general_get_csr(struct kvm_vcpu *vcpu,
unsigned long *out_val)
{
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+ unsigned long regs_max = sizeof(struct kvm_riscv_csr) / sizeof(unsigned long);
- if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long))
+ if (reg_num >= regs_max)
return -ENOENT;
+ reg_num = array_index_nospec(reg_num, regs_max);
+
if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) {
kvm_riscv_vcpu_flush_interrupts(vcpu);
*out_val = (csr->hvip >> VSIP_TO_HVIP_SHIFT) & VSIP_VALID_MASK;
@@ -526,10 +537,13 @@ static int kvm_riscv_vcpu_general_set_csr(struct kvm_vcpu *vcpu,
unsigned long reg_val)
{
struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+ unsigned long regs_max = sizeof(struct kvm_riscv_csr) / sizeof(unsigned long);
- if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long))
+ if (reg_num >= regs_max)
return -ENOENT;
+ reg_num = array_index_nospec(reg_num, regs_max);
+
if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) {
reg_val &= VSIP_VALID_MASK;
reg_val <<= VSIP_TO_HVIP_SHIFT;
@@ -548,10 +562,15 @@ static inline int kvm_riscv_vcpu_smstateen_set_csr(struct kvm_vcpu *vcpu,
unsigned long reg_val)
{
struct kvm_vcpu_smstateen_csr *csr = &vcpu->arch.smstateen_csr;
+ unsigned long regs_max = sizeof(struct kvm_riscv_smstateen_csr) /
+ sizeof(unsigned long);
- if (reg_num >= sizeof(struct kvm_riscv_smstateen_csr) /
- sizeof(unsigned long))
- return -EINVAL;
+ if (!riscv_isa_extension_available(vcpu->arch.isa, SMSTATEEN))
+ return -ENOENT;
+ if (reg_num >= regs_max)
+ return -ENOENT;
+
+ reg_num = array_index_nospec(reg_num, regs_max);
((unsigned long *)csr)[reg_num] = reg_val;
return 0;
@@ -562,10 +581,15 @@ static int kvm_riscv_vcpu_smstateen_get_csr(struct kvm_vcpu *vcpu,
unsigned long *out_val)
{
struct kvm_vcpu_smstateen_csr *csr = &vcpu->arch.smstateen_csr;
+ unsigned long regs_max = sizeof(struct kvm_riscv_smstateen_csr) /
+ sizeof(unsigned long);
- if (reg_num >= sizeof(struct kvm_riscv_smstateen_csr) /
- sizeof(unsigned long))
- return -EINVAL;
+ if (!riscv_isa_extension_available(vcpu->arch.isa, SMSTATEEN))
+ return -ENOENT;
+ if (reg_num >= regs_max)
+ return -ENOENT;
+
+ reg_num = array_index_nospec(reg_num, regs_max);
*out_val = ((unsigned long *)csr)[reg_num];
return 0;
@@ -595,10 +619,7 @@ static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu,
rc = kvm_riscv_vcpu_aia_get_csr(vcpu, reg_num, &reg_val);
break;
case KVM_REG_RISCV_CSR_SMSTATEEN:
- rc = -EINVAL;
- if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN))
- rc = kvm_riscv_vcpu_smstateen_get_csr(vcpu, reg_num,
- &reg_val);
+ rc = kvm_riscv_vcpu_smstateen_get_csr(vcpu, reg_num, &reg_val);
break;
default:
rc = -ENOENT;
@@ -640,10 +661,7 @@ static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
rc = kvm_riscv_vcpu_aia_set_csr(vcpu, reg_num, reg_val);
break;
case KVM_REG_RISCV_CSR_SMSTATEEN:
- rc = -EINVAL;
- if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN))
- rc = kvm_riscv_vcpu_smstateen_set_csr(vcpu, reg_num,
- reg_val);
+ rc = kvm_riscv_vcpu_smstateen_set_csr(vcpu, reg_num, reg_val);
break;
default:
rc = -ENOENT;
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 4d8d5e9aa53d..e873430e596b 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -10,6 +10,7 @@
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
+#include <linux/nospec.h>
#include <linux/perf/riscv_pmu.h>
#include <asm/csr.h>
#include <asm/kvm_vcpu_sbi.h>
@@ -87,7 +88,8 @@ static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc)
static u64 kvm_pmu_get_perf_event_hw_config(u32 sbi_event_code)
{
- return hw_event_perf_map[sbi_event_code];
+ return hw_event_perf_map[array_index_nospec(sbi_event_code,
+ SBI_PMU_HW_GENERAL_MAX)];
}
static u64 kvm_pmu_get_perf_event_cache_config(u32 sbi_event_code)
@@ -218,6 +220,7 @@ static int pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
return -EINVAL;
}
+ cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
pmc = &kvpmu->pmc[cidx];
if (pmc->cinfo.type != SBI_PMU_CTR_TYPE_FW)
@@ -244,6 +247,7 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
return -EINVAL;
}
+ cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
pmc = &kvpmu->pmc[cidx];
if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
@@ -520,11 +524,12 @@ int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
- if (cidx > RISCV_KVM_MAX_COUNTERS || cidx == 1) {
+ if (cidx >= RISCV_KVM_MAX_COUNTERS || cidx == 1) {
retdata->err_val = SBI_ERR_INVALID_PARAM;
return 0;
}
+ cidx = array_index_nospec(cidx, RISCV_KVM_MAX_COUNTERS);
retdata->out_val = kvpmu->pmc[cidx].cinfo.value;
return 0;
@@ -559,7 +564,8 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
}
/* Start the counters that have been configured and requested by the guest */
for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
- pmc_index = i + ctr_base;
+ pmc_index = array_index_nospec(i + ctr_base,
+ RISCV_KVM_MAX_COUNTERS);
if (!test_bit(pmc_index, kvpmu->pmc_in_use))
continue;
/* The guest started the counter again. Reset the overflow status */
@@ -630,7 +636,8 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
/* Stop the counters that have been configured and requested by the guest */
for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
- pmc_index = i + ctr_base;
+ pmc_index = array_index_nospec(i + ctr_base,
+ RISCV_KVM_MAX_COUNTERS);
if (!test_bit(pmc_index, kvpmu->pmc_in_use))
continue;
pmc = &kvpmu->pmc[pmc_index];
@@ -761,6 +768,7 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
}
}
+ ctr_idx = array_index_nospec(ctr_idx, RISCV_KVM_MAX_COUNTERS);
pmc = &kvpmu->pmc[ctr_idx];
pmc->idx = ctr_idx;
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 58bce57dc55b..13c63ae1a78b 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -13,7 +13,7 @@
#include <linux/kvm_host.h>
#include <asm/kvm_mmu.h>
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS()
};
static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index bc7d6fa66eaf..3eb60aa932ec 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -65,7 +65,7 @@
#define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
(KVM_MAX_VCPUS + LOCAL_IRQS))
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_COUNTER(VM, inject_io),
STATS_DESC_COUNTER(VM, inject_float_mchk),
@@ -91,7 +91,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, exit_userspace),
STATS_DESC_COUNTER(VCPU, exit_null),
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff07c45e3c73..6e4e3ef9b8c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2485,7 +2485,8 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
KVM_X86_QUIRK_SLOT_ZAP_ALL | \
KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
- KVM_X86_QUIRK_IGNORE_GUEST_PAT)
+ KVM_X86_QUIRK_IGNORE_GUEST_PAT | \
+ KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM)
#define KVM_X86_CONDITIONAL_QUIRKS \
(KVM_X86_QUIRK_CD_NW_CLEARED | \
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 846a63215ce1..0d4538fa6c31 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -476,6 +476,7 @@ struct kvm_sync_regs {
#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
#define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8)
#define KVM_X86_QUIRK_IGNORE_GUEST_PAT (1 << 9)
+#define KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM (1 << 10)
#define KVM_STATE_NESTED_FORMAT_VMX 0
#define KVM_STATE_NESTED_FORMAT_SVM 1
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d2486506a808..8137927e7387 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -776,7 +776,10 @@ do { \
#define SYNTHESIZED_F(name) \
({ \
kvm_cpu_cap_synthesized |= feature_bit(name); \
- F(name); \
+ \
+ BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES); \
+ if (boot_cpu_has(X86_FEATURE_##name)) \
+ F(name); \
})
/*
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 30202942289a..9b140bbdc1d8 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1981,16 +1981,17 @@ int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
if (entries[i] == KVM_HV_TLB_FLUSHALL_ENTRY)
goto out_flush_all;
- if (is_noncanonical_invlpg_address(entries[i], vcpu))
- continue;
-
/*
* Lower 12 bits of 'address' encode the number of additional
* pages to flush.
*/
gva = entries[i] & PAGE_MASK;
- for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++)
+ for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++) {
+ if (is_noncanonical_invlpg_address(gva + j * PAGE_SIZE, vcpu))
+ continue;
+
kvm_x86_call(flush_tlb_gva)(vcpu, gva + j * PAGE_SIZE);
+ }
++vcpu->stat.tlb_flush;
}
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index bb257793b6cb..eed96ff6e722 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -321,7 +321,8 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
idx = srcu_read_lock(&kvm->irq_srcu);
gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
if (gsi != -1)
- hlist_for_each_entry_rcu(kimn, &ioapic->mask_notifier_list, link)
+ hlist_for_each_entry_srcu(kimn, &ioapic->mask_notifier_list, link,
+ srcu_read_lock_held(&kvm->irq_srcu))
if (kimn->irq == gsi)
kimn->func(kimn, mask);
srcu_read_unlock(&kvm->irq_srcu, idx);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index f92214b1a938..f7ec7914e3c4 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -189,12 +189,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
struct kvm_vcpu *vcpu = &svm->vcpu;
vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
-
vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
vmcb->control.avic_physical_id |= avic_get_max_physical_id(vcpu);
-
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+ svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
+
/*
* Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
* accesses, while interrupt injection to a running vCPU can be
@@ -226,6 +226,9 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
+ if (!sev_es_guest(svm->vcpu.kvm))
+ svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+
/*
* If running nested and the guest uses its own MSR bitmap, there
* is no need to update L0's msr bitmap
@@ -368,7 +371,7 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
vmcb->control.avic_physical_id = __sme_set(__pa(kvm_svm->avic_physical_id_table));
vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE;
- if (kvm_apicv_activated(svm->vcpu.kvm))
+ if (kvm_vcpu_apicv_active(&svm->vcpu))
avic_activate_vmcb(svm);
else
avic_deactivate_vmcb(svm);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 53ab6ce3cc26..b36c33255bed 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -418,6 +418,15 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
return __nested_vmcb_check_controls(vcpu, ctl);
}
+int nested_svm_check_cached_vmcb12(struct kvm_vcpu *vcpu)
+{
+ if (!nested_vmcb_check_save(vcpu) ||
+ !nested_vmcb_check_controls(vcpu))
+ return -EINVAL;
+
+ return 0;
+}
+
/*
* If a feature is not advertised to L1, clear the corresponding vmcb12
* intercept.
@@ -1028,8 +1037,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
- if (!nested_vmcb_check_save(vcpu) ||
- !nested_vmcb_check_controls(vcpu)) {
+ if (nested_svm_check_cached_vmcb12(vcpu) < 0) {
vmcb12->control.exit_code = SVM_EXIT_ERR;
vmcb12->control.exit_info_1 = 0;
vmcb12->control.exit_info_2 = 0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..e6477affac9a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1077,8 +1077,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
- if (!kvm_vcpu_apicv_active(vcpu))
- svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+ svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
set_dr_intercepts(svm);
@@ -1189,7 +1188,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
svm->vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
- if (kvm_vcpu_apicv_active(vcpu))
+ if (enable_apicv && irqchip_in_kernel(vcpu->kvm))
avic_init_vmcb(svm, vmcb);
if (vnmi)
@@ -2674,9 +2673,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
static int cr8_write_interception(struct kvm_vcpu *vcpu)
{
+ u8 cr8_prev = kvm_get_cr8(vcpu);
int r;
- u8 cr8_prev = kvm_get_cr8(vcpu);
+ WARN_ON_ONCE(kvm_vcpu_apicv_active(vcpu));
+
/* instruction emulation calls kvm_set_cr8() */
r = cr_interception(vcpu);
if (lapic_in_kernel(vcpu))
@@ -4879,11 +4880,15 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
vmcb12 = map.hva;
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
- ret = enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, vmcb12, false);
- if (ret)
+ if (nested_svm_check_cached_vmcb12(vcpu) < 0)
+ goto unmap_save;
+
+ if (enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa,
+ vmcb12, false) != 0)
goto unmap_save;
+ ret = 0;
svm->nested.nested_run_pending = 1;
unmap_save:
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ebd7b36b1ceb..6942e6b0eda6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -797,6 +797,7 @@ static inline int nested_svm_simple_vmexit(struct vcpu_svm *svm, u32 exit_code)
int nested_svm_exit_handled(struct vcpu_svm *svm);
int nested_svm_check_permissions(struct kvm_vcpu *vcpu);
+int nested_svm_check_cached_vmcb12(struct kvm_vcpu *vcpu);
int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
bool has_error_code, u32 error_code);
int nested_svm_exit_special(struct vcpu_svm *svm);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 248635da6766..937aeb474af7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3300,10 +3300,24 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
return -EINVAL;
- if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
- CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
- return -EINVAL;
+ if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) {
+ u64 debugctl = vmcs12->guest_ia32_debugctl;
+
+ /*
+ * FREEZE_IN_SMM is not virtualized, but allow L1 to set it in
+ * vmcs12's DEBUGCTL under a quirk for backwards compatibility.
+ * Note that the quirk only relaxes the consistency check. The
+ * vmcc02 bit is still under the control of the host. In
+ * particular, if a host administrator decides to clear the bit,
+ * then L1 has no say in the matter.
+ */
+ if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM))
+ debugctl &= ~DEBUGCTLMSR_FREEZE_IN_SMM;
+
+ if (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, debugctl, false)))
+ return -EINVAL;
+ }
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
CC(!kvm_pat_valid(vmcs12->guest_ia32_pat)))
@@ -6842,13 +6856,34 @@ void vmx_leave_nested(struct kvm_vcpu *vcpu)
free_nested(vcpu);
}
+int nested_vmx_check_restored_vmcs12(struct kvm_vcpu *vcpu)
+{
+ enum vm_entry_failure_code ignored;
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+ if (nested_cpu_has_shadow_vmcs(vmcs12) &&
+ vmcs12->vmcs_link_pointer != INVALID_GPA) {
+ struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
+
+ if (shadow_vmcs12->hdr.revision_id != VMCS12_REVISION ||
+ !shadow_vmcs12->hdr.shadow_vmcs)
+ return -EINVAL;
+ }
+
+ if (nested_vmx_check_controls(vcpu, vmcs12) ||
+ nested_vmx_check_host_state(vcpu, vmcs12) ||
+ nested_vmx_check_guest_state(vcpu, vmcs12, &ignored))
+ return -EINVAL;
+
+ return 0;
+}
+
static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
struct kvm_nested_state __user *user_kvm_nested_state,
struct kvm_nested_state *kvm_state)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs12 *vmcs12;
- enum vm_entry_failure_code ignored;
struct kvm_vmx_nested_state_data __user *user_vmx_nested_state =
&user_kvm_nested_state->data.vmx[0];
int ret;
@@ -6979,25 +7014,20 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
vmx->nested.mtf_pending =
!!(kvm_state->flags & KVM_STATE_NESTED_MTF_PENDING);
- ret = -EINVAL;
if (nested_cpu_has_shadow_vmcs(vmcs12) &&
vmcs12->vmcs_link_pointer != INVALID_GPA) {
struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
+ ret = -EINVAL;
if (kvm_state->size <
sizeof(*kvm_state) +
sizeof(user_vmx_nested_state->vmcs12) + sizeof(*shadow_vmcs12))
goto error_guest_mode;
+ ret = -EFAULT;
if (copy_from_user(shadow_vmcs12,
user_vmx_nested_state->shadow_vmcs12,
- sizeof(*shadow_vmcs12))) {
- ret = -EFAULT;
- goto error_guest_mode;
- }
-
- if (shadow_vmcs12->hdr.revision_id != VMCS12_REVISION ||
- !shadow_vmcs12->hdr.shadow_vmcs)
+ sizeof(*shadow_vmcs12)))
goto error_guest_mode;
}
@@ -7008,9 +7038,8 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
kvm_state->hdr.vmx.preemption_timer_deadline;
}
- if (nested_vmx_check_controls(vcpu, vmcs12) ||
- nested_vmx_check_host_state(vcpu, vmcs12) ||
- nested_vmx_check_guest_state(vcpu, vmcs12, &ignored))
+ ret = nested_vmx_check_restored_vmcs12(vcpu);
+ if (ret < 0)
goto error_guest_mode;
vmx->nested.dirty_vmcs12 = true;
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index b844c5d59025..213a448104af 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -22,6 +22,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
+int nested_vmx_check_restored_vmcs12(struct kvm_vcpu *vcpu);
void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
bool from_vmentry);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 967b58a8ab9d..8b24e682535b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1149,7 +1149,7 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
}
vmx_add_auto_msr(&m->guest, msr, guest_val, VM_ENTRY_MSR_LOAD_COUNT, kvm);
- vmx_add_auto_msr(&m->guest, msr, host_val, VM_EXIT_MSR_LOAD_COUNT, kvm);
+ vmx_add_auto_msr(&m->host, msr, host_val, VM_EXIT_MSR_LOAD_COUNT, kvm);
}
static bool update_transition_efer(struct vcpu_vmx *vmx)
@@ -8528,9 +8528,13 @@ int vmx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
}
if (vmx->nested.smm.guest_mode) {
+ /* Triple fault if the state is invalid. */
+ if (nested_vmx_check_restored_vmcs12(vcpu) < 0)
+ return 1;
+
ret = nested_vmx_enter_non_root_mode(vcpu, false);
- if (ret)
- return ret;
+ if (ret != NVMX_VMENTRY_SUCCESS)
+ return 1;
vmx->nested.nested_run_pending = 1;
vmx->nested.smm.guest_mode = false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a03530795707..fd1c4a36b593 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -243,7 +243,7 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_ipiv);
bool __read_mostly enable_device_posted_irqs = true;
EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_device_posted_irqs);
-const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
STATS_DESC_COUNTER(VM, mmu_pte_write),
@@ -269,7 +269,7 @@ const struct kvm_stats_header kvm_vm_stats_header = {
sizeof(kvm_vm_stats_desc),
};
-const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
+const struct kvm_stats_desc kvm_vcpu_stats_desc[] = {
KVM_GENERIC_VCPU_STATS(),
STATS_DESC_COUNTER(VCPU, pf_taken),
STATS_DESC_COUNTER(VCPU, pf_fixed),