summaryrefslogtreecommitdiff
path: root/arch/loongarch/kernel
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2024-05-15 14:46:43 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2024-05-15 14:46:43 -0700
commitf4b0c4b508364fde023e4f7b9f23f7e38c663dfe (patch)
treed10d9c6602dcd1d2d50effe18ce63edc4d4bb706 /arch/loongarch/kernel
parent2e9250022e9f2c9cde3b98fd26dcad1c2a9aedf3 (diff)
parentcba23f333fedf8e39743b0c9787b45a5bd7d03af (diff)
downloadlwn-f4b0c4b508364fde023e4f7b9f23f7e38c663dfe.tar.gz
lwn-f4b0c4b508364fde023e4f7b9f23f7e38c663dfe.zip
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini: "ARM: - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure. - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions also require emulating part of the pointer authentication extension. As a result, the trap handling of pointer authentication has been greatly simplified. - Turn the global (and not very scalable) LPI translation cache into a per-ITS, scalable cache, making non directly injected LPIs much cheaper to make visible to the vcpu. - A batch of pKVM patches, mostly fixes and cleanups, as the upstreaming process seems to be resuming. Fingers crossed! - Allocate PPIs and SGIs outside of the vcpu structure, allowing for smaller EL2 mapping and some flexibility in implementing more or less than 32 private IRQs. - Purge stale mpidr_data if a vcpu is created after the MPIDR map has been created. - Preserve vcpu-specific ID registers across a vcpu reset. - Various minor cleanups and improvements. LoongArch: - Add ParaVirt IPI support - Add software breakpoint support - Add mmio trace events support RISC-V: - Support guest breakpoints using ebreak - Introduce per-VCPU mp_state_lock and reset_cntx_lock - Virtualize SBI PMU snapshot and counter overflow interrupts - New selftests for SBI PMU and Guest ebreak - Some preparatory work for both TDX and SNP page fault handling. This also cleans up the page fault path, so that the priorities of various kinds of fauls (private page, no memory, write to read-only slot, etc.) are easier to follow. x86: - Minimize amount of time that shadow PTEs remain in the special REMOVED_SPTE state. This is a state where the mmu_lock is held for reading but concurrent accesses to the PTE have to spin; shortening its use allows other vCPUs to repopulate the zapped region while the zapper finishes tearing down the old, defunct page tables. - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which is defined by hardware but left for software use. This lets KVM communicate its inability to map GPAs that set bits 51:48 on hosts without 5-level nested page tables. Guest firmware is expected to use the information when mapping BARs; this avoids that they end up at a legal, but unmappable, GPA. - Fixed a bug where KVM would not reject accesses to MSR that aren't supposed to exist given the vCPU model and/or KVM configuration. - As usual, a bunch of code cleanups. x86 (AMD): - Implement a new and improved API to initialize SEV and SEV-ES VMs, which will also be extendable to SEV-SNP. The new API specifies the desired encryption in KVM_CREATE_VM and then separately initializes the VM. The new API also allows customizing the desired set of VMSA features; the features affect the measurement of the VM's initial state, and therefore enabling them cannot be done tout court by the hypervisor. While at it, the new API includes two bugfixes that couldn't be applied to the old one without a flag day in userspace or without affecting the initial measurement. When a SEV-ES VM is created with the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are rejected once the VMSA has been encrypted. Also, the FPU and AVX state will be synchronized and encrypted too. - Support for GHCB version 2 as applicable to SEV-ES guests. This, once more, is only accessible when using the new KVM_SEV_INIT2 flow for initialization of SEV-ES VMs. x86 (Intel): - An initial bunch of prerequisite patches for Intel TDX were merged. They generally don't do anything interesting. The only somewhat user visible change is a new debugging mode that checks that KVM's MMU never triggers a #VE virtualization exception in the guest. - Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to L1, as per the SDM. Generic: - Use vfree() instead of kvfree() for allocations that always use vcalloc() or __vcalloc(). - Remove .change_pte() MMU notifier - the changes to non-KVM code are small and Andrew Morton asked that I also take those through the KVM tree. The callback was only ever implemented by KVM (which was also the original user of MMU notifiers) but it had been nonfunctional ever since calls to set_pte_at_notify were wrapped with invalidate_range_start and invalidate_range_end... in 2012. Selftests: - Enhance the demand paging test to allow for better reporting and stressing of UFFD performance. - Convert the steal time test to generate TAP-friendly output. - Fix a flaky false positive in the xen_shinfo_test due to comparing elapsed time across two different clock domains. - Skip the MONITOR/MWAIT test if the host doesn't actually support MWAIT. - Avoid unnecessary use of "sudo" in the NX hugepage test wrapper shell script, to play nice with running in a minimal userspace environment. - Allow skipping the RSEQ test's sanity check that the vCPU was able to complete a reasonable number of KVM_RUNs, as the assert can fail on a completely valid setup. If the test is run on a large-ish system that is otherwise idle, and the test isn't affined to a low-ish number of CPUs, the vCPU task can be repeatedly migrated to CPUs that are in deep sleep states, which results in the vCPU having very little net runtime before the next migration due to high wakeup latencies. - Define _GNU_SOURCE for all selftests to fix a warning that was introduced by a change to kselftest_harness.h late in the 6.9 cycle, and because forcing every test to #define _GNU_SOURCE is painful. - Provide a global pseudo-RNG instance for all tests, so that library code can generate random, but determinstic numbers. - Use the global pRNG to randomly force emulation of select writes from guest code on x86, e.g. to help validate KVM's emulation of locked accesses. - Allocate and initialize x86's GDT, IDT, TSS, segments, and default exception handlers at VM creation, instead of forcing tests to manually trigger the related setup. Documentation: - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (225 commits) selftests/kvm: remove dead file KVM: selftests: arm64: Test vCPU-scoped feature ID registers KVM: selftests: arm64: Test that feature ID regs survive a reset KVM: selftests: arm64: Store expected register value in set_id_regs KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope KVM: arm64: Only reset vCPU-scoped feature ID regs once KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs() KVM: arm64: Rename is_id_reg() to imply VM scope KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: SEV: Allow per-guest configuration of GHCB protocol version KVM: SEV: Add GHCB handling for termination requests KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests KVM: SEV: Add support to handle AP reset MSR protocol KVM: x86: Explicitly zero kvm_caps during vendor module load KVM: x86: Fully re-initialize supported_mce_cap on vendor module load KVM: x86: Fully re-initialize supported_vm_types on vendor module load KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values ...
Diffstat (limited to 'arch/loongarch/kernel')
-rw-r--r--arch/loongarch/kernel/Makefile1
-rw-r--r--arch/loongarch/kernel/irq.c24
-rw-r--r--arch/loongarch/kernel/paravirt.c151
-rw-r--r--arch/loongarch/kernel/perf_event.c14
-rw-r--r--arch/loongarch/kernel/smp.c52
-rw-r--r--arch/loongarch/kernel/time.c12
6 files changed, 189 insertions, 65 deletions
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3a7620b66bc6..c9bfeda89e40 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
obj-$(CONFIG_SMP) += smp.o
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index 883e5066ae44..f4991c03514f 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -87,23 +87,9 @@ static void __init init_vec_parent_group(void)
acpi_table_parse(ACPI_SIG_MCFG, early_pci_mcfg_parse);
}
-static int __init get_ipi_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_IPI);
-
- return -EINVAL;
-}
-
void __init init_IRQ(void)
{
int i;
-#ifdef CONFIG_SMP
- int r, ipi_irq;
- static int ipi_dummy_dev;
-#endif
unsigned int order = get_order(IRQ_STACK_SIZE);
struct page *page;
@@ -113,13 +99,7 @@ void __init init_IRQ(void)
init_vec_parent_group();
irqchip_init();
#ifdef CONFIG_SMP
- ipi_irq = get_ipi_irq();
- if (ipi_irq < 0)
- panic("IPI IRQ mapping failed\n");
- irq_set_percpu_devid(ipi_irq);
- r = request_percpu_irq(ipi_irq, loongson_ipi_interrupt, "IPI", &ipi_dummy_dev);
- if (r < 0)
- panic("IPI IRQ request failed\n");
+ mp_ops.init_ipi();
#endif
for (i = 0; i < NR_IRQS; i++)
@@ -133,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
}
- set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+ set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
}
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index 000000000000..1633ed4f692f
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/export.h>
+#include <linux/types.h>
+#include <linux/interrupt.h>
+#include <linux/jump_label.h>
+#include <linux/kvm_para.h>
+#include <linux/static_call.h>
+#include <asm/paravirt.h>
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+ return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+#ifdef CONFIG_SMP
+static void pv_send_ipi_single(int cpu, unsigned int action)
+{
+ int min, old;
+ irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
+
+ old = atomic_fetch_or(BIT(action), &info->message);
+ if (old)
+ return;
+
+ min = cpu_logical_map(cpu);
+ kvm_hypercall3(KVM_HCALL_FUNC_IPI, 1, 0, min);
+}
+
+#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
+
+static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
+{
+ int i, cpu, min = 0, max = 0, old;
+ __uint128_t bitmap = 0;
+ irq_cpustat_t *info;
+
+ if (cpumask_empty(mask))
+ return;
+
+ action = BIT(action);
+ for_each_cpu(i, mask) {
+ info = &per_cpu(irq_stat, i);
+ old = atomic_fetch_or(action, &info->message);
+ if (old)
+ continue;
+
+ cpu = cpu_logical_map(i);
+ if (!bitmap) {
+ min = max = cpu;
+ } else if (cpu < min && cpu > (max - KVM_IPI_CLUSTER_SIZE)) {
+ /* cpu < min, and bitmap still enough */
+ bitmap <<= min - cpu;
+ min = cpu;
+ } else if (cpu > min && cpu < (min + KVM_IPI_CLUSTER_SIZE)) {
+ /* cpu > min, and bitmap still enough */
+ max = cpu > max ? cpu : max;
+ } else {
+ /*
+ * With cpu, bitmap will exceed KVM_IPI_CLUSTER_SIZE,
+ * send IPI here directly and skip the remaining CPUs.
+ */
+ kvm_hypercall3(KVM_HCALL_FUNC_IPI, (unsigned long)bitmap,
+ (unsigned long)(bitmap >> BITS_PER_LONG), min);
+ min = max = cpu;
+ bitmap = 0;
+ }
+ __set_bit(cpu - min, (unsigned long *)&bitmap);
+ }
+
+ if (bitmap)
+ kvm_hypercall3(KVM_HCALL_FUNC_IPI, (unsigned long)bitmap,
+ (unsigned long)(bitmap >> BITS_PER_LONG), min);
+}
+
+static irqreturn_t pv_ipi_interrupt(int irq, void *dev)
+{
+ u32 action;
+ irq_cpustat_t *info;
+
+ /* Clear SWI interrupt */
+ clear_csr_estat(1 << INT_SWI0);
+ info = this_cpu_ptr(&irq_stat);
+ action = atomic_xchg(&info->message, 0);
+
+ if (action & SMP_RESCHEDULE) {
+ scheduler_ipi();
+ info->ipi_irqs[IPI_RESCHEDULE]++;
+ }
+
+ if (action & SMP_CALL_FUNCTION) {
+ generic_smp_call_function_interrupt();
+ info->ipi_irqs[IPI_CALL_FUNCTION]++;
+ }
+
+ return IRQ_HANDLED;
+}
+
+static void pv_init_ipi(void)
+{
+ int r, swi;
+
+ swi = get_percpu_irq(INT_SWI0);
+ if (swi < 0)
+ panic("SWI0 IRQ mapping failed\n");
+ irq_set_percpu_devid(swi);
+ r = request_percpu_irq(swi, pv_ipi_interrupt, "SWI0-IPI", &irq_stat);
+ if (r < 0)
+ panic("SWI0 IRQ request failed\n");
+}
+#endif
+
+static bool kvm_para_available(void)
+{
+ int config;
+ static int hypervisor_type;
+
+ if (!hypervisor_type) {
+ config = read_cpucfg(CPUCFG_KVM_SIG);
+ if (!memcmp(&config, KVM_SIGNATURE, 4))
+ hypervisor_type = HYPERVISOR_KVM;
+ }
+
+ return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_ipi_init(void)
+{
+ int feature;
+
+ if (!cpu_has_hypervisor)
+ return 0;
+ if (!kvm_para_available())
+ return 0;
+
+ feature = read_cpucfg(CPUCFG_KVM_FEATURE);
+ if (!(feature & KVM_FEATURE_IPI))
+ return 0;
+
+#ifdef CONFIG_SMP
+ mp_ops.init_ipi = pv_init_ipi;
+ mp_ops.send_ipi_single = pv_send_ipi_single;
+ mp_ops.send_ipi_mask = pv_send_ipi_mask;
+#endif
+
+ return 0;
+}
diff --git a/arch/loongarch/kernel/perf_event.c b/arch/loongarch/kernel/perf_event.c
index cac7cba81b65..f86a4b838dd7 100644
--- a/arch/loongarch/kernel/perf_event.c
+++ b/arch/loongarch/kernel/perf_event.c
@@ -456,16 +456,6 @@ static void loongarch_pmu_disable(struct pmu *pmu)
static DEFINE_MUTEX(pmu_reserve_mutex);
static atomic_t active_events = ATOMIC_INIT(0);
-static int get_pmc_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_PCOV);
-
- return -EINVAL;
-}
-
static void reset_counters(void *arg);
static int __hw_perf_event_init(struct perf_event *event);
@@ -473,7 +463,7 @@ static void hw_perf_event_destroy(struct perf_event *event)
{
if (atomic_dec_and_mutex_lock(&active_events, &pmu_reserve_mutex)) {
on_each_cpu(reset_counters, NULL, 1);
- free_irq(get_pmc_irq(), &loongarch_pmu);
+ free_irq(get_percpu_irq(INT_PCOV), &loongarch_pmu);
mutex_unlock(&pmu_reserve_mutex);
}
}
@@ -562,7 +552,7 @@ static int loongarch_pmu_event_init(struct perf_event *event)
if (event->cpu >= 0 && !cpu_online(event->cpu))
return -ENODEV;
- irq = get_pmc_irq();
+ irq = get_percpu_irq(INT_PCOV);
flags = IRQF_PERCPU | IRQF_NOBALANCING | IRQF_NO_THREAD | IRQF_NO_SUSPEND | IRQF_SHARED;
if (!atomic_inc_not_zero(&active_events)) {
mutex_lock(&pmu_reserve_mutex);
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index aabee0b280fe..0dfe2388ef41 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -29,6 +29,7 @@
#include <asm/loongson.h>
#include <asm/mmu_context.h>
#include <asm/numa.h>
+#include <asm/paravirt.h>
#include <asm/processor.h>
#include <asm/setup.h>
#include <asm/time.h>
@@ -66,11 +67,6 @@ static cpumask_t cpu_core_setup_map;
struct secondary_data cpuboot_data;
static DEFINE_PER_CPU(int, cpu_state);
-enum ipi_msg_type {
- IPI_RESCHEDULE,
- IPI_CALL_FUNCTION,
-};
-
static const char *ipi_types[NR_IPI] __tracepoint_string = {
[IPI_RESCHEDULE] = "Rescheduling interrupts",
[IPI_CALL_FUNCTION] = "Function call interrupts",
@@ -190,24 +186,19 @@ static u32 ipi_read_clear(int cpu)
static void ipi_write_action(int cpu, u32 action)
{
- unsigned int irq = 0;
-
- while ((irq = ffs(action))) {
- uint32_t val = IOCSR_IPI_SEND_BLOCKING;
+ uint32_t val;
- val |= (irq - 1);
- val |= (cpu << IOCSR_IPI_SEND_CPU_SHIFT);
- iocsr_write32(val, LOONGARCH_IOCSR_IPI_SEND);
- action &= ~BIT(irq - 1);
- }
+ val = IOCSR_IPI_SEND_BLOCKING | action;
+ val |= (cpu << IOCSR_IPI_SEND_CPU_SHIFT);
+ iocsr_write32(val, LOONGARCH_IOCSR_IPI_SEND);
}
-void loongson_send_ipi_single(int cpu, unsigned int action)
+static void loongson_send_ipi_single(int cpu, unsigned int action)
{
ipi_write_action(cpu_logical_map(cpu), (u32)action);
}
-void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
+static void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
{
unsigned int i;
@@ -222,11 +213,11 @@ void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
*/
void arch_smp_send_reschedule(int cpu)
{
- loongson_send_ipi_single(cpu, SMP_RESCHEDULE);
+ mp_ops.send_ipi_single(cpu, ACTION_RESCHEDULE);
}
EXPORT_SYMBOL_GPL(arch_smp_send_reschedule);
-irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
+static irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
{
unsigned int action;
unsigned int cpu = smp_processor_id();
@@ -246,6 +237,26 @@ irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
return IRQ_HANDLED;
}
+static void loongson_init_ipi(void)
+{
+ int r, ipi_irq;
+
+ ipi_irq = get_percpu_irq(INT_IPI);
+ if (ipi_irq < 0)
+ panic("IPI IRQ mapping failed\n");
+
+ irq_set_percpu_devid(ipi_irq);
+ r = request_percpu_irq(ipi_irq, loongson_ipi_interrupt, "IPI", &irq_stat);
+ if (r < 0)
+ panic("IPI IRQ request failed\n");
+}
+
+struct smp_ops mp_ops = {
+ .init_ipi = loongson_init_ipi,
+ .send_ipi_single = loongson_send_ipi_single,
+ .send_ipi_mask = loongson_send_ipi_mask,
+};
+
static void __init fdt_smp_setup(void)
{
#ifdef CONFIG_OF
@@ -289,6 +300,7 @@ void __init loongson_smp_setup(void)
cpu_data[0].core = cpu_logical_map(0) % loongson_sysconf.cores_per_package;
cpu_data[0].package = cpu_logical_map(0) / loongson_sysconf.cores_per_package;
+ pv_ipi_init();
iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_EN);
pr_info("Detected %i available CPU(s)\n", loongson_sysconf.nr_cpus);
}
@@ -323,7 +335,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
csr_mail_send(entry, cpu_logical_map(cpu), 0);
- loongson_send_ipi_single(cpu, SMP_BOOT_CPU);
+ loongson_send_ipi_single(cpu, ACTION_BOOT_CPU);
}
/*
@@ -333,7 +345,7 @@ void loongson_init_secondary(void)
{
unsigned int cpu = smp_processor_id();
unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
- ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;
+ ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER | ECFGF_SIP0;
change_csr_ecfg(ECFG0_IM, imask);
diff --git a/arch/loongarch/kernel/time.c b/arch/loongarch/kernel/time.c
index e7015f7b70e3..fd5354f9be7c 100644
--- a/arch/loongarch/kernel/time.c
+++ b/arch/loongarch/kernel/time.c
@@ -123,16 +123,6 @@ void sync_counter(void)
csr_write64(init_offset, LOONGARCH_CSR_CNTC);
}
-static int get_timer_irq(void)
-{
- struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
-
- if (d)
- return irq_create_mapping(d, INT_TI);
-
- return -EINVAL;
-}
-
int constant_clockevent_init(void)
{
unsigned int cpu = smp_processor_id();
@@ -142,7 +132,7 @@ int constant_clockevent_init(void)
static int irq = 0, timer_irq_installed = 0;
if (!timer_irq_installed) {
- irq = get_timer_irq();
+ irq = get_percpu_irq(INT_TI);
if (irq < 0)
pr_err("Failed to map irq %d (timer)\n", irq);
}