diff options
Diffstat (limited to 'Documentation/virt/kvm/api.rst')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 252 |
1 files changed, 236 insertions, 16 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index c8e2e9cd84dc..11e00a46c610 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -982,12 +982,22 @@ memory. __u8 pad2[30]; }; -If the KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag is returned from the -KVM_CAP_XEN_HVM check, it may be set in the flags field of this ioctl. -This requests KVM to generate the contents of the hypercall page -automatically; hypercalls will be intercepted and passed to userspace -through KVM_EXIT_XEN. In this case, all of the blob size and address -fields must be zero. +If certain flags are returned from the KVM_CAP_XEN_HVM check, they may +be set in the flags field of this ioctl: + +The KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag requests KVM to generate +the contents of the hypercall page automatically; hypercalls will be +intercepted and passed to userspace through KVM_EXIT_XEN. In this +ase, all of the blob size and address fields must be zero. + +The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates to KVM that userspace +will always use the KVM_XEN_HVM_EVTCHN_SEND ioctl to deliver event +channel interrupts rather than manipulating the guest's shared_info +structures directly. This, in turn, may allow KVM to enable features +such as intercepting the SCHEDOP_poll hypercall to accelerate PV +spinlock operation for the guest. Userspace may still use the ioctl +to deliver events if it was advertised, even if userspace does not +send this indication that it will always do so No other flags are currently valid in the struct kvm_xen_hvm_config. @@ -1476,14 +1486,43 @@ Possible values are: [s390] KVM_MP_STATE_LOAD the vcpu is in a special load/startup state [s390] + KVM_MP_STATE_SUSPENDED the vcpu is in a suspend state and is waiting + for a wakeup event [arm64] ========================== =============================================== On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace on these architectures. -For arm64/riscv: -^^^^^^^^^^^^^^^^ +For arm64: +^^^^^^^^^^ + +If a vCPU is in the KVM_MP_STATE_SUSPENDED state, KVM will emulate the +architectural execution of a WFI instruction. + +If a wakeup event is recognized, KVM will exit to userspace with a +KVM_SYSTEM_EVENT exit, where the event type is KVM_SYSTEM_EVENT_WAKEUP. If +userspace wants to honor the wakeup, it must set the vCPU's MP state to +KVM_MP_STATE_RUNNABLE. If it does not, KVM will continue to await a wakeup +event in subsequent calls to KVM_RUN. + +.. warning:: + + If userspace intends to keep the vCPU in a SUSPENDED state, it is + strongly recommended that userspace take action to suppress the + wakeup event (such as masking an interrupt). Otherwise, subsequent + calls to KVM_RUN will immediately exit with a KVM_SYSTEM_EVENT_WAKEUP + event and inadvertently waste CPU cycles. + + Additionally, if userspace takes action to suppress a wakeup event, + it is strongly recommended that it also restores the vCPU to its + original state when the vCPU is made RUNNABLE again. For example, + if userspace masked a pending interrupt to suppress the wakeup, + the interrupt should be unmasked before returning control to the + guest. + +For riscv: +^^^^^^^^^^ The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. @@ -1887,22 +1926,25 @@ the future. 4.55 KVM_SET_TSC_KHZ -------------------- -:Capability: KVM_CAP_TSC_CONTROL +:Capability: KVM_CAP_TSC_CONTROL / KVM_CAP_VM_TSC_CONTROL :Architectures: x86 -:Type: vcpu ioctl +:Type: vcpu ioctl / vm ioctl :Parameters: virtual tsc_khz :Returns: 0 on success, -1 on error Specifies the tsc frequency for the virtual machine. The unit of the frequency is KHz. +If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also +be used as a vm ioctl to set the initial tsc frequency of subsequently +created vCPUs. 4.56 KVM_GET_TSC_KHZ -------------------- -:Capability: KVM_CAP_GET_TSC_KHZ +:Capability: KVM_CAP_GET_TSC_KHZ / KVM_CAP_VM_TSC_CONTROL :Architectures: x86 -:Type: vcpu ioctl +:Type: vcpu ioctl / vm ioctl :Parameters: none :Returns: virtual tsc-khz on success, negative value on error @@ -2601,6 +2643,24 @@ EINVAL. After the vcpu's SVE configuration is finalized, further attempts to write this register will fail with EPERM. +arm64 bitmap feature firmware pseudo-registers have the following bit pattern:: + + 0x6030 0000 0016 <regno:16> + +The bitmap feature firmware registers exposes the hypercall services that +are available for userspace to configure. The set bits corresponds to the +services that are available for the guests to access. By default, KVM +sets all the supported bits during VM initialization. The userspace can +discover the available services via KVM_GET_ONE_REG, and write back the +bitmap corresponding to the features that it wishes guests to see via +KVM_SET_ONE_REG. + +Note: These registers are immutable once any of the vCPUs of the VM has +run at least once. A KVM_SET_ONE_REG in such a scenario will return +a -EBUSY to userspace. + +(See Documentation/virt/kvm/arm/hypercalls.rst for more details.) + MIPS registers are mapped using the lower 32 bits. The upper 16 of that is the register group type: @@ -3754,12 +3814,18 @@ in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive error number indicating the type of exception. This exception is also raised directly at the corresponding VCPU if the flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set. +On protection exceptions, unless specified otherwise, the injected +translation-exception identifier (TEID) indicates suppression. If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key protection is also in effect and may cause exceptions if accesses are prohibited given the access key designated by "key"; the valid range is 0..15. KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION is > 0. +Since the accessed memory may span multiple pages and those pages might have +different storage keys, it is possible that a protection exception occurs +after memory has been modified. In this case, if the exception is injected, +the TEID does not indicate suppression. Absolute read/write: ^^^^^^^^^^^^^^^^^^^^ @@ -5216,7 +5282,25 @@ have deterministic behavior. struct { __u64 gfn; } shared_info; - __u64 pad[4]; + struct { + __u32 send_port; + __u32 type; /* EVTCHNSTAT_ipi / EVTCHNSTAT_interdomain */ + __u32 flags; + union { + struct { + __u32 port; + __u32 vcpu; + __u32 priority; + } port; + struct { + __u32 port; /* Zero for eventfd */ + __s32 fd; + } eventfd; + __u32 padding[4]; + } deliver; + } evtchn; + __u32 xen_version; + __u64 pad[8]; } u; }; @@ -5247,6 +5331,30 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO KVM_XEN_ATTR_TYPE_UPCALL_VECTOR Sets the exception vector used to deliver Xen event channel upcalls. + This is the HVM-wide vector injected directly by the hypervisor + (not through the local APIC), typically configured by a guest via + HVM_PARAM_CALLBACK_IRQ. + +KVM_XEN_ATTR_TYPE_EVTCHN + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures + an outbound port number for interception of EVTCHNOP_send requests + from the guest. A given sending port number may be directed back + to a specified vCPU (by APIC ID) / port / priority on the guest, + or to trigger events on an eventfd. The vCPU and priority can be + changed by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, + but other fields cannot change for a given sending port. A port + mapping is removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags + field. + +KVM_XEN_ATTR_TYPE_XEN_VERSION + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures + the 32-bit version code returned to the guest when it invokes the + XENVER_version call; typically (XEN_MAJOR << 16 | XEN_MINOR). PV + Xen guests will often use this to as a dummy hypercall to trigger + event channel delivery, so responding within the kernel without + exiting to userspace is beneficial. 4.127 KVM_XEN_HVM_GET_ATTR -------------------------- @@ -5258,7 +5366,8 @@ KVM_XEN_ATTR_TYPE_UPCALL_VECTOR :Returns: 0 on success, < 0 on error Allows Xen VM attributes to be read. For the structure and types, -see KVM_XEN_HVM_SET_ATTR above. +see KVM_XEN_HVM_SET_ATTR above. The KVM_XEN_ATTR_TYPE_EVTCHN +attribute cannot be read. 4.128 KVM_XEN_VCPU_SET_ATTR --------------------------- @@ -5285,6 +5394,13 @@ see KVM_XEN_HVM_SET_ATTR above. __u64 time_blocked; __u64 time_offline; } runstate; + __u32 vcpu_id; + struct { + __u32 port; + __u32 priority; + __u64 expires_ns; + } timer; + __u8 vector; } u; }; @@ -5326,6 +5442,27 @@ KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST or RUNSTATE_offline) to set the current accounted state as of the adjusted state_entry_time. +KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the Xen + vCPU ID of the given vCPU, to allow timer-related VCPU operations to + be intercepted by KVM. + +KVM_XEN_VCPU_ATTR_TYPE_TIMER + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the + event channel port/priority for the VIRQ_TIMER of the vCPU, as well + as allowing a pending timer to be saved/restored. + +KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR + This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates + support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the + per-vCPU local APIC upcall vector, configured by a Xen guest with + the HVMOP_set_evtchn_upcall_vector hypercall. This is typically + used by Windows guests, and is distinct from the HVM-wide upcall + vector configured with HVM_PARAM_CALLBACK_IRQ. + + 4.129 KVM_XEN_VCPU_GET_ATTR --------------------------- @@ -5645,6 +5782,25 @@ enabled with ``arch_prctl()``, but this may change in the future. The offsets of the state save areas in struct kvm_xsave follow the contents of CPUID leaf 0xD on the host. +4.135 KVM_XEN_HVM_EVTCHN_SEND +----------------------------- + +:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND +:Architectures: x86 +:Type: vm ioctl +:Parameters: struct kvm_irq_routing_xen_evtchn +:Returns: 0 on success, < 0 on error + + +:: + + struct kvm_irq_routing_xen_evtchn { + __u32 port; + __u32 vcpu; + __u32 priority; + }; + +This ioctl injects an event channel interrupt directly to the guest vCPU. 5. The kvm_run structure ======================== @@ -5987,6 +6143,9 @@ should put the acknowledged interrupt vector into the 'epr' field. #define KVM_SYSTEM_EVENT_SHUTDOWN 1 #define KVM_SYSTEM_EVENT_RESET 2 #define KVM_SYSTEM_EVENT_CRASH 3 + #define KVM_SYSTEM_EVENT_WAKEUP 4 + #define KVM_SYSTEM_EVENT_SUSPEND 5 + #define KVM_SYSTEM_EVENT_SEV_TERM 6 __u32 type; __u32 ndata; __u64 data[16]; @@ -6011,6 +6170,13 @@ Valid values for 'type' are: has requested a crash condition maintenance. Userspace can choose to ignore the request, or to gather VM memory core dump and/or reset/shutdown of the VM. + - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination. + The guest physical address of the guest's GHCB is stored in `data[0]`. + - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and + KVM has recognized a wakeup event. Userspace may honor this event by + marking the exiting vCPU as runnable, or deny it and call KVM_RUN again. + - KVM_SYSTEM_EVENT_SUSPEND -- the guest has requested a suspension of + the VM. If KVM_CAP_SYSTEM_EVENT_DATA is present, the 'data' field can contain architecture specific information for the system-level event. Only @@ -6027,6 +6193,32 @@ Previous versions of Linux defined a `flags` member in this struct. The field is now aliased to `data[0]`. Userspace can assume that it is only written if ndata is greater than 0. +For arm/arm64: +-------------- + +KVM_SYSTEM_EVENT_SUSPEND exits are enabled with the +KVM_CAP_ARM_SYSTEM_SUSPEND VM capability. If a guest invokes the PSCI +SYSTEM_SUSPEND function, KVM will exit to userspace with this event +type. + +It is the sole responsibility of userspace to implement the PSCI +SYSTEM_SUSPEND call according to ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND". +KVM does not change the vCPU's state before exiting to userspace, so +the call parameters are left in-place in the vCPU registers. + +Userspace is _required_ to take action for such an exit. It must +either: + + - Honor the guest request to suspend the VM. Userspace can request + in-kernel emulation of suspension by setting the calling vCPU's + state to KVM_MP_STATE_SUSPENDED. Userspace must configure the vCPU's + state according to the parameters passed to the PSCI function when + the calling vCPU is resumed. See ARM DEN0022D.b 5.19.1 "Intended use" + for details on the function parameters. + + - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2 + "Caller responsibilities" for possible return values. + :: /* KVM_EXIT_IOAPIC_EOI */ @@ -7147,6 +7339,15 @@ The valid bits in cap.args[0] are: Additionally, when this quirk is disabled, KVM clears CPUID.01H:ECX[bit 3] if IA32_MISC_ENABLE[bit 18] is cleared. + + KVM_X86_QUIRK_FIX_HYPERCALL_INSN By default, KVM rewrites guest + VMMCALL/VMCALL instructions to match the + vendor's hypercall instruction for the + system. When this quirk is disabled, KVM + will no longer rewrite invalid guest + hypercall instructions. Executing the + incorrect hypercall instruction will + generate a #UD within the guest. =================================== ============================================ 8. Other capabilities. @@ -7624,8 +7825,9 @@ PVHVM guests. Valid flags are:: #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR (1 << 0) #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1) #define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2) - #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 2) - #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 3) + #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3) + #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4) + #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5) The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG ioctl is available, for the guest to set its hypercall page. @@ -7649,6 +7851,14 @@ The KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL flag indicates that IRQ routing entries of the type KVM_IRQ_ROUTING_XEN_EVTCHN are supported, with the priority field set to indicate 2 level event channel delivery. +The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates that KVM supports +injecting event channel events directly into the guest with the +KVM_XEN_HVM_EVTCHN_SEND ioctl. It also indicates support for the +KVM_XEN_ATTR_TYPE_EVTCHN/XEN_VERSION HVM attributes and the +KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID/TIMER/UPCALL_VECTOR vCPU attributes. +related to event channel delivery, timers, and the XENVER_version +interception. + 8.31 KVM_CAP_PPC_MULTITCE ------------------------- @@ -7736,6 +7946,16 @@ At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting this capability will disable PMU virtualization for that VM. Usermode should adjust CPUID leaf 0xA to reflect that the PMU is disabled. +8.36 KVM_CAP_ARM_SYSTEM_SUSPEND +------------------------------- + +:Capability: KVM_CAP_ARM_SYSTEM_SUSPEND +:Architectures: arm64 +:Type: vm + +When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of +type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request. + 9. Known KVM API problems ========================= |