<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/arch/x86/kvm/svm/nested.c, branch docs-6.1</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-6.1</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-6.1'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2022-07-28T17:22:25+00:00</updated>
<entry>
<title>KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits</title>
<updated>2022-07-28T17:22:25+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2022-06-07T21:35:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c33f6f2228fe8517e38941a508e9f905f99ecba9'/>
<id>urn:sha1:c33f6f2228fe8517e38941a508e9f905f99ecba9</id>
<content type='text'>
Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits
checks, into a separate helper, __kvm_is_valid_cr4(), and export only the
inner helper to vendor code in order to prevent nested VMX from calling
back into vmx_is_valid_cr4() via kvm_is_valid_cr4().

On SVM, this is a nop as SVM doesn't place any additional restrictions on
CR4.

On VMX, this is also currently a nop, but only because nested VMX is
missing checks on reserved CR4 bits for nested VM-Enter.  That bug will
be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is,
but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's
restrictions on CR0/CR4.  The cleanest and most intuitive way to fix the
VMXON bug is to use nested_host_cr{0,4}_valid().  If the CR4 variant
routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do
the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's
restrictions if and only if the vCPU is post-VMXON.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20220607213604.3346000-2-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: nSVM: Pull CS.Base from actual VMCB12 for soft int/ex re-injection</title>
<updated>2022-07-28T17:22:14+00:00</updated>
<author>
<name>Maciej S. Szmigiero</name>
<email>maciej.szmigiero@oracle.com</email>
</author>
<published>2022-07-18T15:47:13+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=da0b93d65e5bba6d4381f9dc99c547b60582e7a7'/>
<id>urn:sha1:da0b93d65e5bba6d4381f9dc99c547b60582e7a7</id>
<content type='text'>
enter_svm_guest_mode() first calls nested_vmcb02_prepare_control() to copy
control fields from VMCB12 to the current VMCB, then
nested_vmcb02_prepare_save() to perform a similar copy of the save area.

This means that nested_vmcb02_prepare_control() still runs with the
previous save area values in the current VMCB so it shouldn't take the L2
guest CS.Base from this area.

Explicitly pull CS.Base from the actual VMCB12 instead in
enter_svm_guest_mode().

Granted, having a non-zero CS.Base is a very rare thing (and even
impossible in 64-bit mode), having it change between nested VMRUNs is
probably even rarer, but if it happens it would create a really subtle bug
so it's better to fix it upfront.

Fixes: 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
Signed-off-by: Maciej S. Szmigiero &lt;maciej.szmigiero@oracle.com&gt;
Message-Id: &lt;4caa0f67589ae3c22c311ee0e6139496902f2edc.1658159083.git.maciej.szmigiero@oracle.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86: nSVM: always intercept x2apic msrs</title>
<updated>2022-06-24T16:50:26+00:00</updated>
<author>
<name>Maxim Levitsky</name>
<email>mlevitsk@redhat.com</email>
</author>
<published>2022-05-19T10:27:02+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7a8f7c1f3434afc94b6895eff713e0fcea01e3b4'/>
<id>urn:sha1:7a8f7c1f3434afc94b6895eff713e0fcea01e3b4</id>
<content type='text'>
As a preparation for x2avic, this patch ensures that x2apic msrs
are always intercepted for the nested guest.

Reviewed-by: Suravee Suthikulpanit &lt;suravee.suthikulpanit@amd.com&gt;
Tested-by: Suravee Suthikulpanit &lt;suravee.suthikulpanit@amd.com&gt;
Signed-off-by: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Message-Id: &lt;20220519102709.24125-11-suravee.suthikulpanit@amd.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'kvm-5.20-early'</title>
<updated>2022-06-09T15:38:12+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2022-06-09T15:38:12+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e15f5e6fa6ca1b3baf087314b2541afa935d00e7'/>
<id>urn:sha1:e15f5e6fa6ca1b3baf087314b2541afa935d00e7</id>
<content type='text'>
s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to show tests

x86:

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Rewrite gfn-pfn cache refresh

* Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit
</content>
</entry>
<entry>
<title>KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE</title>
<updated>2022-06-09T14:52:21+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2022-05-31T17:57:32+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e3cdaab5ff022874e65df80ae8b8382ccc0a4fe0'/>
<id>urn:sha1:e3cdaab5ff022874e65df80ae8b8382ccc0a4fe0</id>
<content type='text'>
Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0
doesn't intercept PAUSE") introduced passthrough support for nested pause
filtering, (when the host doesn't intercept PAUSE) (either disabled with
kvm module param, or disabled with '-overcommit cpu-pm=on')

Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards,
the feature was exposed as supported by KVM cpuid unconditionally, thus
if L1 could try to use it even when the L0 KVM can't really support it.

In this case the fallback caused KVM to intercept each PAUSE instruction;
in some cases, such intercept can slow down the nested guest so much
that it can fail to boot.  Instead, before the problematic commit KVM
was already setting both thresholds to 0 in vmcb02, but after the first
userspace VM exit shrink_ple_window was called and would reset the
pause_filter_count to the default value.

To fix this, change the fallback strategy - ignore the guest threshold
values, but use/update the host threshold values unless the guest
specifically requests disabling PAUSE filtering (either simple or
advanced).

Also fix a minor bug: on nested VM exit, when PAUSE filter counter
were copied back to vmcb01, a dirty bit was not set.

Thanks a lot to Suravee Suthikulpanit for debugging this!

Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE")
Reported-by: Suravee Suthikulpanit &lt;suravee.suthikulpanit@amd.com&gt;
Tested-by: Suravee Suthikulpanit &lt;suravee.suthikulpanit@amd.com&gt;
Co-developed-by: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Message-Id: &lt;20220518072709.730031-1-mlevitsk@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86: Introduce "struct kvm_caps" to track misc caps/settings</title>
<updated>2022-06-08T09:21:16+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2022-05-24T13:56:23+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=938c8745bcf2f732ee928a0b9bd592198a88cfa4'/>
<id>urn:sha1:938c8745bcf2f732ee928a0b9bd592198a88cfa4</id>
<content type='text'>
Add kvm_caps to hold a variety of capabilites and defaults that aren't
handled by kvm_cpu_caps because they aren't CPUID bits in order to reduce
the amount of boilerplate code required to add a new feature.  The vast
majority (all?) of the caps interact with vendor code and are written
only during initialization, i.e. should be tagged __read_mostly, declared
extern in x86.h, and exported.

No functional change intended.

Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20220524135624.22988-4-chenyi.qiang@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: nSVM: Transparently handle L1 -&gt; L2 NMI re-injection</title>
<updated>2022-06-08T08:47:03+00:00</updated>
<author>
<name>Maciej S. Szmigiero</name>
<email>maciej.szmigiero@oracle.com</email>
</author>
<published>2022-05-01T22:07:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=159fc6fa3b7db24db85598115cc43dc47196919e'/>
<id>urn:sha1:159fc6fa3b7db24db85598115cc43dc47196919e</id>
<content type='text'>
A NMI that L1 wants to inject into its L2 should be directly re-injected,
without causing L0 side effects like engaging NMI blocking for L1.

It's also worth noting that in this case it is L1 responsibility
to track the NMI window status for its L2 guest.

Signed-off-by: Maciej S. Szmigiero &lt;maciej.szmigiero@oracle.com&gt;
Message-Id: &lt;f894d13501cd48157b3069a4b4c7369575ddb60e.1651440202.git.maciej.szmigiero@oracle.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: SVM: Re-inject INTn instead of retrying the insn on "failure"</title>
<updated>2022-06-08T08:46:53+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2022-05-01T22:07:30+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7e5b5ef8dca3229a5226eabf53bdc7b67ebd07ad'/>
<id>urn:sha1:7e5b5ef8dca3229a5226eabf53bdc7b67ebd07ad</id>
<content type='text'>
Re-inject INTn software interrupts instead of retrying the instruction if
the CPU encountered an intercepted exception while vectoring the INTn,
e.g. if KVM intercepted a #PF when utilizing shadow paging.  Retrying the
instruction is architecturally wrong e.g. will result in a spurious #DB
if there's a code breakpoint on the INT3/O, and lack of re-injection also
breaks nested virtualization, e.g. if L1 injects a software interrupt and
vectoring the injected interrupt encounters an exception that is
intercepted by L0 but not L1.

Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Maciej S. Szmigiero &lt;maciej.szmigiero@oracle.com&gt;
Message-Id: &lt;1654ad502f860948e4f2d57b8bd881d67301f785.1651440202.git.maciej.szmigiero@oracle.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction</title>
<updated>2022-06-08T08:46:50+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2022-05-01T22:07:29+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=6ef88d6e36c2b4b3886ec9967cafabe4424d27d5'/>
<id>urn:sha1:6ef88d6e36c2b4b3886ec9967cafabe4424d27d5</id>
<content type='text'>
Re-inject INT3/INTO instead of retrying the instruction if the CPU
encountered an intercepted exception while vectoring the software
exception, e.g. if vectoring INT3 encounters a #PF and KVM is using
shadow paging.  Retrying the instruction is architecturally wrong, e.g.
will result in a spurious #DB if there's a code breakpoint on the INT3/O,
and lack of re-injection also breaks nested virtualization, e.g. if L1
injects a software exception and vectoring the injected exception
encounters an exception that is intercepted by L0 but not L1.

Due to, ahem, deficiencies in the SVM architecture, acquiring the next
RIP may require flowing through the emulator even if NRIPS is supported,
as the CPU clears next_rip if the VM-Exit is due to an exception other
than "exceptions caused by the INT3, INTO, and BOUND instructions".  To
deal with this, "skip" the instruction to calculate next_rip (if it's
not already known), and then unwind the RIP write and any side effects
(RFLAGS updates).

Save the computed next_rip and use it to re-stuff next_rip if injection
doesn't complete.  This allows KVM to do the right thing if next_rip was
known prior to injection, e.g. if L1 injects a soft event into L2, and
there is no backing INTn instruction, e.g. if L1 is injecting an
arbitrary event.

Note, it's impossible to guarantee architectural correctness given SVM's
architectural flaws.  E.g. if the guest executes INTn (no KVM injection),
an exit occurs while vectoring the INTn, and the guest modifies the code
stream while the exit is being handled, KVM will compute the incorrect
next_rip due to "skipping" the wrong instruction.  A future enhancement
to make this less awful would be for KVM to detect that the decoded
instruction is not the correct INTn and drop the to-be-injected soft
event (retrying is a lesser evil compared to shoving the wrong RIP on the
exception stack).

Reported-by: Maciej S. Szmigiero &lt;maciej.szmigiero@oracle.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Maciej S. Szmigiero &lt;maciej.szmigiero@oracle.com&gt;
Message-Id: &lt;65cb88deab40bc1649d509194864312a89bbe02e.1651440202.git.maciej.szmigiero@oracle.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02</title>
<updated>2022-06-08T08:46:40+00:00</updated>
<author>
<name>Maciej S. Szmigiero</name>
<email>maciej.szmigiero@oracle.com</email>
</author>
<published>2022-05-01T22:07:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=00f08d99dd7d3ab3d8cb1fa356e857fcccbbdde8'/>
<id>urn:sha1:00f08d99dd7d3ab3d8cb1fa356e857fcccbbdde8</id>
<content type='text'>
The next_rip field of a VMCB is *not* an output-only field for a VMRUN.
This field value (instead of the saved guest RIP) in used by the CPU for
the return address pushed on stack when injecting a software interrupt or
INT3 or INTO exception.

Make sure this field gets synced from vmcb12 to vmcb02 when entering L2 or
loading a nested state and NRIPS is exposed to L1.  If NRIPS is supported
in hardware but not exposed to L1 (nrips=0 or hidden by userspace), stuff
vmcb02's next_rip from the new L2 RIP to emulate a !NRIPS CPU (which
saves RIP on the stack as-is).

Reviewed-by: Maxim Levitsky &lt;mlevitsk@redhat.com&gt;
Co-developed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Maciej S. Szmigiero &lt;maciej.szmigiero@oracle.com&gt;
Message-Id: &lt;c2e0a3d78db3ae30530f11d4e9254b452a89f42b.1651440202.git.maciej.szmigiero@oracle.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
