<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/arch/x86/kernel/machine_kexec_64.c, branch docs-next</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-next</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-next'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2026-01-13T14:28:59+00:00</updated>
<entry>
<title>x86/crash: Use set_memory_p() instead of __set_memory_prot()</title>
<updated>2026-01-13T14:28:59+00:00</updated>
<author>
<name>Coiby Xu</name>
<email>coxu@redhat.com</email>
</author>
<published>2026-01-06T09:50:58+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=8a4e92b3260ae7664d0531e1b42c38d336e7717a'/>
<id>urn:sha1:8a4e92b3260ae7664d0531e1b42c38d336e7717a</id>
<content type='text'>
set_memory_p() is available to use outside of its compilation unit since:

  030ad7af9437 ("x86/mm: Regularize set_memory_p() parameters and make non-static").

There is no use for __set_memory_prot() anymore so drop it too.

  [ bp: Massage commit message. ]

Signed-off-by: Coiby Xu &lt;coxu@redhat.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://patch.msgid.link/20260106095100.656292-1-coxu@redhat.com
</content>
</entry>
<entry>
<title>Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-10-11T18:19:16+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-10-11T18:19:16+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9591fdb0611dccdeeeeacb99d89f0098737d209b'/>
<id>urn:sha1:9591fdb0611dccdeeeeacb99d89f0098737d209b</id>
<content type='text'>
Pull more x86 updates from Borislav Petkov:

 - Remove a bunch of asm implementing condition flags testing in KVM's
   emulator in favor of int3_emulate_jcc() which is written in C

 - Replace KVM fastops with C-based stubs which avoids problems with the
   fastop infra related to latter not adhering to the C ABI due to their
   special calling convention and, more importantly, bypassing compiler
   control-flow integrity checking because they're written in asm

 - Remove wrongly used static branches and other ugliness accumulated
   over time in hyperv's hypercall implementation with a proper static
   function call to the correct hypervisor call variant

 - Add some fixes and modifications to allow running FRED-enabled
   kernels in KVM even on non-FRED hardware

 - Add kCFI improvements like validating indirect calls and prepare for
   enabling kCFI with GCC. Add cmdline params documentation and other
   code cleanups

 - Use the single-byte 0xd6 insn as the official #UD single-byte
   undefined opcode instruction as agreed upon by both x86 vendors

 - Other smaller cleanups and touchups all over the place

* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86,retpoline: Optimize patch_retpoline()
  x86,ibt: Use UDB instead of 0xEA
  x86/cfi: Remove __noinitretpoline and __noretpoline
  x86/cfi: Add "debug" option to "cfi=" bootparam
  x86/cfi: Standardize on common "CFI:" prefix for CFI reports
  x86/cfi: Document the "cfi=" bootparam options
  x86/traps: Clarify KCFI instruction layout
  compiler_types.h: Move __nocfi out of compiler-specific header
  objtool: Validate kCFI calls
  x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
  x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
  x86/fred: Install system vector handlers even if FRED isn't fully enabled
  x86/hyperv: Use direct call to hypercall-page
  x86/hyperv: Clean up hv_do_hypercall()
  KVM: x86: Remove fastops
  KVM: x86: Convert em_salc() to C
  KVM: x86: Introduce EM_ASM_3WCL
  KVM: x86: Introduce EM_ASM_1SRC2
  KVM: x86: Introduce EM_ASM_2CL
  KVM: x86: Introduce EM_ASM_2W
  ...
</content>
</entry>
<entry>
<title>x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum</title>
<updated>2025-09-05T17:40:40+00:00</updated>
<author>
<name>Kai Huang</name>
<email>kai.huang@intel.com</email>
</author>
<published>2025-09-01T16:09:27+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=b18651f70ce0e45d52b9e66d9065b831b3f30784'/>
<id>urn:sha1:b18651f70ce0e45d52b9e66d9065b831b3f30784</id>
<content type='text'>
Some early TDX-capable platforms have an erratum: A kernel partial
write (a write transaction of less than cacheline lands at memory
controller) to TDX private memory poisons that memory, and a subsequent
read triggers a machine check.

On those platforms, the old kernel must reset TDX private memory before
jumping to the new kernel, otherwise the new kernel may see unexpected
machine check.  Currently the kernel doesn't track which page is a TDX
private page.  For simplicity just fail kexec/kdump for those platforms.

Leverage the existing machine_kexec_prepare() to fail kexec/kdump by
adding the check of the presence of the TDX erratum (which is only
checked for if the kernel is built with TDX host support).  This rejects
kexec/kdump when the kernel is loading the kexec/kdump kernel image.

The alternative is to reject kexec/kdump when the kernel is jumping to
the new kernel.  But for kexec this requires adding a new check (e.g.,
arch_kexec_allowed()) in the common code to fail kernel_kexec() at early
stage.  Kdump (crash_kexec()) needs similar check, but it's hard to
justify because crash_kexec() is not supposed to abort.

It's feasible to further relax this limitation, i.e., only fail kexec
when TDX is actually enabled by the kernel.  But this is still a half
measure compared to resetting TDX private memory so just do the simplest
thing for now.

The impact to userspace is the users will get an error when loading the
kexec/kdump kernel image:

  kexec_load failed: Operation not supported

This might be confusing to the users, thus also print the reason in the
dmesg:

  [..] kexec: Not allowed on platform with tdx_pw_mce bug.

Signed-off-by: Kai Huang &lt;kai.huang@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Rick Edgecombe &lt;rick.p.edgecombe@intel.com&gt;
Reviewed-by: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Tested-by: Farrah Chen &lt;farrah.chen@intel.com&gt;
Link: https://lore.kernel.org/all/20250901160930.1785244-5-pbonzini%40redhat.com
</content>
</entry>
<entry>
<title>x86/sme: Use percpu boolean to control WBINVD during kexec</title>
<updated>2025-09-05T17:40:40+00:00</updated>
<author>
<name>Kai Huang</name>
<email>kai.huang@intel.com</email>
</author>
<published>2025-09-01T16:09:25+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=83214a775f33bc9d61c2c284f2ace3f854a4cddb'/>
<id>urn:sha1:83214a775f33bc9d61c2c284f2ace3f854a4cddb</id>
<content type='text'>
TL;DR:

Prepare to unify how TDX and SME do cache flushing during kexec by
making a percpu boolean control whether to do the WBINVD.

-- Background --

On SME platforms, dirty cacheline aliases with and without encryption
bit can coexist, and the CPU can flush them back to memory in random
order.  During kexec, the caches must be flushed before jumping to the
new kernel otherwise the dirty cachelines could silently corrupt the
memory used by the new kernel due to different encryption property.

TDX also needs a cache flush during kexec for the same reason.  It would
be good to have a generic way to flush the cache instead of scattering
checks for each feature all around.

When SME is enabled, the kernel basically encrypts all memory including
the kernel itself and a simple memory write from the kernel could dirty
cachelines.  Currently, the kernel uses WBINVD to flush the cache for
SME during kexec in two places:

1) the one in stop_this_cpu() for all remote CPUs when the kexec-ing CPU
   stops them;
2) the one in the relocate_kernel() where the kexec-ing CPU jumps to the
   new kernel.

-- Solution --

Unlike SME, TDX can only dirty cachelines when it is used (i.e., when
SEAMCALLs are performed).  Since there are no more SEAMCALLs after the
aforementioned WBINVDs, leverage this for TDX.

To unify the approach for SME and TDX, use a percpu boolean to indicate
the cache may be in an incoherent state and needs flushing during kexec,
and set the boolean for SME.  TDX can then leverage it.

While SME could use a global flag (since it's enabled at early boot and
enabled on all CPUs), the percpu flag fits TDX better:

The percpu flag can be set when a CPU makes a SEAMCALL, and cleared when
another WBINVD on the CPU obviates the need for a kexec-time WBINVD.
Saving kexec-time WBINVD is valuable, because there is an existing
race[*] where kexec could proceed while another CPU is active.  WBINVD
could make this race worse, so it's worth skipping it when possible.

-- Side effect to SME --

Today the first WBINVD in the stop_this_cpu() is performed when SME is
*supported* by the platform, and the second WBINVD is done in
relocate_kernel() when SME is *activated* by the kernel.  Make things
simple by changing to do the second WBINVD when the platform supports
SME.  This allows the kernel to simply turn on this percpu boolean when
bringing up a CPU by checking whether the platform supports SME.

No other functional change intended.

[*] The aforementioned race:

During kexec native_stop_other_cpus() is called to stop all remote CPUs
before jumping to the new kernel.  native_stop_other_cpus() firstly
sends normal REBOOT vector IPIs to stop remote CPUs and waits them to
stop.  If that times out, it sends NMI to stop the CPUs that are still
alive.  The race happens when native_stop_other_cpus() has to send NMIs
and could potentially result in the system hang (for more information
please see [1]).

Signed-off-by: Kai Huang &lt;kai.huang@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Reviewed-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Link: https://lore.kernel.org/kvm/b963fcd60abe26c7ec5dc20b42f1a2ebbcc72397.1750934177.git.kai.huang@intel.com/ [1]
Link: https://lore.kernel.org/all/20250901160930.1785244-3-pbonzini%40redhat.com
</content>
</entry>
<entry>
<title>x86/kexec: Consolidate relocate_kernel() function parameters</title>
<updated>2025-09-05T17:40:40+00:00</updated>
<author>
<name>Kai Huang</name>
<email>kai.huang@intel.com</email>
</author>
<published>2025-09-01T16:09:24+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=744b02f62634b64345d05a8a3f145d56469313b4'/>
<id>urn:sha1:744b02f62634b64345d05a8a3f145d56469313b4</id>
<content type='text'>
During kexec, the kernel jumps to the new kernel in relocate_kernel(),
which is implemented in assembly and both 32-bit and 64-bit have their
own version.

Currently, for both 32-bit and 64-bit, the last two parameters of the
relocate_kernel() are both 'unsigned int' but actually they only convey
a boolean, i.e., one bit information.  The 'unsigned int' has enough
space to carry two bits information therefore there's no need to pass
the two booleans in two separate 'unsigned int'.

Consolidate the last two function parameters of relocate_kernel() into a
single 'unsigned int' and pass flags instead.

Only consolidate the 64-bit version albeit the similar optimization can
be done for the 32-bit version too.  Don't bother changing the 32-bit
version while it is working (since assembly code change is required).

Signed-off-by: Kai Huang &lt;kai.huang@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Reviewed-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Link: https://lore.kernel.org/all/20250901160930.1785244-2-pbonzini%40redhat.com
</content>
</entry>
<entry>
<title>objtool: Validate kCFI calls</title>
<updated>2025-08-18T12:23:09+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2025-04-12T11:56:01+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=894af4a1cde61c3401f237184fb770f72ff12df8'/>
<id>urn:sha1:894af4a1cde61c3401f237184fb770f72ff12df8</id>
<content type='text'>
Validate that all indirect calls adhere to kCFI rules. Notably doing
nocfi indirect call to a cfi function is broken.

Apparently some Rust 'core' code violates this and explodes when ran
with FineIBT.

All the ANNOTATE_NOCFI_SYM sites are prime targets for attackers.

 - runtime EFI is especially henous because it also needs to disable
   IBT. Basically calling unknown code without CFI protection at
   runtime is a massice security issue.

 - Kexec image handover; if you can exploit this, you get to keep it :-)

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Josh Poimboeuf &lt;jpoimboe@kernel.org&gt;
Acked-by: Sean Christopherson &lt;seanjc@google.com&gt;
Link: https://lkml.kernel.org/r/20250714103441.496787279@infradead.org
</content>
</entry>
<entry>
<title>Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-06-01T02:12:53+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-06-01T02:12:53+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7d4e49a77d9930c69751b9192448fda6ff9100f1'/>
<id>urn:sha1:7d4e49a77d9930c69751b9192448fda6ff9100f1</id>
<content type='text'>
Pull non-MM updates from Andrew Morton:

 - "hung_task: extend blocking task stacktrace dump to semaphore" from
   Lance Yang enhances the hung task detector.

   The detector presently dumps the blocking tasks's stack when it is
   blocked on a mutex. Lance's series extends this to semaphores

 - "nilfs2: improve sanity checks in dirty state propagation" from
   Wentao Liang addresses a couple of minor flaws in nilfs2

 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn
   fixes a couple of issues in the gdb scripts

 - "Support kdump with LUKS encryption by reusing LUKS volume keys" from
   Coiby Xu addresses a usability problem with kdump.

   When the dump device is LUKS-encrypted, the kdump kernel may not have
   the keys to the encrypted filesystem. A full writeup of this is in
   the series [0/N] cover letter

 - "sysfs: add counters for lockups and stalls" from Max Kellermann adds
   /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and
   /sys/kernel/rcu_stall_count

 - "fork: Page operation cleanups in the fork code" from Pasha Tatashin
   implements a number of code cleanups in fork.c

 - "scripts/gdb/symbols: determine KASLR offset on s390 during early
   boot" from Ilya Leoshkevich fixes some s390 issues in the gdb
   scripts

* tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits)
  llist: make llist_add_batch() a static inline
  delayacct: remove redundant code and adjust indentation
  squashfs: add optional full compressed block caching
  crash_dump, nvme: select CONFIGFS_FS as built-in
  scripts/gdb/symbols: determine KASLR offset on s390 during early boot
  scripts/gdb/symbols: factor out pagination_off()
  scripts/gdb/symbols: factor out get_vmlinux()
  kernel/panic.c: format kernel-doc comments
  mailmap: update and consolidate Casey Connolly's name and email
  nilfs2: remove wbc-&gt;for_reclaim handling
  fork: define a local GFP_VMAP_STACK
  fork: check charging success before zeroing stack
  fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
  fork: clean-up ifdef logic around stack allocation
  kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
  kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
  x86/crash: make the page that stores the dm crypt keys inaccessible
  x86/crash: pass dm crypt keys to kdump kernel
  Revert "x86/mm: Remove unused __set_memory_prot()"
  crash_dump: retrieve dm crypt keys in kdump kernel
  ...
</content>
</entry>
<entry>
<title>x86/crash: make the page that stores the dm crypt keys inaccessible</title>
<updated>2025-05-21T17:48:22+00:00</updated>
<author>
<name>Coiby Xu</name>
<email>coxu@redhat.com</email>
</author>
<published>2025-05-02T01:12:42+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=cc66e4863ac3a00c709bfcbec685d48c184172b5'/>
<id>urn:sha1:cc66e4863ac3a00c709bfcbec685d48c184172b5</id>
<content type='text'>
This adds an addition layer of protection for the saved copy of dm crypt
key.  Trying to access the saved copy will cause page fault.

Link: https://lkml.kernel.org/r/20250502011246.99238-9-coxu@redhat.com
Signed-off-by: Coiby Xu &lt;coxu@redhat.com&gt;
Suggested-by: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Acked-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: "Daniel P. Berrange" &lt;berrange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Jan Pazdziora &lt;jpazdziora@redhat.com&gt;
Cc: Milan Broz &lt;gmazyland@gmail.com&gt;
Cc: Ondrej Kozina &lt;okozina@redhat.com&gt;
Cc: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>x86/kexec: Invalidate GDT/IDT from relocate_kernel() instead of earlier</title>
<updated>2025-04-10T10:17:14+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2025-03-26T14:16:03+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=de085ddd493bccb77a3ec1b99ae7466133540f4d'/>
<id>urn:sha1:de085ddd493bccb77a3ec1b99ae7466133540f4d</id>
<content type='text'>
Reduce the window during which exceptions are unhandled, by leaving the
GDT/IDT in place all the way into the relocate_kernel() function, until
the moment that %cr3 gets replaced.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250326142404.256980-4-dwmw2@infradead.org
</content>
</entry>
<entry>
<title>x86/kexec: Add 8250 MMIO serial port output</title>
<updated>2025-04-10T10:17:14+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2025-03-26T14:16:02+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7516e7216bdfb9e2fab0a0ca3bd23cb2e61e46ed'/>
<id>urn:sha1:7516e7216bdfb9e2fab0a0ca3bd23cb2e61e46ed</id>
<content type='text'>
This supports the same 32-bit MMIO-mapped 8250 as the early_printk code.

It's not clear why the early_printk code supports this form and only this
form; the actual runtime 8250_pci doesn't seem to support it. But having
hacked up QEMU to expose such a device, early_printk does work with it,
and now so does the kexec debug code.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20250326142404.256980-3-dwmw2@infradead.org
</content>
</entry>
</feed>
