mm: Add support for unaccepted memory

UEFI Specification version 2.9 introduces the concept of memory acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific to the Virtual Machine platform.

There are several ways the kernel can deal with unaccepted memory:

 1. Accept all the memory during boot. It is easy to implement and it doesn't have runtime cost once the system is booted. The downside is very long boot time.

    Accept can be parallelized to multiple CPUs to keep it manageable (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate memory bandwidth and does not scale beyond the point.

 2. Accept a block of memory on the first use. It requires more infrastructure and changes in page allocator to make it work, but it provides good boot time.

    On-demand memory accept means latency spikes every time kernel steps onto a new memory block. The spikes will go away once workload data set size gets stabilized or all memory gets accepted.

 3. Accept all memory in background. Introduce a thread (or multiple) that gets memory accepted proactively. It will minimize time the system experience latency spikes on memory allocation while keeping low boot time.

    This approach cannot function on its own. It is an extension of #2: background memory acceptance requires functional scheduler, but the page allocator may need to tap into unaccepted memory before that.

    The downside of the approach is that these threads also steal CPU cycles and memory bandwidth from the user's workload and may hurt user experience.

Implement #1 and #2 for now. #2 is the default. Some workloads may want to use #1 with accept_memory=eager in kernel command line. #3 can be implemented later based on user's demands.

Support of unaccepted memory requires a few changes in core-mm code:

  - memblock accepts memory on allocation. It serves early boot memory allocations and doesn't limit them to pre-accepted pool of memory.

  - page allocator accepts memory on the first allocation of the page. When kernel runs out of accepted memory, it accepts memory until the high watermark is reached. It helps to minimize fragmentation.

EFI code will provide two helpers if the platform supports unaccepted memory:

 - accept_memory() makes a range of physical addresses accepted.

 - range_contains_unaccepted_memory() checks anything within the range of physical addresses requires acceptance.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>	# memblock
Link: https://lore.kernel.org/r/20230606142637.5171-2-kirill.shutemov@linux.intel.com
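
The difference between eager acceptance (#1) and acceptance on first use (#2) can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the kernel implementation: every `sim_*` name, the 2 MiB block size, and the 64-block memory size are hypothetical, and the two helpers merely mirror the roles the message gives to accept_memory() and range_contains_unaccepted_memory().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values chosen for illustration only. */
#define SIM_BLOCK_SIZE (2u * 1024 * 1024)   /* acceptance granularity */
#define SIM_NR_BLOCKS  64u                  /* simulated memory: 128 MiB */

static bool sim_unaccepted[SIM_NR_BLOCKS];  /* true = block still unaccepted */
static size_t sim_blocks_accepted;          /* stands in for platform accept calls */

/* Returns true if any block overlapping [start, end) is unaccepted. */
static bool sim_range_contains_unaccepted_memory(uint64_t start, uint64_t end)
{
    assert(start <= end && end <= (uint64_t)SIM_NR_BLOCKS * SIM_BLOCK_SIZE);
    if (start == end)
        return false;
    for (uint64_t b = start / SIM_BLOCK_SIZE; b * SIM_BLOCK_SIZE < end; b++)
        if (sim_unaccepted[b])
            return true;
    return false;
}

/* Accepts every still-unaccepted block overlapping [start, end). */
static void sim_accept_memory(uint64_t start, uint64_t end)
{
    assert(start <= end && end <= (uint64_t)SIM_NR_BLOCKS * SIM_BLOCK_SIZE);
    if (start == end)
        return;
    for (uint64_t b = start / SIM_BLOCK_SIZE; b * SIM_BLOCK_SIZE < end; b++) {
        if (sim_unaccepted[b]) {
            sim_unaccepted[b] = false;
            sim_blocks_accepted++;
        }
    }
}

/* Simulated boot: eager mode accepts everything up front, lazy mode nothing. */
static void sim_boot(bool eager)
{
    for (size_t i = 0; i < SIM_NR_BLOCKS; i++)
        sim_unaccepted[i] = true;
    sim_blocks_accepted = 0;
    if (eager)
        sim_accept_memory(0, (uint64_t)SIM_NR_BLOCKS * SIM_BLOCK_SIZE);
}

/* First-use path: accept only if the range still needs it.
 * Returns how many blocks this call had to accept. */
static size_t sim_use_range(uint64_t start, uint64_t end)
{
    size_t before = sim_blocks_accepted;

    if (sim_range_contains_unaccepted_memory(start, end))
        sim_accept_memory(start, end);
    return sim_blocks_accepted - before;
}
```

In lazy mode, calling `sim_use_range()` on a range pays the acceptance cost once and returns 0 for later uses of the same blocks, which corresponds to the latency spikes fading as the working set stabilizes. After `sim_boot(true)` every later call returns 0, at the price of accepting all blocks during boot.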
author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> 2023-06-06 17:26:29 +0300
committer: Borislav Petkov (AMD) <bp@alien8.de> 2023-06-06 16:38:22 +0200
commit: dcdfdd40fa82b6704d2841938e5c8ec3051eb0d6 (patch)
tree: 00d76b51e01723a62127c08fb13cd3c11d3f08e3 /mm/vmstat.c
parent: 9561de3a55bed6bdd44a12820ba81ec416e705a7 (diff)
1 files changed, 3 insertions, 0 deletions
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c28046371b45..282349cabf01 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1180,6 +1180,9 @@ const char * const vmstat_text[] = {
 	"nr_zspages",
 #endif
 	"nr_free_cma",
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	"nr_unaccepted",
+#endif
 
 	/* enum numa_stat_item counters */
 #ifdef CONFIG_NUMA
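
The strings in vmstat_text[] are the counter names exposed in /proc/vmstat, so on a kernel built with CONFIG_UNACCEPTED_MEMORY the new entry should show up there as a "nr_unaccepted <value>" line. The following user-space sketch shows one way to read it. The helper names `vmstat_lookup()` and `read_nr_unaccepted()` are hypothetical and not part of this patch, and the 64 KiB buffer size is an assumption that the file fits in one read.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Look up `name` in text formatted like /proc/vmstat, i.e. one
 * "<name> <value>" pair per line. Returns true and stores the value
 * on success, false if the counter is absent or malformed. */
static bool vmstat_lookup(const char *text, const char *name,
                          unsigned long long *value)
{
    size_t name_len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Require a space after the name so "nr_free" never matches
         * "nr_free_cma". */
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ')
            return sscanf(line + name_len, "%llu", value) == 1;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return false;
}

/* Read /proc/vmstat and fetch nr_unaccepted. Returns false when the
 * file cannot be opened or the kernel does not export the counter
 * (for example, CONFIG_UNACCEPTED_MEMORY is disabled). */
static bool read_nr_unaccepted(unsigned long long *value)
{
    static char buf[65536];   /* assumed large enough for /proc/vmstat */
    FILE *f = fopen("/proc/vmstat", "r");
    size_t n;

    if (!f)
        return false;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return vmstat_lookup(buf, "nr_unaccepted", value);
}
```

A caller would treat a `false` return from `read_nr_unaccepted()` as "counter not available" rather than as zero, since the line is only present when the #ifdef in the diff above is compiled in.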