From 4c93c355d5d563f300df7e61ef753d7a064411e9 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Tue, 16 Oct 2007 01:26:08 -0700 Subject: SLUB: Place kmem_cache_cpu structures in a NUMA aware way The kmem_cache_cpu structures introduced are currently an array placed in the kmem_cache struct. Meaning the kmem_cache_cpu structures are overwhelmingly on the wrong node for systems with a higher amount of nodes. These are performance critical structures since the per node information has to be touched for every alloc and free in a slab. In order to place the kmem_cache_cpu structure optimally we put an array of pointers to kmem_cache_cpu structs in kmem_cache (similar to SLAB). However, the kmem_cache_cpu structures can now be allocated in a more intelligent way. We would like to put per cpu structures for the same cpu but different slab caches in cachelines together to save space and decrease the cache footprint. However, the slab allocators itself control only allocations per node. We set up a simple per cpu array for every processor with 100 per cpu structures which is usually enough to get them all set up right. If we run out then we fall back to kmalloc_node. This also solves the bootstrap problem since we do not have to use slab allocator functions early in boot to get memory for the small per cpu structures. Pro: - NUMA aware placement improves memory performance - All global structures in struct kmem_cache become readonly - Dense packing of per cpu structures reduces cacheline footprint in SMP and NUMA. - Potential avoidance of exclusive cacheline fetches on the free and alloc hotpath since multiple kmem_cache_cpu structures are in one cacheline. This is particularly important for the kmalloc array. Cons: - Additional reference to one read only cacheline (per cpu array of pointers to kmem_cache_cpu) in both slab_alloc() and slab_free(). [akinobu.mita@gmail.com: fix cpu hotplug offline/online path] Signed-off-by: Christoph Lameter Cc: "Pekka Enberg" Cc: Akinobu Mita Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/slub_def.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'include/linux/slub_def.h') diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index 92e10cf6d0e8..f74716b59ce2 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -16,8 +16,7 @@ struct kmem_cache_cpu { struct page *page; int node; unsigned int offset; - /* Lots of wasted space */ -} ____cacheline_aligned_in_smp; +}; struct kmem_cache_node { spinlock_t list_lock; /* Protect partial list and nr_partial */ @@ -62,7 +61,11 @@ struct kmem_cache { int defrag_ratio; struct kmem_cache_node *node[MAX_NUMNODES]; #endif - struct kmem_cache_cpu cpu_slab[NR_CPUS]; +#ifdef CONFIG_SMP + struct kmem_cache_cpu *cpu_slab[NR_CPUS]; +#else + struct kmem_cache_cpu cpu_slab; +#endif }; /* -- cgit v1.2.3