author: Linus Torvalds <torvalds@linux-foundation.org> 2024-11-19 15:20:04 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2024-11-19 15:20:04 -0800
commit: fb1dd1403c7b2219b8c1524c909938bd4b3f401f
tree: 968c542a3f3394ddcc725b9b9343cde603131b16 /arch/x86
parent: a5c93bfec0beca4435d1995bc3ff2ac003fe7552
parent: ff8d523cc4520a5ce86cde0fd57c304e2b4f61b3
Merge tag 'core-debugobjects-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects updates from Thomas Gleixner:

 - Prevent destroying the kmem_cache on early failure.

   Destroying a kmem_cache requires work queues to be set up, but in the
   early failure case they are not yet initialized. So rather leak the
   cache instead of triggering a BUG.

 - Reduce parallel pool fill attempts.

   Refilling the object pool requires taking the global pool lock, which
   causes a massive performance issue when a large number of CPUs attempt
   to refill concurrently. It turns out that it's sufficient to let one
   CPU handle the refill from the to-free list and, in case there are not
   enough objects on it, to allocate new objects from the kmem cache.

   This also splits the free list handling from the actual allocation
   path, as that yields better results on RT, where allocation is
   restricted to preemptible code paths. The refill from the free list
   has no such restrictions.

 - Consolidate the global and the per CPU pools to use the same data
   structure, so all helper functions can be shared.

 - Simplify the object allocation/free logic.

   The allocation/free logic is an incomprehensible maze, which tries to
   utilize the to-free list and the global pool in the best way. This can
   all be simplified into a straightforward, comprehensible code flow.

 - Convert the allocation/free mechanism to batch mode.

   Transferring objects from the global pool to the per CPU pools or vice
   versa is done by walking the hlist and moving object by object. That
   not only increases the pool lock held time, it also dirties up to 17
   cache lines.

   This can be avoided by storing the pointer to the first object in a
   batch of 16 objects in the objects themselves and propagating it
   through the batch when an object is enqueued into a pool or to a
   temporary hlist head on allocation.

   This allows moving batches of objects with at most four cache lines
   dirtied and reduces the pool lock held time, and therefore contention,
   significantly.

 - Improve the object reuse.

   The current implementation frees unused objects too aggressively,
   which is counterproductive on bursty workloads like a kernel compile.

   Address this by:

      * increasing the per CPU pool size

      * refilling the per CPU pool from the to-be-freed pool when the
        per CPU pool emptied a batch

      * keeping track of object usage with an exponentially weighted
        moving average, which prevents the work queue callback from
        freeing objects prematurely.

   Combined, this reduces the allocation/free rate for a full kernel
   compile significantly:

                   kmem_cache_alloc()      kmem_cache_free()
      Baseline:    380k                    330k
      Improved:    170k                    117k

 - A few cleanups and a more cache line friendly layout of debug
   information on top.

* tag 'core-debugobjects-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  debugobjects: Track object usage to avoid premature freeing of objects
  debugobjects: Refill per CPU pool more agressively
  debugobjects: Double the per CPU slots
  debugobjects: Move pool statistics into global_pool struct
  debugobjects: Implement batch processing
  debugobjects: Prepare kmem_cache allocations for batching
  debugobjects: Prepare for batching
  debugobjects: Use static key for boot pool selection
  debugobjects: Rework free_object_work()
  debugobjects: Rework object freeing
  debugobjects: Rework object allocation
  debugobjects: Move min/max count into pool struct
  debugobjects: Rename and tidy up per CPU pools
  debugobjects: Use separate list head for boot pool
  debugobjects: Move pools into a datastructure
  debugobjects: Reduce parallel pool fill attempts
  debugobjects: Make debug_objects_enabled bool
  debugobjects: Provide and use free_object_list()
  debugobjects: Remove pointless debug printk
  debugobjects: Reuse put_objects() on OOM
  ...
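
The usage tracking mentioned in the pull message can be illustrated with a
minimal user-space sketch. It is not the kernel's implementation: the names
usage_tracker, usage_update(), usage_average() and usage_freeable(), the
fixed-point layout and the 7/8 weight are all assumptions chosen for
illustration. The idea shown is only that a moving average of the in-use
count decays slowly after a burst, so a periodic cleanup frees just the
objects exceeding recent demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-point parameters; not taken from the kernel source. */
#define EWMA_SHIFT  10u                         /* fraction bits */
#define EWMA_ONE    (1u << EWMA_SHIFT)          /* 1.0 in fixed point */
#define EWMA_OLD    (EWMA_ONE - EWMA_ONE / 8u)  /* weight of old average: 7/8 */
#define EWMA_NEW    (EWMA_ONE - EWMA_OLD)       /* weight of new sample: 1/8 */

/* Hypothetical tracker: moving average of objects in use, fixed point. */
struct usage_tracker {
	uint64_t avg_fp;
};

/* Fold the current in-use count into the moving average. */
static inline void usage_update(struct usage_tracker *t, uint32_t in_use)
{
	assert(t != NULL);
	uint64_t sample_fp = (uint64_t)in_use << EWMA_SHIFT;

	t->avg_fp = (t->avg_fp * EWMA_OLD + sample_fp * EWMA_NEW) >> EWMA_SHIFT;
}

/* Integer part of the moving average. */
static inline uint32_t usage_average(const struct usage_tracker *t)
{
	assert(t != NULL);
	return (uint32_t)(t->avg_fp >> EWMA_SHIFT);
}

/*
 * Number of pooled (idle) objects a cleanup pass may free: only the part
 * of the total object count that exceeds the averaged demand, and never
 * more than what is actually sitting in the pool.
 */
static inline uint32_t usage_freeable(const struct usage_tracker *t,
				      uint32_t in_use, uint32_t pooled)
{
	uint64_t total = (uint64_t)in_use + pooled;
	uint64_t avg = usage_average(t);
	uint64_t excess = total > avg ? total - avg : 0;

	return (uint32_t)(excess < pooled ? excess : pooled);
}
```

In this sketch a cleanup callback would call usage_update() once per run and
then free at most usage_freeable() objects. After a burst the average drops
by only one eighth of the difference per update, so most of the pool survives
until the next burst instead of being handed back to the allocator at once.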
Diffstat (limited to 'arch/x86')
0 files changed, 0 insertions, 0 deletions