author:    Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>  2026-03-06 11:31:54 +0530
committer: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>  2026-03-09 12:36:10 +0530
commit:    493740d790cce709d285cd1022d16d05439b7d5b (patch)
tree:      1a50cf9c900152e77096b6b99c58cbeb32ab3820 /include/linux
parent:    e597a809a2b97e927060ba182f58eb3e6101bc70 (diff)
drm/buddy: Improve offset-aligned allocation handling
Large alignment requests previously forced the buddy allocator to search
by alignment order, which often caused higher-order free blocks to be
split even when a suitably aligned smaller region already existed within
them. This led to excessive fragmentation, especially for workloads
requesting small sizes with large alignment constraints.

This change prioritizes the requested allocation size during the search
and uses an augmented RB-tree field (subtree_max_alignment) to
efficiently locate free blocks that satisfy both size and
offset-alignment requirements. As a result, the allocator can directly
select an aligned sub-region without splitting larger blocks
unnecessarily.

A practical example is the VKCTS test
dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000, which
repeatedly allocates 8 KiB buffers with a 256 KiB alignment. Previously,
such allocations caused large blocks to be split aggressively, despite
smaller aligned regions being sufficient. With this change, those
aligned regions are reused directly, significantly reducing
fragmentation.

This improvement is visible in the amdgpu VRAM buddy allocator state
(/sys/kernel/debug/dri/1/amdgpu_vram_mm). After the change, higher-order
blocks are preserved and the number of low-order fragments is
substantially reduced.

Before:
  order- 5 free: 1936 MiB, blocks: 15490
  order- 4 free:  967 MiB, blocks: 15486
  order- 3 free:  483 MiB, blocks: 15485
  order- 2 free:  241 MiB, blocks: 15486
  order- 1 free:  241 MiB, blocks: 30948

After:
  order- 5 free:  493 MiB, blocks:  3941
  order- 4 free:  246 MiB, blocks:  3943
  order- 3 free:  123 MiB, blocks:  4101
  order- 2 free:   61 MiB, blocks:  4101
  order- 1 free:   61 MiB, blocks:  8018

By avoiding unnecessary splits, this change improves allocator
efficiency and helps maintain larger contiguous free regions under heavy
offset-aligned allocation workloads.

v2: (Matthew)
  - Update augmented information along the path to the inserted node.
v3:
  - Move the patch to gpu/buddy.c file.
v4: (Matthew)
  - Use the helper instead of calling __ffs directly
  - Remove gpu_buddy_block_order(block) >= order check and drop order
  - Drop !node check as all callers handle this already
  - Return larger than any other possible alignment for __ffs64(0)
  - Replace __ffs with __ffs64
v5: (Matthew)
  - Drop subtree_max_alignment initialization at gpu_block_alloc()

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260306060155.2114-1-Arunpravin.PaneerSelvam@amd.com
Diffstat (limited to 'include/linux')
-rw-r--r--  include/linux/gpu_buddy.h  |  2 ++
1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
index f1fb6eff604a..5fa917ba5450 100644
--- a/include/linux/gpu_buddy.h
+++ b/include/linux/gpu_buddy.h
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/rbtree.h>
+#include <linux/rbtree_augmented.h>
/**
* GPU_BUDDY_RANGE_ALLOCATION - Allocate within a specific address range
@@ -128,6 +129,7 @@ struct gpu_buddy_block {
};
/* private: */
struct list_head tmp_link;
+ unsigned int subtree_max_alignment;
};
/* Order-zero must be at least SZ_4K */