lwn.git/drivers/gpu/drm/drm_mm.c, branch docs-5.13

drm/mm: cleanup and improve next_hole_*_addr()

2020-06-23T13:46:40+00:00

Skipping just one branch of the tree is not the most effective approach. Instead use a macro to define the traversal functions and sort out both branch sides. This improves the performance of the unit tests by a factor of more than 4. Signed-off-by: Christian König Reviewed-by: Nirmoy Das Link: https://patchwork.freedesktop.org/patch/370298/

drm/mm: optimize find_hole() as well

2020-06-23T13:46:06+00:00

Abort early if there isn't enough space to allocate from a subtree. Signed-off-by: Christian König Acked-by: Nirmoy Das Link: https://patchwork.freedesktop.org/patch/370297/

drm/mm: remove unused rb_hole_size()

2020-06-23T13:37:27+00:00

Just some code cleanup. Signed-off-by: Christian König Reviewed-by: Nirmoy Das Link: https://patchwork.freedesktop.org/patch/370296/

drm/mm: remove invalid entry based optimization

2020-06-15T08:51:18+00:00

When the current entry is rejected as candidate for the search it does not mean that we can abort the subtree search. It is perfectly possible that only the alignment, but not the size is the reason for the rejection. Signed-off-by: Christian König Reviewed-by: Nirmoy Das Link: https://patchwork.freedesktop.org/patch/369394/

drm/mm: fix hole size comparison

2020-06-04T07:57:22+00:00

Fixes: 0cdea4455acd350a ("drm/mm: optimize rb_hole_addr rbtree search") Signed-off-by: Nirmoy Das Reported-by: Christian König Reviewed-by: Christian König Signed-off-by: Christian König Link: https://patchwork.freedesktop.org/patch/367726/

drm/mm: optimize rb_hole_addr rbtree search

2020-05-05T11:39:38+00:00

Userspace can severely fragment rb_hole_addr rbtree by manipulating alignment while allocating buffers. Fragmented rb_hole_addr rbtree would result in large delays while allocating buffer object for a userspace application. It takes long time to find suitable hole because if we fail to find a suitable hole in the first attempt then we look for neighbouring nodes using rb_prev()/rb_next(). Traversing rbtree using rb_prev()/rb_next() can take really long time if the tree is fragmented. This patch improves searches in fragmented rb_hole_addr rbtree by modifying it to an augmented rbtree which will store an extra field in drm_mm_node, subtree_max_hole. Each drm_mm_node now stores maximum hole size for its subtree in drm_mm_node->subtree_max_hole. Using drm_mm_node->subtree_max_hole, it is possible to eliminate a complete subtree if that subtree is unable to serve a request hence reducing number of rb_prev()/rb_next() used. With this patch applied, 1 million bo allocs on amdgpu took ~8 sec, compared to 50k bo allocs which took 28 sec without it. partial test code: int test_fragmentation(void) { int i = 0; uint32_t minor_version; uint32_t major_version; struct amdgpu_bo_alloc_request request = {}; amdgpu_bo_handle vram_handle[MAX_ALLOC] = {}; amdgpu_device_handle device_handle; request.alloc_size = 4096; request.phys_alignment = 8192; request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM; int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC); amdgpu_device_initialize(fd, &major_version, &minor_version, &device_handle); for (i = 0; i < MAX_ALLOC; i++) { amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]); } for (i = 0; i < MAX_ALLOC; i++) amdgpu_bo_free(vram_handle[i]); return 0; } v2: Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole v3: insert_hole_addr() should be static a function fix return value of next_hole_high_addr()/next_hole_low_addr() Reported-by: kbuild test robot v4: Fix commit message. Signed-off-by: Nirmoy Das Reviewed-by: Chris Wilson Acked-by: Christian König Link: https://patchwork.freedesktop.org/patch/364341/ Signed-off-by: Christian König

drm/mm: revert "Break long searches in fragmented address spaces"

2020-03-31T12:47:51+00:00

This reverts commit 7be1b9b8e9d1e9ef0342d2e001f44eec4030aa4d. The drm_mm is supposed to work in atomic context, so calling schedule() or in this case cond_resched() is illegal. Signed-off-by: Christian König Acked-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/359278/

drm/mm: Remove redundant assignment in drm_mm_reserve_node

2020-03-10T10:25:07+00:00

In Pete Goodliffe words, "You can improve a system by adding new code. You can also improve a system by removing code" - In this case, commit "202b52b7fbf70" added new code to initialize end of the node. So, there is no need for duplicated initialization, and this patch simply removes it. Signed-off-by: Akeem G Abodunrin Cc: Chris Wilson Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20200309151156.25040-1-akeem.g.abodunrin@intel.com

drm/mm: Break long searches in fragmented address spaces

2020-03-06T11:15:43+00:00

We try hard to select a suitable hole in the drm_mm first time. But if that is unsuccessful, we then have to look at neighbouring nodes, and this requires traversing the rbtree. Walking the rbtree can be slow (much slower than a linear list for deep trees), and if the drm_mm has been purposefully fragmented our search can be trapped for a long, long time. For non-preemptible kernels, we need to break up long CPU bound sections by manually checking for cond_resched(); similarly we should also bail out if we have been told to terminate. (In an ideal world, we would break for any signal, but we need to trade off having to perform the search again after ERESTARTSYS, which again may form a trap of making no forward progress.) Reported-by: Zbigniew Kempczyński Signed-off-by: Chris Wilson Cc: Zbigniew Kempczyński Cc: Joonas Lahtinen Reviewed-by: Andi Shyti Link: https://patchwork.freedesktop.org/patch/msgid/20200207151720.2812125-1-chris@chris-wilson.co.uk

drm/mm: Use clear_bit_unlock() for releasing the drm_mm_node()

2019-10-04T12:43:43+00:00

A few callers need to serialise the destruction of their drm_mm_node and ensure it is removed from the drm_mm before freeing. However, to be completely sure that any access from another thread is complete before we free the struct, we require the RELEASE semantics of clear_bit_unlock(). This allows the conditional locking such as Thread A Thread B mutex_lock(mm_lock); if (drm_mm_node_allocated(node)) { drm_mm_node_remove(node); mutex_lock(mm_lock); mutex_unlock(mm_lock); if (drm_mm_node_allocated(node)) drm_mm_node_remove(node); mutex_unlock(mm_lock); } kfree(node); to serialise correctly without any lingering accesses from A to the freed node. Allocation / insertion of the node is assumed never to race with removal or eviction scanning. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-5-chris@chris-wilson.co.uk