| author | Matthew Auld <matthew.auld@intel.com> | 2026-06-25 16:20:56 +0100 |
|---|---|---|
| committer | Thomas Hellström <thomas.hellstrom@linux.intel.com> | 2026-07-02 12:29:43 +0200 |
| commit | b5c55015d4164a0f206bcdcf2985da948b3c7837 (patch) | |
| tree | bdec1aff493f6002a3e785a91d90bc21d9c7eefd /drivers/gpu | |
| parent | ed8b0d731892c68b41ecbd27c952af284816dec1 (diff) | |
drm/xe: fix NPD in bo_meminfo()
When a buffer object is purged, its ttm.resource is set to NULL via the
TTM pipeline gutting flow. However, the BO remains in the client's
object list until userspace explicitly closes the GEM handle. If memory
stats are queried during this time, accessing bo->ttm.resource->mem_type
will result in a NULL pointer dereference.
Fix this by safely skipping purged BOs in bo_meminfo, as they no longer
consume any memory.
A user is hitting a NULL pointer dereference on device resume. One
possible theory: in bo_move(), if something needs to be evicted to SYSTEM
to save the CCS state but the BO is marked as dontneed, no move is
triggered and the pages are nuked instead, leaving the BO with a NULL
resource. bo_meminfo() does not look ready to handle a NULL resource.
v2 (Sashiko):
- There could potentially be other cases where we might end up with a
NULL resource, so make this a general NULL check for now.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/8419
Fixes: ad9843aac91a ("drm/xe/madvise: Implement purgeable buffer object support")
Assisted-by: Copilot:gemini-3.1-pro-preview
Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Arvind Yadav <arvind.yadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Link: https://patch.msgid.link/20260625152054.450125-6-matthew.auld@intel.com
(cherry picked from commit c9a8e7daa0afe3161111e27fd92176e608c7f186)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Diffstat (limited to 'drivers/gpu')
| -rw-r--r-- | drivers/gpu/drm/xe/xe_drm_client.c | 12 |
1 files changed, 11 insertions, 1 deletions
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index 84b66147bf49..81020b4b344e 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -168,10 +168,20 @@ static void bo_meminfo(struct xe_bo *bo,
 			struct drm_memory_stats stats[TTM_NUM_MEM_TYPES])
 {
 	u64 sz = xe_bo_size(bo);
-	u32 mem_type = bo->ttm.resource->mem_type;
+	u32 mem_type;
 
 	xe_bo_assert_held(bo);
 
+	/*
+	 * The resource can be NULL if the BO has been purged, plus maybe some
+	 * other cases. Either way there shouldn't be any memory to account for,
+	 * or a current resource to account this against, so skip for now.
+	 */
+	if (!bo->ttm.resource)
+		return;
+
+	mem_type = bo->ttm.resource->mem_type;
+
 	if (drm_gem_object_is_shared_for_memory_stats(&bo->ttm.base))
 		stats[mem_type].shared += sz;
 	else
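The guard in the hunk above can be tried outside the kernel with a small userspace sketch. Everything prefixed `mock_` or `MOCK_` below is a hypothetical stand-in invented for illustration, not the real xe or TTM definitions; the `assert()` bounds check is also an addition of this sketch and is not part of the patch. It only shows the pattern: test the resource pointer before reading `mem_type`, and account nothing for a purged object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real xe/TTM structs. */
enum { MOCK_NUM_MEM_TYPES = 3 };

struct mock_resource {
	uint32_t mem_type;
};

struct mock_bo {
	uint64_t size;
	int shared;                     /* non-zero if the object is shared */
	struct mock_resource *resource; /* NULL once the object has been purged */
};

struct mock_stats {
	uint64_t shared;
	uint64_t private_sz;
};

/* Accumulate one object's size into the per-memory-type stats. */
static void mock_bo_meminfo(const struct mock_bo *bo,
			    struct mock_stats stats[MOCK_NUM_MEM_TYPES])
{
	uint32_t mem_type;

	/* A purged object has no resource, so there is nothing to account. */
	if (!bo->resource)
		return;

	mem_type = bo->resource->mem_type;
	/* Sketch-only sanity check; the patch itself has no such check. */
	assert(mem_type < MOCK_NUM_MEM_TYPES);

	if (bo->shared)
		stats[mem_type].shared += bo->size;
	else
		stats[mem_type].private_sz += bo->size;
}

/* Walk a list of objects and return the private total for one memory type. */
static uint64_t mock_total_private(const struct mock_bo *bos, size_t n,
				   uint32_t mem_type)
{
	struct mock_stats stats[MOCK_NUM_MEM_TYPES] = {0};
	size_t i;

	for (i = 0; i < n; i++)
		mock_bo_meminfo(&bos[i], stats);

	return stats[mem_type].private_sz;
}
```

Calling `mock_total_private()` on a list that mixes live objects with one whose `resource` is NULL should count only the live ones. Without the early return, the same walk would dereference the NULL pointer, which is the failure the commit message describes.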
