diff options
| author | Yifan Zhang <yifan1.zhang@amd.com> | 2026-05-11 22:14:23 +0800 |
|---|---|---|
| committer | Alex Deucher <alexander.deucher@amd.com> | 2026-05-19 11:45:41 -0400 |
| commit | 70cadefcc6160c575b04f763ada34c20e868d577 (patch) | |
| tree | ade504a592bf74d5a40441afacd3678cf2ce77ca /drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | |
| parent | 633d6d808d56b73c5455f17f3b082f4c26b38d5f (diff) | |
| download | linux-next-70cadefcc6160c575b04f763ada34c20e868d577.tar.gz linux-next-70cadefcc6160c575b04f763ada34c20e868d577.zip | |
drm/amdgpu: unmap all user mappings of framebuffer and doorbell before mode1 reset
During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily
inaccessible via PCIe. Any attempt to access framebuffer or MMIO registers during
this window can result in uncompleted PCIe transactions, leading to NMI panics or
system hangs.
To prevent this, Unmap all of the applications mappings of the framebuffer
and doorbell BARs before mode1 reset. Also prevent new mappings from coming in
during the reset process.
v2: remove inode in kfd_dev (Christian)
v3: correct unmap offset (Felix), remove prevent new mappings part
to avoid deadlock (Christian)
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c')
| -rw-r--r-- | drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index d9e283f3b57d..9783a3cefb04 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -36,6 +36,9 @@ #include "amdgpu_ras.h" #include "amdgpu_umc.h" #include "amdgpu_reset.h" +#if IS_ENABLED(CONFIG_HSA_AMD) +#include "kfd_priv.h" +#endif /* Total memory size in system memory and all GPU VRAM. Used to * estimate worst case amount of memory to reserve for page tables @@ -320,6 +323,28 @@ void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev) (void)amdgpu_reset_domain_schedule(adev->reset_domain, &adev->kfd.reset_work); } +void amdgpu_amdkfd_clear_kfd_mapping(struct amdgpu_device *adev) +{ +#if IS_ENABLED(CONFIG_HSA_AMD) + struct kfd_dev *kfd = adev->kfd.dev; + unsigned int i; + + if (!kfd) + return; + + for (i = 0; i < kfd->num_nodes; i++) { + struct kfd_node *node = kfd->nodes[i]; + + kfd_dev_unmap_mapping_range(KFD_MMAP_TYPE_DOORBELL | + KFD_MMAP_GPU_ID(node->id), + kfd_doorbell_process_slice(kfd)); + kfd_dev_unmap_mapping_range(KFD_MMAP_TYPE_MMIO | + KFD_MMAP_GPU_ID(node->id), + PAGE_SIZE); + } +#endif +} + int amdgpu_amdkfd_alloc_kernel_mem(struct amdgpu_device *adev, size_t size, u32 domain, void **mem_obj, uint64_t *gpu_addr, void **cpu_ptr, bool cp_mqd_gfx9) |
