diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-05-25 16:18:27 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-05-25 16:18:27 -0700 |
commit | 2518f226c60d8e04d18ba4295500a5b0b8ac7659 (patch) | |
tree | e74de5ca0db01398cbb0c34376f74a81d7583c75 /drivers/gpu/drm/amd/amdkfd/kfd_svm.c | |
parent | 86c87bea6b42100c67418af690919c44de6ede6e (diff) | |
parent | c4955d9cd2fc56c43e78c908dad4e2cac7cc9073 (diff) | |
download | lwn-2518f226c60d8e04d18ba4295500a5b0b8ac7659.tar.gz lwn-2518f226c60d8e04d18ba4295500a5b0b8ac7659.zip |
Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Intel have enabled DG2 on certain SKUs for laptops, AMD has started
some new GPU support, msm has user allocated VA controls
dma-buf:
- add dma_resv_replace_fences
- add dma_resv_get_singleton
- make dma_excl_fence private
core:
- EDID parser refactorings
- switch drivers to drm_mode_copy/duplicate
- DRM managed mutex initialization
display-helper:
- put HDMI, SCDC, HDCP, DSC and DP into new module
gem:
- rework fence handling
ttm:
- rework bulk move handling
- add common debugfs for resource managers
- convert to kvcalloc
format helpers:
- support monochrome formats
- RGB888, RGB565 to XRGB8888 conversions
fbdev:
- cfb/sys_imageblit fixes
- pagelist corruption fix
- create offb platform device
- deferred io improvements
sysfb:
- Kconfig rework
- support for VESA mode selection
bridge:
- conversions to devm_drm_of_get_bridge
- conversions to panel_bridge
- analogix_dp - autosuspend support
- it66121 - audio support
- tc358767 - DSI to DPI support
- icn6211 - PLL/I2C fixes, DT property
- adv7611 - enable DRM_BRIDGE_OP_HPD
- anx7625 - fill ELD if no monitor
- dw_hdmi - add audio support
- lontium LT9211 support, i.MXMP LDB
- it6505: Kconfig fix, DPCD set power fix
- adv7511 - CEC support for ADV7535
panel:
- ltk035c5444t, B133UAN01, NV3052C panel support
- DataImage FG040346DSSWBG04 support
- st7735r - DT bindings fix
- ssd130x - fixes
i915:
- DG2 laptop PCI-IDs ("motherboard down")
- Initial RPL-P PCI IDs
- compute engine ABI
- DG2 Tile4 support
- DG2 CCS clear color compression support
- DG2 render/media compression formats support
- ATS-M platform info
- RPL-S PCI IDs added
- Bump ADL-P DMC version to v2.16
- Support static DRRS
- Support multiple eDP/LVDS native mode refresh rates
- DP HDR support for HSW+
- Lots of display refactoring + fixes
- GuC hwconfig support and query
- sysfs support for multi-tile
- fdinfo per-client gpu utilisation
- add geometry subslices query
- fix prime mmap with LMEM
- fix vm open count and remove vma refcounts
- contiguous allocation fixes
- steered register write support
- small PCI BAR enablement
- GuC error capture support
- sunset igpu legacy mmap support for newer devices
- GuC version 70.1.1 support
amdgpu:
- Initial SoC21 support
- SMU 13.x enablement
- SMU 13.0.4 support
- ttm_eu cleanups
- USB-C, GPUVM updates
- TMZ fixes for RV
- RAS support for VCN
- PM sysfs code cleanup
- DC FP rework
- extend CG/PG flags to 64-bit
- SI dpm lockdep fix
- runtime PM fixes
amdkfd:
- RAS/SVM fixes
- TLB flush fixes
- CRIU GWS support
- ignore bogus MEC signals more efficiently
msm:
- Fourcc modifier for tiled but not compressed layouts
- Support for userspace allocated IOVA (GPU virtual address)
- DPU: DSC (Display Stream Compression) support
- DP: eDP support
- DP: conversion to use drm_bridge and drm_bridge_connector
- Merge DPU1 and MDP5 MDSS driver
- DPU: writeback support
nouveau:
- make some structures static
- make some variables static
- switch to drm_gem_plane_helper_prepare_fb
radeon:
- misc fixes/cleanups
mxsfb:
- rework crtc mode setting
- LCDIF CRC support
etnaviv:
- fencing improvements
- fix address space collisions
- cleanup MMU reference handling
gma500:
- GEM/GTT improvements
- connector handling fixes
komeda:
- switch to plane reset helper
mediatek:
- MIPI DSI improvements
omapdrm:
- GEM improvements
qxl:
- aarch64 support
vc4:
- add a CL submission tracepoint
- HDMI YUV support
- HDMI/clock improvements
- drop is_hdmi caching
virtio:
- remove restriction of non-zero blob types
vmwgfx:
- support for cursormob and cursorbypass 4
- fence improvements
tidss:
- reset DISPC on startup
solomon:
- SPI support
- DT improvements
sun4i:
- allwinner D1 support
- drop is_hdmi caching
imx:
- use swap() instead of open-coding
- use devm_platform_ioremap_resource
- remove redunant initializations
ast:
- Displayport support
rockchip:
- Refactor IOMMU initialisation
- make some structures static
- replace drm_detect_hdmi_monitor with drm_display_info.is_hdmi
- support swapped YUV formats,
- clock improvements
- rk3568 support
- VOP2 support
mediatek:
- MT8186 support
tegra:
- debugabillity improvements"
* tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm: (1740 commits)
drm/i915/dsi: fix VBT send packet port selection for ICL+
drm/i915/uc: Fix undefined behavior due to shift overflowing the constant
drm/i915/reg: fix undefined behavior due to shift overflowing the constant
drm/i915/gt: Fix use of static in macro mismatch
drm/i915/audio: fix audio code enable/disable pipe logging
drm/i915: Fix CFI violation with show_dynamic_id()
drm/i915: Fix 'mixing different enum types' warnings in intel_display_power.c
drm/i915/gt: Fix build error without CONFIG_PM
drm/msm/dpu: handle pm_runtime_get_sync() errors in bind path
drm/msm/dpu: add DRM_MODE_ROTATE_180 back to supported rotations
drm/msm: don't free the IRQ if it was not requested
drm/msm/dpu: limit writeback modes according to max_linewidth
drm/amd: Don't reset dGPUs if the system is going to s2idle
drm/amdgpu: Unmap legacy queue when MES is enabled
drm: msm: fix possible memory leak in mdp5_crtc_cursor_set()
drm/msm: Fix fb plane offset calculation
drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init
drm/msm/dsi: don't powerup at modeset time for parade-ps8640
drm/rockchip: Change register space names in vop2
dt-bindings: display: rockchip: make reg-names mandatory for VOP2
...
Diffstat (limited to 'drivers/gpu/drm/amd/amdkfd/kfd_svm.c')
-rw-r--r-- | drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 103 |
1 files changed, 67 insertions, 36 deletions
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 3b8856b4cece..29e9ebf6d8d5 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -149,8 +149,7 @@ svm_range_dma_map_dev(struct amdgpu_device *adev, struct svm_range *prange, int i, r; if (!addr) { - addr = kvmalloc_array(prange->npages, sizeof(*addr), - GFP_KERNEL | __GFP_ZERO); + addr = kvcalloc(prange->npages, sizeof(*addr), GFP_KERNEL); if (!addr) return -ENOMEM; prange->dma_addr[gpuidx] = addr; @@ -548,7 +547,7 @@ svm_range_vram_node_new(struct amdgpu_device *adev, struct svm_range *prange, goto reserve_bo_failed; } - r = dma_resv_reserve_shared(bo->tbo.base.resv, 1); + r = dma_resv_reserve_fences(bo->tbo.base.resv, 1); if (r) { pr_debug("failed %d to reserve bo\n", r); amdgpu_bo_unreserve(bo); @@ -686,7 +685,8 @@ svm_range_check_attr(struct kfd_process *p, static void svm_range_apply_attrs(struct kfd_process *p, struct svm_range *prange, - uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs) + uint32_t nattr, struct kfd_ioctl_svm_attribute *attrs, + bool *update_mapping) { uint32_t i; int gpuidx; @@ -702,6 +702,7 @@ svm_range_apply_attrs(struct kfd_process *p, struct svm_range *prange, case KFD_IOCTL_SVM_ATTR_ACCESS: case KFD_IOCTL_SVM_ATTR_ACCESS_IN_PLACE: case KFD_IOCTL_SVM_ATTR_NO_ACCESS: + *update_mapping = true; gpuidx = kfd_process_gpuidx_from_gpuid(p, attrs[i].value); if (attrs[i].type == KFD_IOCTL_SVM_ATTR_NO_ACCESS) { @@ -716,9 +717,11 @@ svm_range_apply_attrs(struct kfd_process *p, struct svm_range *prange, } break; case KFD_IOCTL_SVM_ATTR_SET_FLAGS: + *update_mapping = true; prange->flags |= attrs[i].value; break; case KFD_IOCTL_SVM_ATTR_CLR_FLAGS: + *update_mapping = true; prange->flags &= ~attrs[i].value; break; case KFD_IOCTL_SVM_ATTR_GRANULARITY: @@ -951,6 +954,7 @@ svm_range_split_adjust(struct svm_range *new, struct svm_range *old, new->prefetch_loc = old->prefetch_loc; new->actual_loc = old->actual_loc; new->granularity = old->granularity; + new->mapped_to_gpu = old->mapped_to_gpu; bitmap_copy(new->bitmap_access, old->bitmap_access, MAX_GPU_INSTANCE); bitmap_copy(new->bitmap_aip, old->bitmap_aip, MAX_GPU_INSTANCE); @@ -1188,9 +1192,9 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm, pr_debug("[0x%llx 0x%llx]\n", start, last); - return amdgpu_vm_bo_update_mapping(adev, adev, vm, false, true, NULL, - start, last, init_pte_value, 0, - NULL, NULL, fence, NULL); + return amdgpu_vm_update_range(adev, vm, false, true, true, NULL, start, + last, init_pte_value, 0, 0, NULL, NULL, + fence); } static int @@ -1204,6 +1208,17 @@ svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start, uint32_t gpuidx; int r = 0; + if (!prange->mapped_to_gpu) { + pr_debug("prange 0x%p [0x%lx 0x%lx] not mapped to GPU\n", + prange, prange->start, prange->last); + return 0; + } + + if (prange->start == start && prange->last == last) { + pr_debug("unmap svms 0x%p prange 0x%p\n", prange->svms, prange); + prange->mapped_to_gpu = false; + } + bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip, MAX_GPU_INSTANCE); p = container_of(prange->svms, struct kfd_process, svms); @@ -1239,11 +1254,10 @@ static int svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange, unsigned long offset, unsigned long npages, bool readonly, dma_addr_t *dma_addr, struct amdgpu_device *bo_adev, - struct dma_fence **fence) + struct dma_fence **fence, bool flush_tlb) { struct amdgpu_device *adev = pdd->dev->adev; struct amdgpu_vm *vm = drm_priv_to_vm(pdd->drm_priv); - bool table_freed = false; uint64_t pte_flags; unsigned long last_start; int last_domain; @@ -1278,13 +1292,12 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange, (last_domain == SVM_RANGE_VRAM_DOMAIN) ? 1 : 0, pte_flags); - r = amdgpu_vm_bo_update_mapping(adev, bo_adev, vm, false, false, - NULL, last_start, - prange->start + i, pte_flags, - last_start - prange->start, - NULL, dma_addr, - &vm->last_update, - &table_freed); + r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, NULL, + last_start, prange->start + i, + pte_flags, + last_start - prange->start, + bo_adev ? bo_adev->vm_manager.vram_base_offset : 0, + NULL, dma_addr, &vm->last_update); for (j = last_start - prange->start; j <= i; j++) dma_addr[j] |= last_domain; @@ -1306,8 +1319,6 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange, if (fence) *fence = dma_fence_get(vm->last_update); - if (table_freed) - kfd_flush_tlb(pdd, TLB_FLUSH_LEGACY); out: return r; } @@ -1315,7 +1326,7 @@ out: static int svm_range_map_to_gpus(struct svm_range *prange, unsigned long offset, unsigned long npages, bool readonly, - unsigned long *bitmap, bool wait) + unsigned long *bitmap, bool wait, bool flush_tlb) { struct kfd_process_device *pdd; struct amdgpu_device *bo_adev; @@ -1350,7 +1361,8 @@ svm_range_map_to_gpus(struct svm_range *prange, unsigned long offset, r = svm_range_map_to_gpu(pdd, prange, offset, npages, readonly, prange->dma_addr[gpuidx], - bo_adev, wait ? &fence : NULL); + bo_adev, wait ? &fence : NULL, + flush_tlb); if (r) break; @@ -1363,6 +1375,8 @@ svm_range_map_to_gpus(struct svm_range *prange, unsigned long offset, break; } } + + kfd_flush_tlb(pdd, TLB_FLUSH_LEGACY); } return r; @@ -1372,7 +1386,7 @@ struct svm_validate_context { struct kfd_process *process; struct svm_range *prange; bool intr; - unsigned long bitmap[MAX_GPU_INSTANCE]; + DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE); struct ttm_validate_buffer tv[MAX_GPU_INSTANCE]; struct list_head validate_list; struct ww_acquire_ctx ticket; @@ -1469,8 +1483,8 @@ static void *kfd_svm_page_owner(struct kfd_process *p, int32_t gpuidx) * 5. Release page table (and SVM BO) reservation */ static int svm_range_validate_and_map(struct mm_struct *mm, - struct svm_range *prange, - int32_t gpuidx, bool intr, bool wait) + struct svm_range *prange, int32_t gpuidx, + bool intr, bool wait, bool flush_tlb) { struct svm_validate_context ctx; unsigned long start, end, addr; @@ -1509,8 +1523,12 @@ static int svm_range_validate_and_map(struct mm_struct *mm, prange->bitmap_aip, MAX_GPU_INSTANCE); } - if (bitmap_empty(ctx.bitmap, MAX_GPU_INSTANCE)) - return 0; + if (bitmap_empty(ctx.bitmap, MAX_GPU_INSTANCE)) { + if (!prange->mapped_to_gpu) + return 0; + + bitmap_copy(ctx.bitmap, prange->bitmap_access, MAX_GPU_INSTANCE); + } if (prange->actual_loc && !prange->ttm_res) { /* This should never happen. actual_loc gets set by @@ -1582,7 +1600,7 @@ static int svm_range_validate_and_map(struct mm_struct *mm, } r = svm_range_map_to_gpus(prange, offset, npages, readonly, - ctx.bitmap, wait); + ctx.bitmap, wait, flush_tlb); unlock_out: svm_range_unlock(prange); @@ -1590,8 +1608,10 @@ unlock_out: addr = next; } - if (addr == end) + if (addr == end) { prange->validated_once = true; + prange->mapped_to_gpu = true; + } unreserve_out: svm_range_unreserve_bos(&ctx); @@ -1676,7 +1696,7 @@ static void svm_range_restore_work(struct work_struct *work) mutex_lock(&prange->migrate_mutex); r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE, - false, true); + false, true, false); if (r) pr_debug("failed %d to map 0x%lx to gpus\n", r, prange->start); @@ -1822,6 +1842,7 @@ static struct svm_range *svm_range_clone(struct svm_range *old) new->prefetch_loc = old->prefetch_loc; new->actual_loc = old->actual_loc; new->granularity = old->granularity; + new->mapped_to_gpu = old->mapped_to_gpu; bitmap_copy(new->bitmap_access, old->bitmap_access, MAX_GPU_INSTANCE); bitmap_copy(new->bitmap_aip, old->bitmap_aip, MAX_GPU_INSTANCE); @@ -2687,11 +2708,6 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid, pr_debug("kfd process not founded pasid 0x%x\n", pasid); return 0; } - if (!p->xnack_enabled) { - pr_debug("XNACK not enabled for pasid 0x%x\n", pasid); - r = -EFAULT; - goto out; - } svms = &p->svms; pr_debug("restoring svms 0x%p fault address 0x%llx\n", svms, addr); @@ -2702,6 +2718,12 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid, goto out; } + if (!p->xnack_enabled) { + pr_debug("XNACK not enabled for pasid 0x%x\n", pasid); + r = -EFAULT; + goto out; + } + /* p->lead_thread is available as kfd_process_wq_release flush the work * before releasing task ref. */ @@ -2812,7 +2834,7 @@ retry_write_locked: } } - r = svm_range_validate_and_map(mm, prange, gpuidx, false, false); + r = svm_range_validate_and_map(mm, prange, gpuidx, false, false, false); if (r) pr_debug("failed %d to map svms 0x%p [0x%lx 0x%lx] to gpus\n", r, svms, prange->start, prange->last); @@ -3225,6 +3247,8 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, struct svm_range_list *svms; struct svm_range *prange; struct svm_range *next; + bool update_mapping = false; + bool flush_tlb; int r = 0; pr_debug("pasid 0x%x svms 0x%p [0x%llx 0x%llx] pages 0x%llx\n", @@ -3263,7 +3287,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, svm_range_add_notifier_locked(mm, prange); } list_for_each_entry(prange, &update_list, update_list) { - svm_range_apply_attrs(p, prange, nattr, attrs); + svm_range_apply_attrs(p, prange, nattr, attrs, &update_mapping); /* TODO: unmap ranges from GPU that lost access */ } list_for_each_entry_safe(prange, next, &remove_list, update_list) { @@ -3296,8 +3320,15 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, continue; } + if (!migrated && !update_mapping) { + mutex_unlock(&prange->migrate_mutex); + continue; + } + + flush_tlb = !migrated && update_mapping && prange->mapped_to_gpu; + r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE, - true, true); + true, true, flush_tlb); if (r) pr_debug("failed %d to map svm range\n", r); |