summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm
AgeCommit message (Collapse)Author
10 hoursMerge tag 'drm-fixes-2026-06-13' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Looks like it's settled down a bit more thankfully. Small changes across the board, amdgpu/xe leading with some colorop changes in the core/amd. Otherwise some misc driver fixes. colorop: - make lut interpolation mutable - track colorop updates correctly amdgpu: - UserQ fix - Userptr fix - MCCS freesync fix - track colorop changes correctly amdkfd: - Fix an event information leak - Events bounds check fix - Trap cleanup fix i915: - Check supported link rates DPCD read - Fix phys BO pread/pwrite with offset xe: - fix oops in suspend/shutdown without display - RAS fixes - Use HW_ERR prefix in log - include all registered queues in TLB invalidation - Fix refcount leak in xe_range_tree in error paths - fix job timeout recovery for unstarted jobs and kernel queues amdxdna: - fix possible leak of mm_struct ivpu: - fix integer truncation vc4: - fix leak in krealloc() error handling virtio: - fix dma_fence ref-count leak" * tag 'drm-fixes-2026-06-13' of https://gitlab.freedesktop.org/drm/kernel: (24 commits) accel/amdxdna: Fix mm_struct reference leak in aie2_populate_range() drm/xe: fix job timeout recovery for unstarted jobs and kernel queues drm/xe: fix refcount leak in xe_range_fence_insert() drm/xe: include all registered queues in TLB invalidation drm/xe/hw_error: Use HW_ERR prefix in log drm/xe/drm_ras: Add per node cleanup action drm/xe/drm_ras: Make counter allocation drm managed drm/xe/display: fix oops in suspend/shutdown without display drm/amd/display: use plane color_mgmt_changed to track colorop changes drm/atomic: track individual colorop updates drm/colorop: make lut(1/3)d_interpolation props correctly behave as mutable drm/colorop: Remove read-only comments from interpolation fields drm/i915/gem: Fix phys BO pread/pwrite with offset drm/vc4: fix krealloc() memory leak drm/virtio: Fix driver removal with disabled KMS drm/i915/edp: Check supported link rates DPCD read accel/ivpu: Fix signed integer truncation in IPC receive drm/virtio: fix dma_fence refcount leak on error in virtio_gpu_dma_fence_wait() drm/amd/display: Consult MCCS FreeSync cap only if requested & supported drm/amdkfd: Unwind debug trap enable on copy_to_user failure ...
11 hoursMerge tag 'drm-misc-fixes-2026-06-12' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: amd: - track colorop changes correctly amdxdna: - fix possible leak of mm_struct colorop: - make lut interpolation mutable - track colorop updates correctly ivpu: - fix integer truncation vc4: - fix leak in krealloc() error handling virtio: - fix dma_fence ref-count leak Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260612081418.GA17001@2a02-2455-9062-2500-e496-5a17-62ba-545e.dyn6.pyur.net
29 hoursMerge tag 'drm-xe-fixes-2026-06-11' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: Cross-subsystem Changes: Core Changes: Driver Changes: - fix oops in suspend/shutdown without display (Jani) - RAS fixes (Raag) - Use HW_ERR prefix in log (Raag) - include all registered queues in TLB invalidation (Tangudu) - Fix refcount leak in xe_range_tree in error paths (Wentao) - fix job timeout recovery for unstarted jobs and kernel queues (Rodrigo) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/aitt8ZkYmxIT9cdP@gsse-cloud1.jf.intel.com
30 hoursMerge tag 'drm-intel-fixes-2026-06-11' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Check supported link rates DPCD read [edp] (Nikita Zhandarovich) - Fix phys BO pread/pwrite with offset [gem] (Joonas Lahtinen) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tursulin@igalia.com> Link: https://patch.msgid.link/aipkcUDnTlzre-8F@linux
44 hoursdrm/xe: fix job timeout recovery for unstarted jobs and kernel queuesRodrigo Vivi
A job that GuC never scheduled (never started) indicates a GuC scheduling failure; previously such jobs were silently errored out instead of triggering a GT reset to recover. Trigger a GT reset and resubmit them, but only when the queue was not already killed or banned: an unstarted job on an already banned queue is the ban working as intended and must neither clear the ban nor kick off a reset, otherwise a banned userspace queue could be resurrected and spam GT resets. Kernel queues are always recovered this way and wedge the device once recovery attempts are exhausted, since kernel work must not silently fail. A started job that times out on a userspace VM bind queue stays banned rather than being reset and retried. The queue is banned early in the timeout handler to signal the G2H scheduling-done handler so it wakes the disable-scheduling waiter; without it the waiter sleeps the full 5s timeout. When a reset is warranted the ban is cleared before rearming so that guc_exec_queue_start() can resubmit jobs after the GT reset - a still-banned queue would block resubmission and cause an infinite TDR loop. The already-banned case is gated out before this point via skip_timeout_check, so it is unaffected. v2: (Himal) Do it for any queue type, not just kernel/migration v3: - (Sashiko and Sanjay): don't clear the ban / GT reset for already killed/banned queues on unstarted-job timeout - Update commit message - (Matt) Add Fixes tag Fixes: fe05cee4d953 ("drm/xe: Don't short circuit TDR on jobs not started") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Assisted-by: GitHub-Copilot:claude-sonnet-4.6 Assisted-by: GitHub-Copilot:claude-opus-4.8 Tested-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patch.msgid.link/20260610152548.404575-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit b1107d085e7e8ed15ba6f80c102528a9c8a6cb0e) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
44 hoursdrm/xe: fix refcount leak in xe_range_fence_insert()Wentao Liang
xe_range_fence_insert() acquires a reference on fence via dma_fence_get() and stores it in rfence->fence. It then calls dma_fence_add_callback() and handles two cases: when the callback is successfully registered (err == 0) the fence is transferred to the tree for later cleanup; when the fence is already signaled (err == -ENOENT) it manually drops the extra reference with dma_fence_put(fence). However, dma_fence_add_callback() can fail with other errors (e.g. -EINVAL) and in that case the code falls through to the free: label without releasing the acquired reference, leaking it. Fix the leak by adding an else branch that calls dma_fence_put() before jumping to free: for any error other than -ENOENT. Fixes: 845f64bdbfc9 ("drm/xe: Introduce a range-fence utility") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260610172705.3450560-1-matthew.brost@intel.com (cherry picked from commit 98c4a4201290823c2c5c7ba21692bd9a64b61021) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
3 daysdrm/xe: include all registered queues in TLB invalidationTangudu Tilak Tirumalesh
Context-based TLB invalidation currently selects only scheduling-active exec queues via q->ops->active(). During rebind flows, queues may be suspended (or transitioning through resume) while still owning valid translations, causing them to be skipped from invalidation and leading to missed TLB invalidations on LR rebinds. The underlying issue is a TOCTOU: q->guc->state bits are flipped lock-free from enable_scheduling(), disable_scheduling{,_deregister}(), the suspend/resume sched-msg handlers, handle_sched_done(), and guc_exec_queue_stop(); nothing in send_tlb_inval_ctx_ppgtt() serializes against them, so any state-based predicate can race. Include all the registered queues so that TLB invalidations are not missed. This is race-free because list membership on vm->exec_queues.list is stable under vm->exec_queues.lock held by the caller. The performance impact is expected to be minimal and harmless. If it does turn out to be a concern, we can come back with a race-safe solution to ignore certain queues. Fixes: 6cdaa5346d6f ("drm/xe: Add context-based invalidation to GuC TLB invalidation backend") Assisted-by: Claude:claude-opus-4.6 Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260608162745.338725-2-tilak.tirumalesh.tangudu@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> (cherry picked from commit aa625e1e9f0710e424fe4f0e3f032807df81b5b0) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
3 daysdrm/xe/hw_error: Use HW_ERR prefix in logRaag Jadav
Hardware errors should be logged with HW_ERR prefix. Make them consistent with existing logs. Fixes: 01aab7e1c9d4 ("drm/xe/xe_hw_error: Add support for PVC SoC errors") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patch.msgid.link/20260602044919.702209-5-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit ad60a618c49fef07d1860bfb1091140d29f5eddb) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
3 daysdrm/xe/drm_ras: Add per node cleanup actionRaag Jadav
cleanup_node_param() is not registered for previous node in case of counter allocation failure, which results in stale memory of previous node that isn't cleaned up on unwind. Add per node cleanup action which guarantees cleanup on unwind and also simplifies the cleanup logic. Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patch.msgid.link/20260602044919.702209-4-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 67fc5543d8274b2fcbef87734fad0469358f4478) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
3 daysdrm/xe/drm_ras: Make counter allocation drm managedRaag Jadav
cleanup_node_param() is not registered for previous node in case of counter allocation failure, which results in stale memory of previous node that isn't cleaned up on unwind. Fix this using drm managed allocation, which is guaranteed to be cleaned up on unwind. Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patch.msgid.link/20260602044919.702209-3-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 58d77c77ea0c5cb2b755ebe23e973c8272acd896) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
3 daysdrm/xe/display: fix oops in suspend/shutdown without displayJani Nikula
The xe driver keeps track of whether to probe display, and whether display hardware is there, using xe->info.probe_display. It gets set to false if there's no display after intel_display_device_probe(). However, the display may also be disabled via fuses, detected at a later time in intel_display_device_info_runtime_init(). In this case, the xe driver does for_each_intel_crtc() on uninitialized mode config in xe_display_flush_cleanup_work(), leading to a NULL pointer dereference, and generally calls display code with display info cleared. Check for intel_display_device_present() after intel_display_device_info_runtime_init(), and reset xe->info.probe_display as necessary. Also do unset_display_features() for completeness, although display runtime init has already done that. This will need to be unified across all cases later. Move intel_display_device_info_runtime_init() call slightly earlier, similar to i915, to avoid a bunch of unnecessary setup for no display cases. Note #1: The xe driver has no business doing low level display plumbing like for_each_intel_crtc() to begin with. It all needs to happen in display code. Note #2: The actual bug is present already in commit 44e694958b95 ("drm/xe/display: Implement display support"), but the oops was likely introduced later at commit ddf6492e0e50 ("drm/xe/display: Make display suspend/resume work on discrete"). Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7904 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/6150 Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260515160920.1082842-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 7c3eb9f47533220888a67266448185fd0775d4da) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
3 daysdrm/amd/display: use plane color_mgmt_changed to track colorop changesMelissa Wen
Ensure the driver tracks changes in any colorop property of a plane color pipeline by using the same mechanism of CRTC color management and update plane color blocks when any colorop property changes. It fixes an issue observed on gamescope settings for night mode which is done via shaper/3D-LUT updates. Fixes: 9ba25915efba ("drm/amd/display: Add support for sRGB EOTF in DEGAM block") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patch.msgid.link/20260609110420.1298352-5-mwen@igalia.com
3 daysdrm/atomic: track individual colorop updatesMelissa Wen
As we do for CRTC color mgmt properties, use color_mgmt_changed flag to track any value changes in the color pipeline of a given plane, so that drivers can update color blocks as soon as plane color pipeline or individual colorop values change. Since we're here, only announce and track changes to plane COLOR_PIPELINE prop if its value is actually changing. Fixes: 8c5ea1745f4c ("drm/colorop: Add BYPASS property") Fixes: 7fa3ee8c0a79 ("drm/colorop: Define LUT_1D interpolation") Fixes: 41651f9d42eb ("drm/colorop: Add 1D Curve subtype") Fixes: 3410108037d5 ("drm/colorop: Add multiplier type") Fixes: db971856bbe0 ("drm/colorop: Add 3D LUT support to color pipeline") Fixes: e5719e7f1900 ("drm/colorop: Add 3x4 CTM type") Fixes: 99a4e4f08abe ("drm/colorop: Add 1D Curve Custom LUT type") Fixes: 2afc3184f3b3 ("drm/plane: Add COLOR PIPELINE property") Reviewed-by: Harry Wentland <harry.wentland@amd.com> #v1 Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Fixes: 9ba25915efba ("drm/amd/display: Add support for sRGB EOTF in DEGAM block") Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patch.msgid.link/20260609110420.1298352-4-mwen@igalia.com
3 daysdrm/colorop: make lut(1/3)d_interpolation props correctly behave as mutableMelissa Wen
As interpolation props are actually mutable props, any changes should be handled by drm_colorop_state. Move their enum and make it correctly behaves as mutable. Fixes: 7fa3ee8c0a79 ("drm/colorop: Define LUT_1D interpolation") Fixes: db971856bbe0 ("drm/colorop: Add 3D LUT support to color pipeline") Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Fixes: 9ba25915efba ("drm/amd/display: Add support for sRGB EOTF in DEGAM block") Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patch.msgid.link/20260609110420.1298352-3-mwen@igalia.com
3 daysdrm/i915/gem: Fix phys BO pread/pwrite with offsetJoonas Lahtinen
sg_page() returns struct page pointer not (void *) so the scaling of pread/pwrite is wrong for phys BO and wrong parts of BO would be accessed if non-zero offset is used. Last impacted platform with overlay or cursor planes using phys mapping was Gen3/945G/Lakeport. Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Fixes: c6790dc22312 ("drm/i915: Wean off drm_pci_alloc/drm_pci_free") Cc: <stable@vger.kernel.org> # v4.5+ Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Simona Vetter <simona@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patch.msgid.link/20260610060314.26111-1-joonas.lahtinen@linux.intel.com (cherry picked from commit 3e49a2f85070b2fb672c1e0fdba281a4ea3aebe6) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
4 daysdrm/vc4: fix krealloc() memory leakAlexander A. Klimov
Don't just overwrite the original pointer passed to krealloc() with its return value without checking latter: MEM = krealloc(MEM, SZ, GFP); If krealloc() returns NULL, that erases the pointer to the still allocated memory, hence leaks this memory. Instead, use a temporary variable, check it's not NULL and only then assign it to the original pointer: TMP = krealloc(MEM, SZ, GFP); if (!TMP) return; MEM = TMP; While on it, use krealloc_array(). Fixes: 6d45c81d229d ("drm/vc4: Add support for branching in shader validation.") Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patch.msgid.link/20260606123817.37222-1-grandmaster@al2klimov.de
4 daysdrm/virtio: Fix driver removal with disabled KMSDmitry Osipenko
DRM atomic and modesetting aren't initialized if virtio-gpu driver built with disabled KMS, leading to access of uninitialized data on driver removal/unbinding and crashing kernel. Fix it by skipping shutting down atomic core with unavailable KMS. Fixes: 72122c69d717 ("drm/virtio: Add option to disable KMS support") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Tested-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Reviewed-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Link: https://patch.msgid.link/20260604122743.13383-1-dmitry.osipenko@collabora.com
4 daysdrm/i915/edp: Check supported link rates DPCD readNikita Zhandarovich
intel_edp_set_sink_rates() reads DP_SUPPORTED_LINK_RATES into a local stack array and then parses the array unconditionally. If the read fails, the array contents are not valid and may result in bogus sink link rates being used. Use drm_dp_dpcd_read_data() and clear the sink rate array on failure, so the existing parser falls back to the default sink rate handling. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 68f357cb7347 ("drm/i915/dp: generate and cache sink rate array for all DP, not just eDP 1.4") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20260529145759.1640646-1-n.zhandarovich@fintech.ru Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit bd61c7756b34157e093028225a69383b4b1203cc) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
5 daysdrm/virtio: fix dma_fence refcount leak on error in virtio_gpu_dma_fence_wait()Wentao Liang
dma_fence_unwrap_for_each() internally calls dma_fence_unwrap_first() which does cursor->chain = dma_fence_get(head), taking an extra reference. On normal loop completion, dma_fence_unwrap_next() releases this via dma_fence_chain_walk() -> dma_fence_put(). When virtio_gpu_do_fence_wait() fails and the function returns early from inside the loop, the cursor->chain reference is never released. This is the only caller in the entire kernel that does an early return inside dma_fence_unwrap_for_each. Add dma_fence_put(itr.chain) before the early return. Cc: stable@vger.kernel.org Fixes: eba57fb5498f ("drm/virtio: Wait for each dma-fence of in-fence array individually") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patch.msgid.link/20260607090303.92423-1-vulab@iscas.ac.cn
5 daysMerge tag 'hyperv-fixes-signed-20260607' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - MSHV driver fixes from various people (Anirudh Rayabharam, Can Peng, Dexuan Cui, Michael Kelley, Jork Loeser, Wei Liu) - Hyper-V user space tools fixes (Thorsten Blum) - Allow VMBus to be unloaded after frame buffer is flushed (Michael Kelley) * tag 'hyperv-fixes-signed-20260607' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: mshv: support 1G hugepages by passing them as 2M-aligned chunks Drivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMs mshv: use kmalloc_array in mshv_root_scheduler_init mshv: Add conditional VMBus dependency hyperv: Clean up and fix the guest ID comment in hvgdk.h drm/hyperv: During panic do VMBus unload after frame buffer is flushed Drivers: hv: vmbus: Provide option to skip VMBus unload on panic mshv: unmap debugfs stats pages on kexec mshv: clean up SynIC state on kexec for L1VH mshv: limit SynIC management to MSHV-owned resources hv: utils: replace deprecated strcpy with strscpy in kvp_register hv: utils: handle and propagate errors in kvp_register mshv: add a missing padding field
5 daysMerge tag 'amd-drm-fixes-7.1-2026-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.1-2026-06-04: amdgpu: - UserQ fix - Userptr fix - MCCS freesync fix amdkfd: - Fix an event information leak - Events bounds check fix - Trap cleanup fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260604230955.19629-1-alexander.deucher@amd.com
7 daysdrm/gem: Try to fix change_handle ioctl, attempt 4Simona Vetter
[airlied: just added some comments on how to reenable] On-list because the cat is out of the bag and we're clearly not good enough to figure this out in private. The story thus far: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") tried to fix a race condition between the gem_close and gem_change_handle ioctls, but got a few things wrong: - There's a confusion with the local variable handle, which is actually the new handle, and so the two-stage trick was actually applied to the wrong idr slot. 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") tried to fix that by adding yet another code block, but forgot to add the error handling. Which meant we now have two paths, both kinda wrong. - dc366607c41c ("drm: Replace old pointer to new idr") tried to apply another fix, but inconsistently, again because of the handle confusion - this would be the right fix (kinda, somewhat, it's a mess) if we'd do the two-stage approach for the new handle. Except that wasn't the intent of the original fix. We also didn't have an igt merged for the original ioctl, which is a big no-go. This was attempted to address off-list in the original bugfix, and amd QA people claimed the bug was fixed now. Very clearly that's not the case. Here's my attempt to sort this out: - Rename the local variable to new_handle, the old aliasing with args->handle is just too dangerously confusing. - Merge the gem obj lookup with the two-stage idr_replace so that we avoid getting ourselves confused there. - This means we don't have a surplus temporary reference anymore, only an inherited from the idr. A concurrent gem_close on the new_handle could steal that. Fix that with the same two-stage approach create_tail uses. This is a bit overkill as documented in the comment, but I also don't trust my ability to understand this all correctly, so go with the established pattern we have from other ioctls instead for maximum paranoia. - Adjust error paths. I've tried to make the error and success paths common, because they are identical except for which handle is removed and on which we call idr_replace to (re)install the object again. But that made things messier to read, so I've left it at the more verbose version, which unfortunately hides the symmetry in the entire code flow a bit. - While at it, also replace the 7 space indent with 1 tab. And finally, because I flat out don't trust my abilities here at all anymore: - Disable the ioctl until we have the igt situation and everything else sorted out on-list and with full consensus. v2: Sashiko noticed that I didn't handle the error path for idr_replace correctly, it must be checked with IS_ERR_OR_NULL like in gem_handle_delete. So yeah, definitely should just the existing paths 1:1 because this is endless amounts of tricky. Also add the Fixes: line for the original ioctl, I forgot that too. Reported-by: DARKNAVY (@DarkNavyOrg) <vr@darknavy.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Fixes: dc366607c41c ("drm: Replace old pointer to new idr") Cc: syzbot+d7c9eed171647e421013@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Edward Adam Davis <eadavis@qq.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Cc: David Francis <David.Francis@amd.com> Cc: Puttimet Thammasaeng <pwn8official@gmail.com> Cc: Christian Koenig <Christian.Koenig@amd.com> Fixes: 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") Cc: Zhenghang Xiao <kipreyyy@gmail.com> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Reviewed-by: David Francis <David.Francis@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260604194437.1725314-1-simona.vetter@ffwll.ch
7 daysMerge tag 'drm-intel-fixes-2026-06-05' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix color blob reference handling in intel_plane_state (Chaitanya Kumar Borah) - Revert "drm/i915/backlight: Remove try_vesa_interface" [backlight] (Suraj Kandpal) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tursulin@igalia.com> Link: https://patch.msgid.link/aiKgmwz7VGOaFXIv@linux
7 daysMerge tag 'drm-misc-fixes-2026-06-05' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: dumb-buffer: - remove strict limits on buffer geometry ethosu: - reject unsupported NPU_OP_RESIZE - fix index of IFM region - fix weight index - fix overflows in DMA-size calculations - reject DMA commands with uninitialized length - fix OOB write in ethosu_gem_cmdstream_copy_and_validate imx: - fix kernel-doc warnings ivpu: - add overflow checks in firmware handling and get_info_ioctl v3d: - wait for pending L2T flush before cleaning caches - fix leak of vaddr - skip CSD when it has zeroed workgroups - fix ref counting in performance monitoring Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260605072602.GA268798@linux.fritz.box
8 daysRevert "drm/i915/backlight: Remove try_vesa_interface"Suraj Kandpal
This reverts commit 40d2f5820951dee818d05c14677277048bd85f9f. Removing the try_vesa_interface gate caused a backlight regression on panels whose VBT correctly reports INTEL_BACKLIGHT_DISPLAY_DDI and whose PWM path is the actual backlight control, but whose DPCD optimistically advertises DP_EDP_BACKLIGHT_AUX_ENABLE_CAP / _BRIGHTNESS_AUX_SET_CAP. After the commit such panels silently bind to the VESA AUX backlight funcs; AUX writes complete but the panel ignores them, leaving brightness stuck (no-op backlight). Observed on at least KBL and TGL eDP setups. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://patch.msgid.link/20260517024709.1016121-1-suraj.kandpal@intel.com (cherry picked from commit f30fddb4402313aa5301a74d721638d343395269) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
9 daysMerge tag 'drm-xe-fixes-2026-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - Revert removing support for unpublished NVL-S GuC (Daniele) - Suspend fixes related to multi-queue (Niranjana) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/aiHPGiPrAyHgwBZl@intel.com
9 daysdrm/amd/display: Consult MCCS FreeSync cap only if requested & supportedMichel Dänzer
When the do_mccs parameter is false, we don't call dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is unlikely to be true. Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs") Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 115bf5ca318e18a3dc1888ec6271c7052774952a)
9 daysdrm/amdkfd: Unwind debug trap enable on copy_to_user failureYongqiang Sun
If kfd_dbg_trap_enable() fails while copying runtime_info to userspace, it had already activated the trap, set debug_trap_enabled, taken an extra process reference, and opened the debug event file. Return -EFAULT without unwinding that state, leaving inconsistent trap state and a refcount imbalance that could break later DISABLE/ENABLE. On copy_to_user failure, deactivate the trap and undo the rest of the enable setup before returning. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 01112e241e37f9ac98b6f418d93ce2e0b87b7ee0)
9 daysdrm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTSSunday Clement
The kfd_wait_on_events ioctl passes a user-supplied num_events parameter directly to alloc_event_waiters() which calls kcalloc() without validation. This allows unprivileged users with /dev/kfd access to trigger large kernel memory allocations, potentially causing memory exhaustion and denial of service via the OOM killer. Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT (4096), which is the maximum number of events a single process can create. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 39eb6da7acee8d0cc12a8959235b590f295d7b4c)
9 daysdrm/amdgpu: restart the CS if some parts of the VM are still invalidatedChristian König
Make sure that we only submit work with full up to date VM page tables. Backport to 7.1 and older. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 59720bfd8c6dbebeb8d5a7ab64241b007efd9213) Cc: stable@vger.kernel.org
9 daysdrm/amdgpu/userq: Fix reading timeline points in wait ioctlDavid Rosca
Use correct u64 type. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0ac98160dfb6ab3c6d7b38e0ff9687780beed9cb)
9 daysdrm/amdkfd: fix SMI event cross-process information leakYongqiang Sun
kfd_smi_ev_enabled() skips the suser privilege check when pid=0. PROCESS_START, PROCESS_END, and VMFAULT events are emitted with pid=0 while carrying another process's PID and command name, so any /dev/kfd user in the render group can monitor all GPU workloads. Pass the target process PID into kfd_smi_event_add() for these events so the existing per-client filter restricts delivery to the owning process or CAP_SYS_ADMIN subscribers. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 92a8dba246d371fe268280e5fd74b0955688e6df)
9 daysdrm/v3d: Fix global performance monitor reference countingMaíra Canal
In the SET_GLOBAL ioctl, v3d_perfmon_find() bumps the reference count on the perfmon it returns, but v3d_perfmon_set_global_ioctl() and v3d_perfmon_delete() fail to release that reference on several paths: 1. v3d_perfmon_set_global_ioctl() leaks the reference on its error paths. 2. CLEAR_GLOBAL leaks both the find reference and the reference previously stashed in v3d->global_perfmon by the SET_GLOBAL ioctl that configured it. 3. Destroying a perfmon that is the current global perfmon leaks the reference stashed by the SET_GLOBAL ioctl. Release each of these references explicitly. Cc: stable@vger.kernel.org Fixes: c6eabbab359c ("drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patch.msgid.link/20260531-v3d-perfmon-lifetime-v2-1-60ed4485a203@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
9 daysdrm/xe/multi_queue: skip submit when primary queue is suspendedNiranjana Vishwanathapura
Return early in submit path when the multi-queue primary exec queue is suspended to avoid submitting while suspended. v2: Remove idle_skip_suspend fix as that feature is being reverted here https://patchwork.freedesktop.org/series/167262/ Fixes: bc5775c59258 ("drm/xe/multi_queue: Add GuC interface for multi queue support") Cc: stable@vger.kernel.org # v7.0+ Assisted-by: GitHub-Copilot:claude-sonnet-4.6 Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260603233946.863663-2-niranjana.vishwanathapura@intel.com (cherry picked from commit b7fb55cc3364ca128cfff9d50649ffd4327cd01e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
9 daysdrm/xe: Clear pending_disable before signaling suspend fenceTangudu Tilak Tirumalesh
In the schedule-disable done path for suspend, we signal the suspend fence before clearing pending_disable. That wakeup can let suspend_wait complete and resume be queued immediately. The resume path may then reach enable_scheduling() while pending_disable is still set and hit the !exec_queue_pending_disable(q) assertion. Fix this by clearing pending_disable before signaling the suspend fence, so any resumed transition observes a consistent state. Fixes: 87651f31ae4e ("drm/xe/guc_submit: fix race around suspend_pending") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260603065217.3131066-3-tilak.tirumalesh.tangudu@intel.com (cherry picked from commit 4b1ae138b0e103d753773956a84eebc2edbf62c4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
9 daysRevert "drm/xe: Skip exec queue schedule toggle if queue is idle during suspend"Tangudu Tilak Tirumalesh
This reverts commit 8533051ce92015e9cc6f75e0d52119b9d91610b6. The idle-skip optimization bypasses GuC suspend, so the GPU may not perform the context switch that flushes TLB entries for invalidated userptr VMAs. In LR/preempt-fence VM mode, this can lead to missed TLB invalidation and page faults during userptr invalidation tests. Restore unconditional schedule toggling on suspend so the context-switch TLB flush is always performed. This optimization will be reintroduced with a fix that does not skip suspend in LR/preempt-fence VM mode. Fixes: 8533051ce920 ("drm/xe: Skip exec queue schedule toggle if queue is idle during suspend") Cc: stable@vger.kernel.org # v7.0+ Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260603065217.3131066-2-tilak.tirumalesh.tangudu@intel.com (cherry picked from commit 6a1e7934d9a6cf46aecae00a99c2603d1295e170) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
10 daysdrm/amd/pm: smu_v14_0_0: use SoftMin for gfxclk in set_soft_freq_limited_rangePriya Hosur
In smu_v14_0_0_set_soft_freq_limited_range(), the gfxclk floor is programmed via SetHardMinGfxClk together with SetSoftMaxGfxClk. Under power_dpm_force_performance_level=high this pins HardMin to peak gfxclk. In PMFW arbitration HardMin has higher priority than SoftMax, so the firmware thermal/PPT throttler cannot clamp gfxclk via SoftMax once HardMin is set to peak. Replace SetHardMinGfxClk with SetSoftMinGfxclk so the driver still requests peak performance but the firmware throttler retains the ability to clamp gfxclk under thermal/PPT pressure. SoftMax handling is unchanged and no other clock domains are affected. Signed-off-by: Priya Hosur <Priya.Hosur@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ea273267fd29cbf6d83ee72329f59eb5042605b) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu: Fix incorrect VRAM GART mappings on non-4K page size systemsDonet Tom
When mapping VRAM pages into the GART page table, amdgpu_gart_map_vram_range() assumes that the system page size is the same as the GPU page size. On systems with non-4K page sizes, multiple GPU pages can exist within a single CPU page. As a result, the mappings are created incorrectly because fewer page table entries are programmed than required. Fix this by programming the mappings correctly for non-4K page size systems. Fixes: 237d623ae659 ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8f0bc22388f74e0cf4ed8b7d1846c580eaf44cc) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu/userq: move wptr_obj cleanup in mqd_destroySunil Khatri
In case when queue_create fails and mqd has already been allocated and hence wptr_obj is not cleaned up. So moving that cleanup part to mqd_destroy so it takes care of all the cases of clean up and during tear down of the queue. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 43355f62cd2ef5386c2693df537c232ea0f2ce6c)
10 daysdrm/amdgpu: improve the userq seq BO free bit lookupPrike Liang
Use find_next_zero_bit() to locate the next free seq slot bit instead of the current walk, for more efficient bitmap scanning. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ff905a9b6228de9eedd0db71ecb1bdde91fb898d)
10 daysdrm/amdgpu/userq: remove the vital queue unmap loggingSunil Khatri
Mesa userqueues free does not wait for the free to complete and go ahead in unmapping the vital bos while kernel is still in queue free and corresponding cleanup. So ideally we don't need the logging for that and hence remove the warn message as this is expected behaviour and functionally, we are making sure to wait for the required fences before unmap. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 758a868043dcb07eca923bc451c16da3e73dc47c)
10 daysdrm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11Andrew Martin
The v11 MQD manager incorrectly assigned the CP-compute variants of checkpoint_mqd/restore_mqd for KFD_MQD_TYPE_SDMA queues. These functions use sizeof(struct v11_compute_mqd) (2048 bytes) instead of sizeof(struct v11_sdma_mqd) (512 bytes), causing a 1536-byte overflow. During CRIU checkpoint of an SDMA queue on Navi3x: - checkpoint_mqd() reads 2048 bytes from a 512-byte SDMA MQD buffer, leaking 1536 bytes of adjacent GTT memory to userspace During CRIU restore: - restore_mqd() writes 2048 bytes into a 512-byte SDMA MQD buffer, corrupting 1536 bytes of adjacent GTT memory (often the ring buffer or neighboring MQDs) This is a copy-paste regression unique to v11. All other ASIC backends (cik, vi, v9, v10, v12) correctly use the SDMA-specific variants. Add checkpoint_mqd_sdma() and restore_mqd_sdma() functions that properly handle the smaller v11_sdma_mqd structure, matching the pattern used in other MQD managers. Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3") Assisted-by: Claude:Sonnet 4-5 Signed-off-by: Andrew Martin <andrew.martin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6fa41db7ffdec97d62433adf03b7b9b759af8c2c) Cc: stable@vger.kernel.org
10 daysdrm/amdkfd: fix NULL dereference in get_queue_ids()Muhammad Bilal
When usr_queue_id_array is NULL and num_queues is non-zero, get_queue_ids() returns NULL. The callers check only IS_ERR() on the return value; since IS_ERR(NULL) == false the check passes, and suspend_queues() calls q_array_invalidate() which immediately dereferences NULL while iterating num_queues times. Userspace can trigger this via kfd_ioctl_set_debug_trap() by supplying num_queues > 0 with a zero queue_array_ptr, causing a kernel panic. A NULL usr_queue_id_array with num_queues == 0 is a legitimate no-op (q_array_invalidate never executes, and resume_queues already guards all queue_ids dereferences behind a NULL check). Return ERR_PTR(-EINVAL) only when num_queues is non-zero and the pointer is absent; both callers already propagate IS_ERR() returns correctly to userspace. Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation") Signed-off-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f165a82cdf503884bb1797771c61b2fcc72113d4) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14)Vitaly Prosyak
Problem: While developing the amd_close_race IGT test (which intentionally triggers execute permission faults by removing VM_PAGE_EXECUTABLE from GPU page table entries), we discovered that on Navi10 (GFX 10.1.x) these faults produce zero diagnostic output. The GPU simply hangs silently for ~10s until the scheduler timeout fires. There is no way to distinguish an execute permission fault from any other type of GPU hang. Root cause: GFX 10.1.x defaults to noretry=0, which sets RETRY_PERMISSION_OR_INVALID_PAGE_FAULT=1 in the GFXHUB UTCL2 registers (gfxhub_v2_0.c line 313). With this bit set, permission faults (valid PTE, wrong R/W/X bits) are handled entirely within the UTCL1/UTCL2 hardware loop: UTCL2 returns an XNACK to UTCL1, and UTCL1 re-requests the translation indefinitely, expecting software to eventually fix the permission bits (as happens in SVM/HMM recovery). No interrupt of any kind reaches the IH ring. This is different from invalid-page faults (V=0) which DO generate a retry fault interrupt that the driver can escalate to a no-retry fault. Permission faults with valid PTEs loop silently forever in hardware. GFX 10.3+ already defaults to noretry=1, which makes permission faults generate immediate L2 protection fault interrupts. GFX 10.1.x was inadvertently left out of this default. Fix: Change the noretry=1 threshold from IP_VERSION(10, 3, 0) to IP_VERSION(10, 1, 0) in amdgpu_gmc_noretry_set(). This is a one-line change that aligns GFX 10.1.x behavior with GFX 10.3+ and all newer generations. With noretry=1, the existing non-retry fault handler (gmc_v10_0_process_interrupt) already decodes and prints the full GCVM_L2_PROTECTION_FAULT_STATUS register including PERMISSION_FAULTS, faulting address, VMID, PASID, and process name. No additional logging code is needed — the fix is purely routing permission faults to the existing, fully-capable non-retry interrupt handler. v2: Dropped GFX10-specific logging from gmc_v10_0.c and kfd_int_process_v10.c (Felix Kuehling). v1 added logging in the retry fault handler, but with noretry=1 permission faults take the non-retry path — the v1 retry handler code was dead and would never execute. Tested on Navi10 (GFX 10.1.10): - Execute permission faults now produce immediate, clear output: [gfxhub] page fault (src_id:0 ring:64 vmid:4 pasid:592) Process amd_close_race pid 13380 thread amd_close_race pid 13384 in page at address 0x40001000 from client 0x1b (UTCL2) GCVM_L2_PROTECTION_FAULT_STATUS:0x00700881 PERMISSION_FAULTS: 0x8 - No regressions with properly-mapped GPU workloads Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit eb21edd24c40d81066753f8ac6f23bce15745395) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu/gfxhub: Program CRASH_ON_*_FAULT bits to 0 as neededTimur Kristóf
When the fault stop mode isn't AMDGPU_VM_FAULT_STOP_ALWAYS, these bits should be programmed to 0. Program CRASH_ON_NO_RETRY_FAULT and CRASH_ON_RETRY_FAULT always, to make sure to clear the bits when we don't want to crash. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d0cd99e73090700b7a942b98a3327ec966597d0a)
10 daysdrm/amdgpu: fix waiting for all submissions for userptrsChristian König
Wait for all submissions when userptrs need to be invalidated by the MMU notifier, not just the one the userptr was involved into. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91250893cbaa25c86872deca95a540d08de1f91e) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu: drm/amdgpu: Set correct DMA mask for gfx12.1Harish Kasiviswanathan
Set correct DMA mask for gfx12 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a2ef14ee2593b48242b8d90f229f71c1710529da)
10 daysdrm/amdgpu: Use asic specific pte_addr_maskHarish Kasiviswanathan
For PTE creation use asic specific physical page base address mask v2: Change variable name from pa_mask to pte_addr_mask Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2ea989885941a6e5607ef86dbe309e90b7191f21)
10 daysdrm/amd/pm: zero unused SMU argument registersYang Wang
SMU messages may use fewer arguments than the available argument registers, the previous code only wrote used registers and left the rest unchanged, so stale values from a prior message could persist. Write all argument registers for each message and zero the unused tail to keep command arguments deterministic and avoid unintended carry-over. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e03b635f61f77ebd5107ef82f48e3221cb695856)
10 daysdrm/amd/pm: mark metrics.energy_accumulator is invalid for smu 14.0.2Yang Wang
EnergyAccumulator is unsupported on SMU 14.0.2, mark it invalid. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 646b05043eeed04b51c14aad22a400a8250af4b7) Cc: stable@vger.kernel.org