summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm
AgeCommit message (Collapse)Author
2026-06-09Merge tag 'drm-misc-next-fixes-2026-06-05' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next-fixes for v7.2-rc1: - Revert last minute IS_ERR_OR_NULL changes in nouveau/gsp. - Fix build warning in drm scheduler. - Flush caches and TLB before v3d runtime suspend. - Fix a trace and debug command in amdxdna. - Fix heap buffer address validation when PASID is disabled in amdxdna. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/a4a5bf50-3fc8-4faf-884b-08121687124a@linux.intel.com
2026-06-08drm/virtio: fix dma_fence refcount leak on error in virtio_gpu_dma_fence_wait()Wentao Liang
dma_fence_unwrap_for_each() internally calls dma_fence_unwrap_first() which does cursor->chain = dma_fence_get(head), taking an extra reference. On normal loop completion, dma_fence_unwrap_next() releases this via dma_fence_chain_walk() -> dma_fence_put(). When virtio_gpu_do_fence_wait() fails and the function returns early from inside the loop, the cursor->chain reference is never released. This is the only caller in the entire kernel that does an early return inside dma_fence_unwrap_for_each. Add dma_fence_put(itr.chain) before the early return. Cc: stable@vger.kernel.org Fixes: eba57fb5498f ("drm/virtio: Wait for each dma-fence of in-fence array individually") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patch.msgid.link/20260607090303.92423-1-vulab@iscas.ac.cn
2026-06-08Merge tag 'hyperv-fixes-signed-20260607' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - MSHV driver fixes from various people (Anirudh Rayabharam, Can Peng, Dexuan Cui, Michael Kelley, Jork Loeser, Wei Liu) - Hyper-V user space tools fixes (Thorsten Blum) - Allow VMBus to be unloaded after frame buffer is flushed (Michael Kelley) * tag 'hyperv-fixes-signed-20260607' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: mshv: support 1G hugepages by passing them as 2M-aligned chunks Drivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMs mshv: use kmalloc_array in mshv_root_scheduler_init mshv: Add conditional VMBus dependency hyperv: Clean up and fix the guest ID comment in hvgdk.h drm/hyperv: During panic do VMBus unload after frame buffer is flushed Drivers: hv: vmbus: Provide option to skip VMBus unload on panic mshv: unmap debugfs stats pages on kexec mshv: clean up SynIC state on kexec for L1VH mshv: limit SynIC management to MSHV-owned resources hv: utils: replace deprecated strcpy with strscpy in kvp_register hv: utils: handle and propagate errors in kvp_register mshv: add a missing padding field
2026-06-08Merge tag 'amd-drm-next-7.2-2026-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-7.2-2026-06-04: amdgpu: - UserQ fix - Userptr fix - MCCS freesync fix - Remove some triggerable BUG() calls - DCN 4.2.1 fixes - Lockdep annotations - Guilty handling fix - VCN 5.3 fix - FRL fixes - Bounds checking fixes - HMM fix - IRQ accounting fix amdkfd: - Fix an event information leak - Events bounds check fix - Trap cleanup fix - Bounds checking fixes - MES fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260604231801.19979-1-alexander.deucher@amd.com
2026-06-08Merge tag 'amd-drm-fixes-7.1-2026-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.1-2026-06-04: amdgpu: - UserQ fix - Userptr fix - MCCS freesync fix amdkfd: - Fix an event information leak - Events bounds check fix - Trap cleanup fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260604230955.19629-1-alexander.deucher@amd.com
2026-06-06drm/gem: Try to fix change_handle ioctl, attempt 4Simona Vetter
[airlied: just added some comments on how to reenable] On-list because the cat is out of the bag and we're clearly not good enough to figure this out in private. The story thus far: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") tried to fix a race condition between the gem_close and gem_change_handle ioctls, but got a few things wrong: - There's a confusion with the local variable handle, which is actually the new handle, and so the two-stage trick was actually applied to the wrong idr slot. 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") tried to fix that by adding yet another code block, but forgot to add the error handling. Which meant we now have two paths, both kinda wrong. - dc366607c41c ("drm: Replace old pointer to new idr") tried to apply another fix, but inconsistently, again because of the handle confusion - this would be the right fix (kinda, somewhat, it's a mess) if we'd do the two-stage approach for the new handle. Except that wasn't the intent of the original fix. We also didn't have an igt merged for the original ioctl, which is a big no-go. This was attempted to address off-list in the original bugfix, and amd QA people claimed the bug was fixed now. Very clearly that's not the case. Here's my attempt to sort this out: - Rename the local variable to new_handle, the old aliasing with args->handle is just too dangerously confusing. - Merge the gem obj lookup with the two-stage idr_replace so that we avoid getting ourselves confused there. - This means we don't have a surplus temporary reference anymore, only an inherited from the idr. A concurrent gem_close on the new_handle could steal that. Fix that with the same two-stage approach create_tail uses. This is a bit overkill as documented in the comment, but I also don't trust my ability to understand this all correctly, so go with the established pattern we have from other ioctls instead for maximum paranoia. - Adjust error paths. I've tried to make the error and success paths common, because they are identical except for which handle is removed and on which we call idr_replace to (re)install the object again. But that made things messier to read, so I've left it at the more verbose version, which unfortunately hides the symmetry in the entire code flow a bit. - While at it, also replace the 7 space indent with 1 tab. And finally, because I flat out don't trust my abilities here at all anymore: - Disable the ioctl until we have the igt situation and everything else sorted out on-list and with full consensus. v2: Sashiko noticed that I didn't handle the error path for idr_replace correctly, it must be checked with IS_ERR_OR_NULL like in gem_handle_delete. So yeah, definitely should just the existing paths 1:1 because this is endless amounts of tricky. Also add the Fixes: line for the original ioctl, I forgot that too. Reported-by: DARKNAVY (@DarkNavyOrg) <vr@darknavy.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Fixes: dc366607c41c ("drm: Replace old pointer to new idr") Cc: syzbot+d7c9eed171647e421013@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Edward Adam Davis <eadavis@qq.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Cc: David Francis <David.Francis@amd.com> Cc: Puttimet Thammasaeng <pwn8official@gmail.com> Cc: Christian Koenig <Christian.Koenig@amd.com> Fixes: 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") Cc: Zhenghang Xiao <kipreyyy@gmail.com> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Reviewed-by: David Francis <David.Francis@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260604194437.1725314-1-simona.vetter@ffwll.ch
2026-06-06Merge tag 'drm-intel-fixes-2026-06-05' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix color blob reference handling in intel_plane_state (Chaitanya Kumar Borah) - Revert "drm/i915/backlight: Remove try_vesa_interface" [backlight] (Suraj Kandpal) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tursulin@igalia.com> Link: https://patch.msgid.link/aiKgmwz7VGOaFXIv@linux
2026-06-06Merge tag 'drm-misc-fixes-2026-06-05' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: dumb-buffer: - remove strict limits on buffer geometry ethosu: - reject unsupported NPU_OP_RESIZE - fix index of IFM region - fix weight index - fix overflows in DMA-size calculations - reject DMA commands with uninitialized length - fix OOB write in ethosu_gem_cmdstream_copy_and_validate imx: - fix kernel-doc warnings ivpu: - add overflow checks in firmware handling and get_info_ioctl v3d: - wait for pending L2T flush before cleaning caches - fix leak of vaddr - skip CSD when it has zeroed workgroups - fix ref counting in performance monitoring Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260605072602.GA268798@linux.fritz.box
2026-06-05Revert "drm/i915/backlight: Remove try_vesa_interface"Suraj Kandpal
This reverts commit 40d2f5820951dee818d05c14677277048bd85f9f. Removing the try_vesa_interface gate caused a backlight regression on panels whose VBT correctly reports INTEL_BACKLIGHT_DISPLAY_DDI and whose PWM path is the actual backlight control, but whose DPCD optimistically advertises DP_EDP_BACKLIGHT_AUX_ENABLE_CAP / _BRIGHTNESS_AUX_SET_CAP. After the commit such panels silently bind to the VESA AUX backlight funcs; AUX writes complete but the panel ignores them, leaving brightness stuck (no-op backlight). Observed on at least KBL and TGL eDP setups. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://patch.msgid.link/20260517024709.1016121-1-suraj.kandpal@intel.com (cherry picked from commit f30fddb4402313aa5301a74d721638d343395269) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
2026-06-05Merge tag 'drm-rust-next-2026-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/rust/kernel into drm-next DRM Rust changes for v7.2-rc1 - Driver Core (shared via signed tag dd-lifetimes-7.2-rc1): - Introduce Higher-Ranked Lifetime Types (HRT) for Rust device drivers, allowing driver structs to hold device resources like pci::Bar and IoMem directly with a lifetime tied to the binding scope, removing the need for Devres indirection and ARef<Device>. - Replace drvdata() with scoped registration data on the auxiliary bus, using the new ForLt trait to thread lifetimes through registrations. Remove drvdata() and driver_type. - DRM: - Add GPUVM immediate mode abstraction for Rust GPU drivers: - In immediate mode, GPU virtual address space state is updated during job execution (in the DMA fence signalling critical path), keeping the GPUVM and the GPU's address space always in sync. - Provide GpuVm, GpuVa, and GpuVmBo types for managing address spaces, virtual mappings, and GEM object backing respectively. - Provide split-merge map/unmap operations that handle partial overlaps with existing mappings. - drm_exec integration for dma_resv locking and GEM object validation based on the external/evicted object lists are not yet covered and planned as follow-up work. - Introduce DeviceContext type state for drm::Device, allowing drivers to restrict operations to contexts where the device is guaranteed to be registered (or not yet registered) with userspace. - Add FEAT_RENDER flag to the Driver trait for render node support. - Nova: - Hopper/Blackwell enablement: - Add GPU identification and architecture-based HAL selection for Hopper (GH100) and Blackwell (GB100, GB202). - Implement the FSP (Foundation Security Processor) boot path used by Hopper and Blackwell, including FSP falcon engine support, EMEM operations, MCTP/NVDM message infrastructure, and FSP Chain of Trust boot with GSP lockdown release. - Add support for 32-bit firmware images and auto-detection of firmware image format. - Add architecture-specific framebuffer, sysmem flush, PCI config mirror, DMA mask, and WPR/non-WPR heap sizing. - GSP boot and unload: - Refactor the GSP boot process into a chipset-specific HAL, keeping the SEC2 and FSP boot paths separated cleanly. - Implement proper driver unload: send UNLOADING_GUEST_DRIVER command, run Booter Unloader and FWSEC-SB upon unbinding, and run the unload bundle on Gsp::boot() failure. This removes the need for a manual GPU reset between driver unbind and re-probe. - GA100 support: - Add support for the GA100 GPU, including IFR header detection and skipping, correct fwsignature selection, conditional FRTS boot, and documentation of the IFR header layout. - VBIOS hardening and refactoring: - Harden VBIOS parsing with checked arithmetic, bounds-checked accesses, and FromBytes-based structure reads throughout the FWSEC and Falcon data paths. Simplify the overall VBIOS module structure. - HRT adoption: - Use lifetime-parameterized pci::Bar directly, replacing the Arc<Devres<Bar0>> indirection. Replace ARef<Device> with &'bound Device in SysmemFlush and the GSP sequencer. Separate the driver type from driver data. - Misc: - Rename module names to kebab-case (nova-drm, nova-core). - Require little-endian in Kconfig, making the existing assumption explicit. - Tyr: - Define comprehensive typed register blocks for GPU_CONTROL, JOB_CONTROL, MMU_CONTROL (including per-address-space registers), and DOORBELL_BLOCK using the kernel register!() macro. This replaces manual bit manipulation with typed register and field accessors. - Add shmem-backed GEM objects and set DMA mask based on GPU physical address width. - Adopt HRT: separate driver type from driver data, and use IoMem directly instead of Devres for register access during probe. - Move clock cleanup into a Drop implementation. Signed-off-by: Dave Airlie <airlied@redhat.com> From: "Danilo Krummrich" <dakr@kernel.org> Link: https://patch.msgid.link/DJ0IF39U9ETK.PCCUO7ZEQ4S0@kernel.org
2026-06-05Merge tag 'drm-xe-fixes-2026-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - Revert removing support for unpublished NVL-S GuC (Daniele) - Suspend fixes related to multi-queue (Niranjana) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/aiHPGiPrAyHgwBZl@intel.com
2026-06-04drm/amd/display: Consult MCCS FreeSync cap only if requested & supportedMichel Dänzer
When the do_mccs parameter is false, we don't call dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is unlikely to be true. Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs") Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 115bf5ca318e18a3dc1888ec6271c7052774952a)
2026-06-04drm/amdkfd: Unwind debug trap enable on copy_to_user failureYongqiang Sun
If kfd_dbg_trap_enable() fails while copying runtime_info to userspace, it had already activated the trap, set debug_trap_enabled, taken an extra process reference, and opened the debug event file. Return -EFAULT without unwinding that state, leaving inconsistent trap state and a refcount imbalance that could break later DISABLE/ENABLE. On copy_to_user failure, deactivate the trap and undo the rest of the enable setup before returning. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 01112e241e37f9ac98b6f418d93ce2e0b87b7ee0)
2026-06-04drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTSSunday Clement
The kfd_wait_on_events ioctl passes a user-supplied num_events parameter directly to alloc_event_waiters() which calls kcalloc() without validation. This allows unprivileged users with /dev/kfd access to trigger large kernel memory allocations, potentially causing memory exhaustion and denial of service via the OOM killer. Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT (4096), which is the maximum number of events a single process can create. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 39eb6da7acee8d0cc12a8959235b590f295d7b4c)
2026-06-04drm/amdgpu: restart the CS if some parts of the VM are still invalidatedChristian König
Make sure that we only submit work with full up to date VM page tables. Backport to 7.1 and older. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 59720bfd8c6dbebeb8d5a7ab64241b007efd9213) Cc: stable@vger.kernel.org
2026-06-04drm/amdgpu/userq: Fix reading timeline points in wait ioctlDavid Rosca
Use correct u64 type. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0ac98160dfb6ab3c6d7b38e0ff9687780beed9cb)
2026-06-04drm/amdkfd: fix SMI event cross-process information leakYongqiang Sun
kfd_smi_ev_enabled() skips the suser privilege check when pid=0. PROCESS_START, PROCESS_END, and VMFAULT events are emitted with pid=0 while carrying another process's PID and command name, so any /dev/kfd user in the render group can monitor all GPU workloads. Pass the target process PID into kfd_smi_event_add() for these events so the existing per-client filter restricts delivery to the owning process or CAP_SYS_ADMIN subscribers. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 92a8dba246d371fe268280e5fd74b0955688e6df)
2026-06-04drm/amdkfd: always resume_all after suspend_allAlex Deucher
Need to restore any good queues even if the suspend_all failed for some. Always run remove_queue as that will schedule a GPU reset is removing the queue fails. v2: move resume_all after remove Fixes: eb067d65c33e ("drm/amdkfd: Update BadOpcode Interrupt handling with MES") Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_finiYunxiang Li
priv_reg / priv_inst / bad_op and (on v11+) userq EOP IRQs are acquired in late_init but released in hw_fini. This split forced gfx_v9_0_hw_fini() to defensively guard each put with amdgpu_irq_enabled() because hw_fini runs on paths that may not reach late_init. amdgpu_ip_block_hw_fini() only runs after hw_init returns success, and suspend / resume cycle the refs through the same path, so hw_init / hw_fini pair without any extra tracking. Move the gets there and drop the guards. While here, fix the pre-existing partial-failure leak in set_userq_eop_interrupts() (gfx11 / 12_0 / 12_1). amdgpu_irq_get() increments the refcount before calling .set, so a failure partway through the loop leaves earlier successful gets stranded. Track the loop position and roll back on the enable path. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: Consult MCCS FreeSync cap only if requested & supportedMichel Dänzer
When the do_mccs parameter is false, we don't call dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is unlikely to be true. Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs") Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/pm: Use strscpy in profile mode parsingLijo Lazar
Use strscpy to copy the buffer which makes it explicit that a valid NULL terminated string gets copied. Also, make it explicit that the source buffer can be copied safely to the temporary buffer by checking against its size. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: Fix infinite loop parsing CRAT with zero subtype lengthYongqiang Sun
Malformed ACPI CRAT tables can advertise a zero or undersized subtype length. The parser then fails to advance the cursor and loops forever while the remaining image still looks large enough for a generic header. Validate sub_type_hdr->length on each iteration before parsing or advancing. Return -EINVAL and warn when length is zero or smaller than the generic subtype header. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: fix sysfs topology prop length on buffer truncationYongqiang Sun
sysfs_show_gen_prop() accumulated snprintf()'s return value into the offset. snprintf() reports bytes that would have been written, not bytes actually written, so a truncated sysfs show could over-report its length. Use sysfs_emit_at(), which returns only the bytes written. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pagesHonglei Huang
Since commit c08972f55594 ("drm/amdgpu: fix amdgpu_hmm_range_get_pages") moved mmu_interval_read_begin() out of the per-chunk loop, the captured notifier_seq is no longer refreshed across retries. As a result, the existing -EBUSY retry path can never make progress: hmm_range_fault() returns -EBUSY only when mmu_interval_check_retry(notifier, notifier_seq) reports that the sequence is stale. Once the sequence has advanced, the stored seq will never match again, so every subsequent call within the same invocation returns -EBUSY immediately. The "goto retry" therefore degenerates into a busy spin that simply burns CPU for the full HMM_RANGE_DEFAULT_TIMEOUT (~1s) window before finally bailing out with -EAGAIN. This is pure latency with no chance of recovery, and it actively hurts the KFD userptr stack: the caller ends up blocked for a second while holding mmap_lock, only to return -EAGAIN to the restore worker (or to userspace) which would have re-driven the operation immediately anyway. Drop the retry/timeout entirely and let -EBUSY propagate straight to out_free_pfns, where it is already translated to -EAGAIN. Recovery is handled at a higher level: the KFD restore_userptr_worker reschedules itself, and the userptr ioctl path returns -EAGAIN to userspace. No functional regression: the previous behaviour on -EBUSY was already to fail with -EAGAIN after a 1s stall; we just skip the stall. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Honglei Huang <honghuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/pm: bound OD parameter parsing to stack array sizeCandice Li
Reject inputs once parameter_size reaches the array limit, and pass ARRAY_SIZE(parameter) into parse_input_od_command_lines() for defense in depth. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/pm: Stop pp_od_clk_voltage emit at PAGE_SIZEAsad Kamal
Stop appending OD sections in amdgpu_get_pp_od_clk_voltage() once the sysfs page is full, instead of checking every sysfs_emit_at() in SMU helpers. This is purely defensive hardening. v2: Drop the prior series that checked sysfs_emit_at() return values in every SMU *_emit_clk_levels() helper and smu_cmn_print_*().(Kevin) v3: Update description, remove all clamping Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: Unwind debug trap enable on copy_to_user failureYongqiang Sun
If kfd_dbg_trap_enable() fails while copying runtime_info to userspace, it had already activated the trap, set debug_trap_enabled, taken an extra process reference, and opened the debug event file. Return -EFAULT without unwinding that state, leaving inconsistent trap state and a refcount imbalance that could break later DISABLE/ENABLE. On copy_to_user failure, deactivate the trap and undo the rest of the enable setup before returning. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: validate the mes firmware version for gfx12.1Sunil Khatri
MES firmware should report the same version whether read from the register or from the firmware ucode binary. This is not always the case, so add a log when they mismatch. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: validate the mes firmware version for gfx12Sunil Khatri
MES firmware should report the same version whether read from the register or from the firmware ucode binary. This is not always the case, so add a log when they mismatch. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: compare MES firmware version ucode for gfx11Sunil Khatri
MES firmware should report the same version whether read from the register or from the firmware ucode binary. This is not always the case, so add a log when they mismatch. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTSSunday Clement
The kfd_wait_on_events ioctl passes a user-supplied num_events parameter directly to alloc_event_waiters() which calls kcalloc() without validation. This allows unprivileged users with /dev/kfd access to trigger large kernel memory allocations, potentially causing memory exhaustion and denial of service via the OOM killer. Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT (4096), which is the maximum number of events a single process can create. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: restart the CS if some parts of the VM are still invalidatedChristian König
Make sure that we only submit work with full up to date VM page tables. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: use unsigned types for local pipe and REG_GET countersAurabindo Pillai
Two small type fixes that match how the values are actually consumed: - decide_zstate_support() iterates from 0 to pipe_count, which is unsigned. Make the loop index unsigned int. - hpo_enc401_read_state() reads HDMI_PIXEL_ENCODING and HDMI_DEEP_COLOR_DEPTH via REG_GET_2(), which internally casts the output pointer to (uint32_t *). Passing the address of an int is a strict-aliasing wart even when the sizes match. Declare the locals as uint32_t. No behavioural change since the values are only compared against small non-negative constants. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: widen dc_hdmi_frl_flags.force_frl_rate to unsigned intAurabindo Pillai
dc_hdmi_frl_flags.force_frl_rate mirrors dc_debug_options.force_frl_rate, which was just widened to unsigned int. Match the type here too so the assignment in link_hdmi_frl.c does not narrow from unsigned to signed. All call sites in link_hdmi_frl.c only compare the value against 0, 0xF, or an hdmi_frl_link_rate enum whose values are non-negative, so the change is behaviour-preserving and does not introduce sign-compare warnings. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu/userq: Fix reading timeline points in wait ioctlDavid Rosca
Use correct u64 type. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu/vcn5.0.0: enable secure submission on unified ring for VCN 5.3.0Jeevana Muthyala
Enable secure submission support on the unified ring for VCN IP version 5.3.0 by setting `secure_submission_supported = true` in vcn_v5_0_0_unified_ring_vm_funcs. Secure IB submission is supported on VCN 5.3.0 hardware/firmware, allowing protected decode workloads to bypass the common IB gate. Without this, secure playback submissions can be blocked and fail. Other VCN 5.x variants using the same vcn_v5_0_0_ip_block (e.g. IP_VERSION(5, 0, 0)) do not support secure submission on the unified ring and therefore continue using non-secure paths. This change only advertises existing hardware/firmware capability; non-secure decode paths remain unaffected. Signed-off-by: Jeevana Muthyala <Jeevana.Muthyala2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: deprecate guilty handlingChristian König
The guilty handling tried to establish a second way of signaling problems with the GPU back to userspace. This caused quite a bunch of issue we had to work around, especially lifetime issues with the drm_sched_entity. Just drop the handling altogether and use the dma_fence based approach instead. v2: fix reversed condition in entity check (Alex) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: Add lockdep annotations for lock ordering validationVitaly Prosyak
Add lockdep annotations to teach lockdep the correct lock hierarchy and catch ordering violations during development. This follows the pattern established by dma-resv in drivers/dma-buf/dma-resv.c. Lock ordering hierarchy (outermost to innermost): 1. userq_sch_mutex - Global userq scheduler (enforce_isolation) 2. userq_mutex - Per-context userq (held across queue create/destroy) 3. notifier_lock - MMU notifier synchronization 4. vram_lock - VRAM memory allocator 5. reset_domain->sem - GPU reset synchronization 6. reset_lock - Reset control mutex 7. srbm_mutex - SRBM register access 8. grbm_idx_mutex - GRBM index register access 9. mmio_idx_lock - MMIO index access (spinlock) The implementation provides: - Lock ordering training at module init (amdgpu_lockdep_init) - Lock class association for real driver locks (amdgpu_lockdep_set_class) Dummy locks are associated with the same class keys as real driver locks via lockdep_set_class(), ensuring lockdep connects the training ordering with actual runtime locks. Testing: Build the kernel with CONFIG_PROVE_LOCKING=y (enables CONFIG_LOCKDEP): scripts/config --enable PROVE_LOCKING scripts/config --enable DEBUG_LOCKDEP make -j$(nproc) On boot, dmesg should show: AMDGPU: Lockdep annotations initialized (9 lock levels) The companion IGT test (tests/amdgpu/amd_lockdep) exercises lock-heavy GPU code paths concurrently to trigger lockdep warnings on violations: sudo ./build/tests/amdgpu/amd_lockdep sudo dmesg | grep -A 50 "circular locking dependency" IGT subtests: concurrent-reset-and-submit - reset_sem vs submission locks concurrent-mmap-and-evict - mmap_lock vs vram_lock concurrent-userptr-and-reset - notifier_lock vs reset_sem stress-all-paths - all of the above simultaneously A clean dmesg (no "circular locking dependency" or "possible recursive locking detected" messages) confirms no lock ordering violations. For CI integration, the test should be run on kernels compiled with CONFIG_LOCKDEP=y; dmesg is scanned post-run for lockdep splats. v2: (Christian) - Move notifier_lock and vram_lock before reset locks in hierarchy. HMM invalidation holds notifier_lock and can wait for GPU reset completion, so notifier_lock must be outer to reset_domain->sem. - Associate dummy locks with lock class keys via lockdep_set_class() so lockdep connects training with real driver locks. - Update commit message to list all 9 lock levels. Requires CONFIG_PROVE_LOCKING=y to activate. Cc: Christian Konig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: fix SMI event cross-process information leakYongqiang Sun
kfd_smi_ev_enabled() skips the suser privilege check when pid=0. PROCESS_START, PROCESS_END, and VMFAULT events are emitted with pid=0 while carrying another process's PID and command name, so any /dev/kfd user in the render group can monitor all GPU workloads. Pass the target process PID into kfd_smi_event_add() for these events so the existing per-client filter restricts delivery to the owning process or CAP_SYS_ADMIN subscribers. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: Add DCN42B to dml21_translation_helperMatthew Stewart
Needed for DML to function with DCN42B. Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: Fix DCN42B version detectionMatthew Stewart
In resource_parse_asic_id, the check for GC_11_0_4 was unbounded, which caused it to override the detection of DCN42B. Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Reviewed-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: Fix user-triggerable BUG()/BUG_ON() callsCe Sun
Replace BUG()/BUG_ON() with error logs and safe returns in several places where they can be triggered by invalid userspace input, preventing DoS via kernel panic. Signed-off-by: Ce Sun <cesun102@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/v3d: Fix global performance monitor reference countingMaíra Canal
In the SET_GLOBAL ioctl, v3d_perfmon_find() bumps the reference count on the perfmon it returns, but v3d_perfmon_set_global_ioctl() and v3d_perfmon_delete() fail to release that reference on several paths: 1. v3d_perfmon_set_global_ioctl() leaks the reference on its error paths. 2. CLEAR_GLOBAL leaks both the find reference and the reference previously stashed in v3d->global_perfmon by the SET_GLOBAL ioctl that configured it. 3. Destroying a perfmon that is the current global perfmon leaks the reference stashed by the SET_GLOBAL ioctl. Release each of these references explicitly. Cc: stable@vger.kernel.org Fixes: c6eabbab359c ("drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patch.msgid.link/20260531-v3d-perfmon-lifetime-v2-1-60ed4485a203@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2026-06-04drm/xe/multi_queue: skip submit when primary queue is suspendedNiranjana Vishwanathapura
Return early in submit path when the multi-queue primary exec queue is suspended to avoid submitting while suspended. v2: Remove idle_skip_suspend fix as that feature is being reverted here https://patchwork.freedesktop.org/series/167262/ Fixes: bc5775c59258 ("drm/xe/multi_queue: Add GuC interface for multi queue support") Cc: stable@vger.kernel.org # v7.0+ Assisted-by: GitHub-Copilot:claude-sonnet-4.6 Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://patch.msgid.link/20260603233946.863663-2-niranjana.vishwanathapura@intel.com (cherry picked from commit b7fb55cc3364ca128cfff9d50649ffd4327cd01e) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-06-04drm/xe: Clear pending_disable before signaling suspend fenceTangudu Tilak Tirumalesh
In the schedule-disable done path for suspend, we signal the suspend fence before clearing pending_disable. That wakeup can let suspend_wait complete and resume be queued immediately. The resume path may then reach enable_scheduling() while pending_disable is still set and hit the !exec_queue_pending_disable(q) assertion. Fix this by clearing pending_disable before signaling the suspend fence, so any resumed transition observes a consistent state. Fixes: 87651f31ae4e ("drm/xe/guc_submit: fix race around suspend_pending") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260603065217.3131066-3-tilak.tirumalesh.tangudu@intel.com (cherry picked from commit 4b1ae138b0e103d753773956a84eebc2edbf62c4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-06-04Revert "drm/xe: Skip exec queue schedule toggle if queue is idle during suspend"Tangudu Tilak Tirumalesh
This reverts commit 8533051ce92015e9cc6f75e0d52119b9d91610b6. The idle-skip optimization bypasses GuC suspend, so the GPU may not perform the context switch that flushes TLB entries for invalidated userptr VMAs. In LR/preempt-fence VM mode, this can lead to missed TLB invalidation and page faults during userptr invalidation tests. Restore unconditional schedule toggling on suspend so the context-switch TLB flush is always performed. This optimization will be reintroduced with a fix that does not skip suspend in LR/preempt-fence VM mode. Fixes: 8533051ce920 ("drm/xe: Skip exec queue schedule toggle if queue is idle during suspend") Cc: stable@vger.kernel.org # v7.0+ Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260603065217.3131066-2-tilak.tirumalesh.tangudu@intel.com (cherry picked from commit 6a1e7934d9a6cf46aecae00a99c2603d1295e170) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-06-04Merge tag 'amd-drm-next-7.2-2026-06-03' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-7.2-2026-06-03: amdgpu: - BT.2020 fix for DCE - DC bounds checking fixes - SDMA 7.1 fix - UserQ fixes - SI fix - SMU 13 fixes - SMU 14 fixes - GC 12.1 fix - Userptr fix - GC 10.1 fix - GART fix for non-4K pages - DCN 4.x fixes - DCN 4.2 updates - More DC KUnit tests - PSR cleanup - Support for connectors without DDC pins - Initial DCN 4.2.1 support - Initial HDMI 2.1 FRL support - Misc bounds check fixes - RAS fixes - GC 11.5.6 support - SDMA 6.4.0 support - NBIO 7.11.5 support - IH 6.4.0 support - HDP 6.4.0 support - MMHUB 3.4.2 support - SMU 15.0.5 support - ATHUB 3.4.2 support - VPE 2.2 support - Devcoredump fixes - _PR3 fix amdkfd: - UAF race fix - Fix a potential NULL pointer dereference - GC 11 buffer overflow fix for SDMA - Profiler locking order fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260604013527.2373534-1-alexander.deucher@amd.com
2026-06-04Merge tag 'drm-msm-next-2026-05-30' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-next Changes for v7.2 Core: - Fixed documentation for msm_gem_shrinker functions - IFPC related enablement/fixes for gen8 - PERFCNTR_CONFIG ioctl support GPU - Reworked handling of UBWC configuration - a810 suppport MDSS: - Added Milos platform support - Reworked handling of UBWC configuration DisplayPort: - Reworked HPD handling, preparing for the MST support DPU: - Added Milos platform support - Reworked handling of UBWC configuration DSI: - Added Milos platform support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <rob.clark@oss.qualcomm.com> Link: https://patch.msgid.link/CACSVV00DXZcvFH2-C3fouve5DGs0DGa-vvsJPuaRmUZZVNKOfg@mail.gmail.com
2026-06-03drm/amd/pm: smu_v14_0_0: use SoftMin for gfxclk in set_soft_freq_limited_rangePriya Hosur
In smu_v14_0_0_set_soft_freq_limited_range(), the gfxclk floor is programmed via SetHardMinGfxClk together with SetSoftMaxGfxClk. Under power_dpm_force_performance_level=high this pins HardMin to peak gfxclk. In PMFW arbitration HardMin has higher priority than SoftMax, so the firmware thermal/PPT throttler cannot clamp gfxclk via SoftMax once HardMin is set to peak. Replace SetHardMinGfxClk with SetSoftMinGfxclk so the driver still requests peak performance but the firmware throttler retains the ability to clamp gfxclk under thermal/PPT pressure. SoftMax handling is unchanged and no other clock domains are affected. Signed-off-by: Priya Hosur <Priya.Hosur@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ea273267fd29cbf6d83ee72329f59eb5042605b) Cc: stable@vger.kernel.org
2026-06-03drm/amdgpu: Fix incorrect VRAM GART mappings on non-4K page size systemsDonet Tom
When mapping VRAM pages into the GART page table, amdgpu_gart_map_vram_range() assumes that the system page size is the same as the GPU page size. On systems with non-4K page sizes, multiple GPU pages can exist within a single CPU page. As a result, the mappings are created incorrectly because fewer page table entries are programmed than required. Fix this by programming the mappings correctly for non-4K page size systems. Fixes: 237d623ae659 ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8f0bc22388f74e0cf4ed8b7d1846c580eaf44cc) Cc: stable@vger.kernel.org