author | Christian König <ckoenig.leichtzumerken@gmail.com> | 2024-08-26 14:25:40 +0200 |
---|---|---|
committer | Christian König <christian.koenig@amd.com> | 2024-09-06 18:06:06 +0200 |
commit | f07a0d1bf7de562b2806bce0ba8601c325627c4e (patch) | |
tree | 4bd07abe5729511f05b6d70757a4d40bd12853b7 /Documentation/gpu | |
parent | a401bd1264b400f96a4cf61ed3fc144008e97a4e (diff) | |
drm/doc: Document submission error signaling

Different approaches have been tried to signal resets and other errors in
vendor specific ways which not only resulted in a wide variety of
implementations but also repeating the same bugs and problems over different
drivers.

Document that drivers should use dma_fence based error signaling which is
vendor agnostic and allows userspace to query submission errors in generic
non-vendor specific code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-3-christian.koenig@amd.com
Diffstat (limited to 'Documentation/gpu')
-rw-r--r-- | Documentation/gpu/drm-uapi.rst | 27 |
1 files changed, 20 insertions, 7 deletions
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 370d820be248..b75cc9a70d1f 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -305,13 +305,26 @@ Kernel Mode Driver
 ------------------
 
 The KMD is responsible for checking if the device needs a reset, and to perform
-it as needed. Usually a hang is detected when a job gets stuck executing. KMD
-should keep track of resets, because userspace can query any time about the
-reset status for a specific context. This is needed to propagate to the rest of
-the stack that a reset has happened. Currently, this is implemented by each
-driver separately, with no common DRM interface. Ideally this should be properly
-integrated at DRM scheduler to provide a common ground for all drivers. After a
-reset, KMD should reject new command submissions for affected contexts.
+it as needed. Usually a hang is detected when a job gets stuck executing.
+
+Propagation of errors to userspace has proven to be tricky since it goes in
+the opposite direction of the usual flow of commands. Because of this vendor
+independent error handling was added to the &dma_fence object, this way drivers
+can add an error code to their fences before signaling them. See function
+dma_fence_set_error() on how to do this and for examples of error codes to use.
+
+The DRM scheduler also allows setting error codes on all pending fences when
+hardware submissions are restarted after an reset. Error codes are also
+forwarded from the hardware fence to the scheduler fence to bubble up errors
+to the higher levels of the stack and eventually userspace.
+
+Fence errors can be queried by userspace through the generic SYNC_IOC_FILE_INFO
+IOCTL as well as through driver specific interfaces.
+
+Additional to setting fence errors drivers should also keep track of resets per
+context, the DRM scheduler provides the drm_sched_entity_error() function as
+helper for this use case. After a reset, KMD should reject new command
+submissions for affected contexts.
 
 User Mode Driver
 ----------------
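
The new documentation text says that userspace can read fence errors through the generic SYNC_IOC_FILE_INFO IOCTL. The following is a minimal userspace sketch of such a query and is not part of the patch. The helper names `query_sync_file_status()` and `describe_fence_status()` are hypothetical, and the sketch assumes the caller already holds a sync file descriptor exported by a driver. The status convention (0 active, 1 signaled, negative error) follows the comments in `<linux/sync_file.h>`.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Hypothetical helper, not part of the patch.
 * Queries the summary status of a sync file.
 * Returns 0 on success and stores the fence status in *status,
 * or returns -errno if the ioctl itself failed.
 */
static int query_sync_file_status(int sync_fd, int *status)
{
	struct sync_file_info info;

	/* num_fences == 0 requests only the summary, no per-fence array. */
	memset(&info, 0, sizeof(info));

	if (ioctl(sync_fd, SYNC_IOC_FILE_INFO, &info) < 0)
		return -errno;

	*status = info.status;
	return 0;
}

/*
 * Hypothetical helper, not part of the patch.
 * Maps the status value to a short description:
 * 0 means still active, 1 means signaled without error,
 * a negative value is the error code the driver set on the fence.
 */
static const char *describe_fence_status(int status)
{
	if (status < 0)
		return "error";
	if (status == 0)
		return "active";
	return "signaled";
}
```

A caller would typically wait for the sync file to signal (for example with `poll()`) before querying it, and then treat a negative status as a failed submission. Which error code a driver reports for a given failure is driver policy and is not assumed here.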