drm/xe/guc_submit: prevent repeated unregister

It seems that various things can trigger the lr cleanup worker,
including CAT error, engine reset and destroying the actual engine, so
seems plausible to end up triggering the worker more than once in some
cases. If that does happen we can race with an ongoing engine deregister
before it has completed, thus triggering it again and also changing the
state back into pending_disable. Checking if the engine has been marked
as destroyed looks like it should prevent this.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
author: Matthew Auld <matthew.auld@intel.com> 2023-08-03 18:38:50 +0100
committer: Rodrigo Vivi <rodrigo.vivi@intel.com> 2023-12-21 11:39:29 -0500
commit: 31b57683de2c98ac6a3de7223ef0afd47731265c (patch)
tree: c8cb747a65794fe835f91ea8ca2d7f778155930a /drivers/gpu/drm/xe/xe_guc_submit.c
parent: d8b4494bf184d43295b89156d7656d69f931e418 (diff)
1 files changed, 12 insertions, 2 deletions
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index e12cd4285e5d..19df4b67bfbb 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -802,8 +802,18 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
 	/* Kill the run_job / process_msg entry points */
 	xe_sched_submission_stop(sched);
 
-	/* Engine state now stable, disable scheduling / deregister if needed */
-	if (exec_queue_registered(q)) {
+	/*
+	 * Engine state now mostly stable, disable scheduling / deregister if
+	 * needed. This cleanup routine might be called multiple times, where
+	 * the actual async engine deregister drops the final engine ref.
+	 * Calling disable_scheduling_deregister will mark the engine as
+	 * destroyed and fire off the CT requests to disable scheduling /
+	 * deregister, which we only want to do once. We also don't want to mark
+	 * the engine as pending_disable again as this may race with the
+	 * xe_guc_deregister_done_handler() which treats it as an unexpected
+	 * state.
+	 */
+	if (exec_queue_registered(q) && !exec_queue_destroyed(q)) {
 		struct xe_guc *guc = exec_queue_to_guc(q);
 		int ret;
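
The guard added in this hunk can be illustrated outside the driver with a small userspace model. This is only a sketch under stated assumptions: every `sketch_*` name and the flag layout are hypothetical and are not the xe driver's API. It assumes, as the new comment describes, that the deregister path marks the queue as destroyed and pending_disable and sends the requests, and it further assumes that a completion step later clears pending_disable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits; the real driver tracks its own exec queue state. */
enum {
	SKETCH_REGISTERED      = 1u << 0,
	SKETCH_PENDING_DISABLE = 1u << 1,
	SKETCH_DESTROYED       = 1u << 2,
};

struct sketch_queue {
	unsigned int state;
	int deregister_requests; /* counts how often the requests were sent */
};

static bool sketch_registered(const struct sketch_queue *q)
{
	return (q->state & SKETCH_REGISTERED) != 0;
}

static bool sketch_destroyed(const struct sketch_queue *q)
{
	return (q->state & SKETCH_DESTROYED) != 0;
}

/* Models the one-shot step: mark destroyed + pending_disable, send requests. */
static void sketch_disable_scheduling_deregister(struct sketch_queue *q)
{
	assert(!sketch_destroyed(q)); /* must never run twice for one queue */
	q->state |= SKETCH_DESTROYED | SKETCH_PENDING_DISABLE;
	q->deregister_requests++;
}

/* Models the cleanup worker, which may be triggered more than once. */
static void sketch_lr_cleanup(struct sketch_queue *q)
{
	if (sketch_registered(q) && !sketch_destroyed(q))
		sketch_disable_scheduling_deregister(q);
}

/* Assumed stand-in for the disable completing: pending_disable is cleared. */
static void sketch_disable_done(struct sketch_queue *q)
{
	q->state &= ~SKETCH_PENDING_DISABLE;
}
```

In this model, calling `sketch_lr_cleanup()` a second time, even after `sketch_disable_done()` has run but while the queue is still registered, sends no further requests and leaves pending_disable clear. Without the destroyed check, the second call would set pending_disable again, which is the state the commit message describes racing with the deregister-done handler.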