From 88b099006d83b0bf452379cad4ce494329084726 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 17 Sep 2021 17:43:49 +0300 Subject: scsi: ufs: core: Revert "scsi: ufs: Synchronize SCSI and UFS error handling" This reverts commit a113eaaf86373362b053279049907ff82b5df6c8. There are a couple of issues with the commit: 1. It causes deadlocks. 2. It causes the shost->eh_cmd_q list of failed requests not to be processed, ever. So revert it. 1. Deadlocks The SCSI error handler runs with requests blocked beginning when scsi_schedule_eh() sets SHOST_RECOVERY state, continuing through scsi_error_handler() callback ->eh_strategy_handler() until scsi_restart_operations() is called. By setting eh_strategy_handler to ufshcd_err_handler, the patch changed the UFS error handler to run with requests blocked, including PM requests, for the entire run of the error handler. That conflicts with UFS error handler existing synchronization with UFS device PM operations. The UFS error handler synchronizes with runtime PM by doing pm_runtime_get_sync() prior to blocking requests itself. It synchronizes with system PM by use of hba->host_sem, again before blocking requests itself. However, if requests are already blocked, then PM operations will block. So: the UFS error handler blocks waiting on PM + PM blocks waiting on SCSI PM requests to process or fail + PM requests are blocked waiting on error handling to finish = deadlock This happens both for runtime PM and system PM. Prior to the patch, these deadlocks could not happen even if SCSI error handling was running, because the presence of requests in shost->eh_cmd_q would mean the queues could not be suspended, which would mean that, should the UFS error handler run at the same time, it would not need to wait for PM or vice versa. Please note these scenarios are not just theoretical, they were found during testing on a Samsung Galaxy Book S. 2. ->eh_strategy_handler() must process shost->eh_cmd_q list of failed requests, as all other eh_strategy_handler's do except UFS error handler. Refer for example: scsi_unjam_host(), ata_scsi_error() and sas_scsi_recover_host(). Link: https://lore.kernel.org/r/20210917144349.14058-1-adrian.hunter@intel.com Fixes: a113eaaf8637 ("scsi: ufs: Synchronize SCSI and UFS error handling") Reviewed-by: Bart Van Assche Signed-off-by: Adrian Hunter Signed-off-by: Martin K. Petersen --- drivers/scsi/ufs/ufshcd.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'drivers/scsi/ufs/ufshcd.h') diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 4723f27a55d1..f0da5d3db1fa 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h @@ -741,6 +741,8 @@ struct ufs_hba_monitor { * @is_powered: flag to check if HBA is powered * @shutting_down: flag to check if shutdown has been invoked * @host_sem: semaphore used to serialize concurrent contexts + * @eh_wq: Workqueue that eh_work works on + * @eh_work: Worker to handle UFS errors that require s/w attention * @eeh_work: Worker to handle exception events * @errors: HBA errors * @uic_error: UFS interconnect layer error status @@ -843,6 +845,8 @@ struct ufs_hba { struct semaphore host_sem; /* Work Queues */ + struct workqueue_struct *eh_wq; + struct work_struct eh_work; struct work_struct eeh_work; /* HBA Errors */ -- cgit v1.2.3