diff options
author | Edward Cree <ecree@solarflare.com> | 2014-04-16 19:27:48 +0100 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2014-04-16 14:33:57 -0400 |
commit | e283546c0465dd3026bc94f7b1a9de7f6b8969ec (patch) | |
tree | 3828d4faeed3986b0f01b93416b910b11cd33280 /drivers/net/ethernet/sfc/farch.c | |
parent | 10ec34fcb100412ab186c141a9c3557d1270effd (diff) | |
download | lwn-e283546c0465dd3026bc94f7b1a9de7f6b8969ec.tar.gz lwn-e283546c0465dd3026bc94f7b1a9de7f6b8969ec.zip |
sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
When an MCDI command times out (whether or not we find it
completed when we poll), call efx_mcdi_abandon(), which tells
all subsequent MCDI calls to fail-fast, and queues up an FLR.
Because an FLR doesn't lead to receiving any reboot even from
the MC (unlike most other types of reset), we have to call
efx_ef10_reset_mc_allocations.
In efx_start_all(), if a reset (of any kind) is pending, we
bail out.
Without this, attempts to reconfigure (e.g. change mtu) can
cause driver/mc state inconsistency if the first MCDI call
triggers an FLR.
For similar reasons, on EF10, in
efx_reset_down(method=RESET_TYPE_MCDI_TIMEOUT), set the number
of active queues to zero before calling efx_stop_all().
And, on farch, in efx_reset_up(method=RESET_TYPE_MCDI_TIMEOUT),
set active_queues and flushes pending & outstanding to zero.
efx_mcdi_mode_{poll,event}() should not take us out of fail-fast
mode. Instead, this is done by efx_mcdi_reset() after the FLR
completes.
The new FLR reset_type RESET_TYPE_MCDI_TIMEOUT doesn't really
fit into the hierarchy of reset 'scopes' whereby efx_reset()
decides some resets subsume others. Thus, it uses separate logic.
Also, fixed up some inconsistency around RESET_TYPE_MC_BIST,
which was in the wrong place in that hierarchy.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'drivers/net/ethernet/sfc/farch.c')
-rw-r--r-- | drivers/net/ethernet/sfc/farch.c | 22 |
1 files changed, 22 insertions, 0 deletions
diff --git a/drivers/net/ethernet/sfc/farch.c b/drivers/net/ethernet/sfc/farch.c index a08761360cdf..0537381cd2f6 100644 --- a/drivers/net/ethernet/sfc/farch.c +++ b/drivers/net/ethernet/sfc/farch.c @@ -741,6 +741,28 @@ int efx_farch_fini_dmaq(struct efx_nic *efx) return rc; } +/* Reset queue and flush accounting after FLR + * + * One possible cause of FLR recovery is that DMA may be failing (eg. if bus + * mastering was disabled), in which case we don't receive (RXQ) flush + * completion events. This means that efx->rxq_flush_outstanding remained at 4 + * after the FLR; also, efx->active_queues was non-zero (as no flush completion + * events were received, and we didn't go through efx_check_tx_flush_complete()) + * If we don't fix this up, on the next call to efx_realloc_channels() we won't + * flush any RX queues because efx->rxq_flush_outstanding is at the limit of 4 + * for batched flush requests; and the efx->active_queues gets messed up because + * we keep incrementing for the newly initialised queues, but it never went to + * zero previously. Then we get a timeout every time we try to restart the + * queues, as it doesn't go back to zero when we should be flushing the queues. + */ +void efx_farch_finish_flr(struct efx_nic *efx) +{ + atomic_set(&efx->rxq_flush_pending, 0); + atomic_set(&efx->rxq_flush_outstanding, 0); + atomic_set(&efx->active_queues, 0); +} + + /************************************************************************** * * Event queue processing |