The nvidia binary blob sometimes defers tx completion notification to the
OS dependent layer. Thus, the watchdog timer can go off when the tx engine
is working fine but the OS dependent layer just hasn't been called to clean
up finished tx transactions. To work around this, when the watchdog fires,
poke the binary blob to force it to flush any pending tx completions. If
this drops the pending tx count to zero, then just return without logging a
message or resetting the chip. This reportedly fixes the 'device timeout()'
errors with at least several NF4 nve(4) parts.

Submitted by:	Nathan Alexander Whitehorn <nathanw@uchicago.edu> (code)
Submitted by:	dg (inspiration for comment and explanation)
MFC after:	1 week
parent 8a1f412960
commit f088002825
Notes:
    svn2git, 2020-12-20 02:59:44 +00:00
    svn path=/head/; revision=158123
@@ -1277,9 +1277,27 @@ nve_watchdog(struct ifnet *ifp)
 {
 	struct nve_softc *sc = ifp->if_softc;
 
+	NVE_LOCK(sc);
+
+	/*
+	 * The nvidia driver blob defers tx completion notifications.
+	 * Thus, sometimes the watchdog timer will go off when the
+	 * tx engine is fine, but the tx completions are just deferred.
+	 * Try kicking the driver blob to clear out any pending tx
+	 * completions.  If that clears up all the pending tx
+	 * operations, then just return without printing the warning
+	 * message or resetting the adapter.
+	 */
+	sc->hwapi->pfnDisableInterrupts(sc->hwapi->pADCX);
+	sc->hwapi->pfnHandleInterrupt(sc->hwapi->pADCX);
+	sc->hwapi->pfnEnableInterrupts(sc->hwapi->pADCX);
+	if (sc->pending_txs == 0) {
+		NVE_UNLOCK(sc);
+		return;
+	}
+
 	device_printf(sc->dev, "device timeout (%d)\n", sc->pending_txs);
 
 	NVE_LOCK(sc);
 	sc->tx_errors++;
 
 	nve_stop(sc);
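The control flow of the hunk above can be tried out in isolation. The following is only a minimal user-space sketch under stated assumptions, not the driver code: struct mock_blob, struct mock_softc, the blob_* helpers and mock_watchdog() are all hypothetical stand-ins invented for illustration. The mock "blob" holds finished transmits in a deferred counter and only reports them when its interrupt handler is called, which is the behaviour the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the vendor blob (not the real hwapi). */
struct mock_blob {
	int  deferred_done;	/* finished tx not yet reported to the OS layer */
	bool intr_enabled;
};

/* Hypothetical stand-in for the driver softc. */
struct mock_softc {
	struct mock_blob blob;
	int pending_txs;	/* tx the OS layer still counts as in flight */
	int tx_errors;
	int resets;		/* counts stop/reinit cycles in this mock */
};

static void
blob_disable_intr(struct mock_softc *sc)
{
	sc->blob.intr_enabled = false;
}

static void
blob_enable_intr(struct mock_softc *sc)
{
	sc->blob.intr_enabled = true;
}

/* Flush deferred completions, as the interrupt handler would. */
static void
blob_handle_intr(struct mock_softc *sc)
{
	sc->pending_txs -= sc->blob.deferred_done;
	sc->blob.deferred_done = 0;
}

/*
 * Watchdog sketch: returns false when the timeout was spurious
 * (everything completed once the blob was poked), true when a
 * real timeout was logged and the mock reset path was taken.
 */
static bool
mock_watchdog(struct mock_softc *sc)
{
	assert(sc != NULL);

	/* Poke the blob with its interrupts masked. */
	blob_disable_intr(sc);
	blob_handle_intr(sc);
	blob_enable_intr(sc);

	if (sc->pending_txs == 0)
		return (false);	/* only deferred completions; stay quiet */

	fprintf(stderr, "device timeout (%d)\n", sc->pending_txs);
	sc->tx_errors++;
	sc->resets++;		/* assumption: stands in for stop + reinit */
	return (true);
}
```

Calling mock_watchdog() on a softc with pending_txs = 3 and deferred_done = 3 returns false and leaves tx_errors at 0; with deferred_done = 1 it logs the timeout, bumps tx_errors, and takes the reset path. Locking is omitted here; in the driver the flush runs with the softc lock held, and the early return has to drop it.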