The nvidia binary blob sometimes defers tx completion notification to the
OS dependent layer.  Thus, the watchdog timer can go off when the tx
engine is working fine but the OS dependent layer just hasn't been called
to clean up finished tx transactions.  To work around this, when the
watchdog fires, poke the binary blob to force it to flush any pending tx
completions.  If this drops the pending tx count to zero then just return
without logging a message or resetting the chip.
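
The control flow described above can be sketched outside the kernel as
follows.  This is only an illustration under stated assumptions:
mock_softc, mock_flush_completions() and mock_watchdog() are hypothetical
stand-ins rather than the real nve(4) structures or the blob's entry
points, and the "reset" is modeled as nothing more than a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver softc; not the real nve_softc. */
struct mock_softc {
	int pending_txs;	/* transmits queued but not yet reaped */
	int deferred_done;	/* completions the blob is still holding back */
	int tx_errors;		/* genuine timeouts seen so far */
	int resets;		/* how often the mock chip was reset */
};

/*
 * Stand-in for poking the blob's interrupt handler: every completion it
 * had deferred is reported now, which lowers the pending count.
 */
static void
mock_flush_completions(struct mock_softc *sc)
{
	assert(sc->deferred_done <= sc->pending_txs);
	sc->pending_txs -= sc->deferred_done;
	sc->deferred_done = 0;
}

/* Returns true only if the timeout was genuine and a reset was needed. */
static bool
mock_watchdog(struct mock_softc *sc)
{
	/* Flush first; the tx engine may be fine with completions deferred. */
	mock_flush_completions(sc);
	if (sc->pending_txs == 0)
		return (false);		/* false alarm: no message, no reset */

	fprintf(stderr, "device timeout (%d)\n", sc->pending_txs);
	sc->tx_errors++;
	sc->resets++;			/* models the stop/reinit of the chip */
	sc->pending_txs = 0;		/* a reset discards whatever was queued */
	return (true);
}
```

With three transmits pending and all three completions merely deferred,
mock_watchdog() returns false and leaves the error counters untouched.
If only one of the three had completed, it logs the timeout and counts
one error and one reset.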

This reportedly fixes the 'device timeout (%d)' errors with at least several
NF4 nve(4) parts.

Submitted by:	Nathan Alexander Whitehorn <nathanw@uchicago.edu> (code)
Submitted by:	dg (inspiration for comment and explanation)
MFC after:	1 week
John Baldwin 2006-04-28 20:08:16 +00:00
parent 8a1f412960
commit f088002825
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=158123

@@ -1277,9 +1277,27 @@ nve_watchdog(struct ifnet *ifp)
 {
 	struct nve_softc *sc = ifp->if_softc;
+	NVE_LOCK(sc);
+	/*
+	 * The nvidia driver blob defers tx completion notifications.
+	 * Thus, sometimes the watchdog timer will go off when the
+	 * tx engine is fine, but the tx completions are just deferred.
+	 * Try kicking the driver blob to clear out any pending tx
+	 * completions. If that clears up all the pending tx
+	 * operations, then just return without printing the warning
+	 * message or resetting the adapter.
+	 */
+	sc->hwapi->pfnDisableInterrupts(sc->hwapi->pADCX);
+	sc->hwapi->pfnHandleInterrupt(sc->hwapi->pADCX);
+	sc->hwapi->pfnEnableInterrupts(sc->hwapi->pADCX);
+	if (sc->pending_txs == 0) {
+		NVE_UNLOCK(sc);
+		return;
+	}
 	device_printf(sc->dev, "device timeout (%d)\n", sc->pending_txs);
-	NVE_LOCK(sc);
 	sc->tx_errors++;
 	nve_stop(sc);