The nvidia binary blob sometimes defers tx completion notification to the
OS dependent layer.  Thus, the watchdog timer can go off when the tx engine
is working fine but the OS dependent layer just hasn't been called to clean
up finished tx transactions.  To work around this, when the watchdog fires,
poke the binary blob to force it to flush any pending tx completions.  If
this drops the pending tx count to zero, then just return without logging a
message or resetting the chip.

This reportedly fixes the 'device timeout()' errors with at least several
NF4 nve(4) parts.

Submitted by:	Nathan Alexander Whitehorn <nathanw@uchicago.edu> (code)
Submitted by:	dg (inspiration for comment and explanation)
MFC after:	1 week
svn path=/head/; revision=158123
2006-04-28 20:08:16 +00:00 · 2006-04-28 20:08:16 +00:00 · f088002825 · 2020-12-20 02:59:44 +00:00
commit f088002825
parent 8a1f412960
1 changed files with 19 additions and 1 deletions
--- a/sys/dev/nve/if_nve.c
+++ b/sys/dev/nve/if_nve.c
@@ -1277,9 +1277,27 @@ nve_watchdog(struct ifnet *ifp)
 {
 	struct nve_softc *sc = ifp->if_softc;

+	NVE_LOCK(sc);
+
+	/*
+	 * The nvidia driver blob defers tx completion notifications.
+	 * Thus, sometimes the watchdog timer will go off when the
+	 * tx engine is fine, but the tx completions are just deferred.
+	 * Try kicking the driver blob to clear out any pending tx
+	 * completions.  If that clears up all the pending tx
+	 * operations, then just return without printing the warning
+	 * message or resetting the adapter.
+	 */
+	sc->hwapi->pfnDisableInterrupts(sc->hwapi->pADCX);
+	sc->hwapi->pfnHandleInterrupt(sc->hwapi->pADCX);
+	sc->hwapi->pfnEnableInterrupts(sc->hwapi->pADCX);
+	if (sc->pending_txs == 0) {
+		NVE_UNLOCK(sc);
+		return;
+	}
+
 	device_printf(sc->dev, "device timeout (%d)\n", sc->pending_txs);

-	NVE_LOCK(sc);
 	sc->tx_errors++;

 	nve_stop(sc);
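
The control flow this change adds to nve_watchdog() can be sketched outside the kernel as follows. This is only an illustrative model, not driver code: struct mock_softc, mock_flush_completions() and mock_watchdog() are hypothetical names invented here, the deferred_done counter is an assumed stand-in for completions the blob has finished but not yet reported, and the locking and the pfnDisableInterrupts/pfnEnableInterrupts bracketing from the real change are omitted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver softc; not the real struct nve_softc. */
struct mock_softc {
	int pending_txs;	/* tx operations the OS layer still counts as outstanding */
	int deferred_done;	/* assumed: completions the blob finished but has not reported */
	int tx_errors;		/* timeouts treated as real errors */
	int resets;		/* how many times the adapter would have been reset */
};

/*
 * Stand-in for the pfnHandleInterrupt() call: deliver any deferred
 * completions so that pending_txs reflects the true hardware state.
 */
static void
mock_flush_completions(struct mock_softc *sc)
{
	assert(sc->deferred_done <= sc->pending_txs);
	sc->pending_txs -= sc->deferred_done;
	sc->deferred_done = 0;
}

/*
 * Model of the patched watchdog.  Returns 0 when the timeout was
 * spurious (nothing left pending after the flush) and 1 when it is
 * treated as a real timeout and the adapter would be reset.
 */
static int
mock_watchdog(struct mock_softc *sc)
{
	mock_flush_completions(sc);
	if (sc->pending_txs == 0)
		return (0);	/* completions were only deferred; stay quiet */

	printf("device timeout (%d)\n", sc->pending_txs);
	sc->tx_errors++;
	sc->resets++;		/* stands in for the stop/reinit sequence */
	return (1);
}
```

Calling mock_watchdog() on a state with three pending and three deferred completions returns 0 and leaves the error counters untouched, while a state with three pending and one deferred completion logs a timeout for the remaining two and counts one reset.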