down, the dc driver and receiver can fall out of sync with one another,
resulting in a condition where the chip continues to receive packets
but the driver never notices. Normally, the receive handler checks each
descriptor starting from the current producer index to see if the chip
has relinquished ownership, indicating that a packet has been received.
The driver hands the packet off to ether_input() and then prepares the
descriptor to receive another frame before moving on to the next
descriptor in the ring. But sometimes, the chip appears to skip a
descriptor. This leaves the driver testing the status word in a descriptor
that never gets updated. The driver still gets "RX done" interrupts but
never advances further into the RX ring, until the ring fills up and the
chip interrupts again to signal an error condition. Sometimes, the
driver will remain in this desynchronized state, resulting in spotty
performance until the interface is reset.
Fortunately, it's fairly simple to detect this condition: if we call
the rxeof routine but the number of received packets doesn't increase,
we suspect that there could be a problem. In this case, we call a new
routine called dc_rx_resync(), which scans ahead in the RX ring to see
if there's a frame waiting for us somewhere beyond that the driver thinks
is the current producer index. If it finds one, it bumps up the index
and calls the rxeof handler again to snarf up the packet and bring the
driver back in sync with the chip. (It may actually do this several times
in the event that there's more than one "hole" in the ring.)
So far the only card supported by if_dc which has exhibited this problem
is a LinkSys LNE100TX v2.0 (82c115 PNIC II), and it only seems to happen
on one particular system, however the fix is general enough and has low
enough overhead that we may as well apply it for all supported chipsets.
I also implemented the same fix for the 3Com xl driver, which is apparently
vulnerable to the same problem.
Problem originally noted and patch tested by: Matt Dillon