d1b5b058f7
chip revisions. (A buggy taiwanese chip? I'm just shocked; shocked I tell you.) So far I have only observed the anomalous behavior on board with PCI revision 33 chips. At the moment, this seems to include only the Netgear FA310-TX rev D1 boards with chips labeled NGMC169B. (Possibly this means it's an 82c169B part from Lite-On.) The bug only manifests itself in promiscuous mode, and usually only at 10Mbps half-duplex. (I have not observed the problem in full-duplex mode, and I don't think it ever happens at 100Mbps.) The bug appears to be in the receiver DMA engine. Normally, the chip is programmed with a linked list of receiver descriptors, each with a receive buffer capable of holding a complete full-sized ethernet frame. During periods of heavy traffic (i.e. ping -c 100 -f 8100 <otherhost>), the receiver will sometimes appear to upload its entire FIFO memory contents instead of just uploading the desired received frame. The uploaded data will span several receive buffers, in spite of the fact that the chip has been told to only use one descriptor per frame, and appears to consist of previously transmitted frames with the correct received frame appended to the end. Unfortunately, there is no way to determine exactly how much data is uploaded when this happens; the chip doesn't tell you anything except the size of the desired received frame, and the amount of bogus data varies. Sometimes, the desired frame is also split across multiple buffers. The workaround is ugly and nasty. The driver assembles all of the data from the bogus frames into a single buffer. The receive buffers are always zeroed out, and we program the chip to always include the receive CRC at the end of each frame. We therefore know that we can start from the end of the buffer and scan back until we encounter a non-zero data byte, and say conclusively that this is the end of the desired frame. We can then subtract the frame length from this address to determine the real start of the frame, and copy it into an mbuf and pass it on. This is kludgy and time consuming, but it's better than dropping frames. It's not too bad since the problem only happens at 10Mbps. The workaround is only enabled for chips with PCI revision == 33. The LinkSys LNE100TX and Matrox FastNIC 10/100 cards use a revision 32 chip and work fine in promiscuous mode. Netgear support has confirmed that they "have some previous knowledge of problems in promiscuous mode" but didn't have a workaround. The people at Lite-On who would be able to suggest a possible fix are on vacation. So, I decided to implement a workaround of my own until I hear from them. I suppose this problem made it through Netgear's QA department since Windows doesn't normally use promiscuous mode, and if Windows doesn't need the feature than it can't possibly be important, right? Grrr.