arge: attempt to close a transmit race by only enabling the descriptor at the end of setup.

This driver and the linux ag71xx driver both treat the transmit ring
as a circular linked list of descriptors.  There's no "end" pointer
that is ever NULL - instead, it expects the MAC to hit a finished
descriptor (ARGE_DESC_EMPTY) and stop.

Now, since it's a circular buffer, we may end up with the hardware
hitting the beginning of our multi-descriptor frame before we've finished
setting it up. It then DMA's it in, starts sending it, and we finish
writing out the new descriptor.  The hardware may then write its
completion for the next descriptor out; then we do, and when we next
read it it'll show up as "not done" and transmit completion stops.

This unfortunately manifests itself as the transmit queue always
being active and a massive TX interrupt storm.  We need to actively
ACK packets back from the transmit engine and if we don't (eg because
we think the transmit isn't finished but it is) then the unit will
just keep generating interrupts.

I hit this finally with the below testing setup.  This fixed it for me.

Strictly speaking I should put in a sync in between writing out all of
the descriptors and writing out that final descriptor.

Tested:

* QCA9558 SoC (AP135 reference board) w/ arge1 + vlans acting as a
  router, and iperf -d (tcp, bidirectional traffic.)

Obtained from:	Linux OpenWRT (ag71xx_main.c.)
This commit is contained in:
Adrian Chadd 2015-10-30 23:18:02 +00:00
parent 70487bd29b
commit 29f88ae706

View File

@ -1519,12 +1519,27 @@ arge_encap(struct arge_softc *sc, struct mbuf **m_head)
/*
* Make a list of descriptors for this packet. DMA controller will
* walk through it while arge_link is not zero.
*
* Since we're in a endless circular buffer, ensure that
* the first descriptor in a multi-descriptor ring is always
* set to EMPTY, then un-do it when we're done populating.
*/
prev_prod = prod;
desc = prev_desc = NULL;
for (i = 0; i < nsegs; i++) {
uint32_t tmp;
desc = &sc->arge_rdata.arge_tx_ring[prod];
desc->packet_ctrl = ARGE_DMASIZE(txsegs[i].ds_len);
/*
* Set DESC_EMPTY so the hardware (hopefully) stops at this
* point. We don't want it to start transmitting descriptors
* before we've finished fleshing this out.
*/
tmp = ARGE_DMASIZE(txsegs[i].ds_len);
if (i == 0)
tmp |= ARGE_DESC_EMPTY;
desc->packet_ctrl = tmp;
/* XXX Note: only relevant for older MACs; but check length! */
if ((sc->arge_hw_flags & ARGE_HW_FLG_TX_DESC_ALIGN_4BYTE) &&
@ -1545,6 +1560,12 @@ arge_encap(struct arge_softc *sc, struct mbuf **m_head)
/* Update producer index. */
sc->arge_cdata.arge_tx_prod = prod;
/*
* The descriptors are updated, so enable the first one.
*/
desc = &sc->arge_rdata.arge_tx_ring[prev_prod];
desc->packet_ctrl &= ~ ARGE_DESC_EMPTY;
/* Sync descriptors. */
bus_dmamap_sync(sc->arge_cdata.arge_tx_ring_tag,
sc->arge_cdata.arge_tx_ring_map,