The original code was .. well, slightly more than incorrect.
It showed up as stalled RX queues if the NIC needed to be frequently
reinitialised (eg during scans.)
This is inspired by work done by Matt Dillon over at the DragonflyBSD
project.
So:
* track when EDMA RX has been stopped and when the MAC has been reset;
* re-initialise the ring only after a reset;
* track whether RX has been stopped/started - just for debugging now;
* don't bother with the RX EOL stuff for EDMA - we don't need the
interrupt at all. We also don't need to disable/enable the interrupt
or start DMA - once new frames are pushed into the ring via the
normal RX path, it'll just restart RX DMA on its own.
Tested:
* AR9380, STA mode
* AR9380, AP mode
* AR9485, STA mode
* AR9462, STA mode
fixes and beacon programming / debugging into the ath(4) driver.
The basic power save tracking:
* Add some new code to track the current desired powersave state; and
* Add some reference count tracking so we know when the NIC is awake; then
* Add code in all the points where we're about to touch the hardware and
push it to force-wake.
Then, how things are moved into power save:
* Only move into network-sleep during a RUN->SLEEP transition;
* Force wake the hardware up everywhere that we're about to touch
the hardware.
The net80211 stack takes care of doing RUN<->SLEEP<->(other) state
transitions so we don't have to do it in the driver.
Next, when to wake things up:
* In short - everywhere we touch the hardware.
* The hardware will take care of staying awake if things are queued
in the transmit queue(s); it'll then transit down to sleep if
there's nothing left. This way we don't have to track the
software / hardware transmit queue(s) and keep the hardware
awake for those.
Then, some transmit path fixes that aren't related but useful:
* Force EAPOL frames to go out at the lowest rate. This improves
reliability during the encryption handshake after 802.11
negotiation.
Next, some reset path fixes!
* Fix the overlap between reset and transmit pause so we don't
transmit frames during a reset.
* Some noisy environments will end up taking a lot longer to reset
than normal, so extend the reset period and drop the raise the
reset interval to be more realistic and give the hardware some
time to finish calibration.
* Skip calibration during the reset path. Tsk!
Then, beacon fixes in station mode!
* Add a _lot_ more debugging in the station beacon reset path.
This is all quite fluid right now.
* Modify the STA beacon programming code to try and take
the TU gap between desired TSF and the target TU into
account. (Lifted from QCA.)
Tested:
* AR5210
* AR5211
* AR5212
* AR5413
* AR5416
* AR9280
* AR9285
TODO:
* More AP, IBSS, mesh, TDMA testing
* Thorough AR9380 and later testing!
* AR9160 and AR9287 testing
Obtained from: QCA
freeing them.
The current code would walk the list and call the buffer free, which
didn't remove it from any lists before pushing it back on the free list.
Tested: AR9485, STA mode
Noticed by: dillon@apollo.dragonflybsd.org
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
This occurs at RX DMA start, even though the RX FIFO has plenty of
space. I'll go figure out why, but this shouldn't cause people to
be spammed by these messages.
buffers (ie, >4GB on amd64.)
The underlying problem was that PREREAD doesn't sync the mbuf
with the DMA memory (ie, bounce buffer), so the bounce buffer may
have had stale information. Thus it was always considering the
buffer completed and things just went off the rails.
This change does the following:
* Make ath_rx_pkt() always consume the mbuf somehow; it no longer
passes error mbufs (eg CRC errors, crypt errors, etc) back up
to the RX path to recycle. This means that a new mbuf is always
allocated each time, but it's cleaner.
* Push the RX buffer map/unmap to occur in the RX path, not
ath_rx_pkt(). Thus, ath_rx_pkt() now assumes (a) it has to consume
the mbuf somehow, and (b) that it's already been unmapped and
synced.
* For the legacy path, the descriptor isn't mapped, it comes out of
coherent, DMA memory anyway. So leave it there.
* For the EDMA path, the RX descriptor has to be cleared before
its passed to the hardware, so that when we check with
a POSTREAD sync, we actually get either a blank (not finished)
or a filled out descriptor (finished.) Otherwise we get stale
data in the DMA memory.
* .. so, for EDMA RX path, we need PREREAD|PREWRITE to sync the
data -> DMA memory, then POSTREAD|POSTWRITE to finish syncing
the DMA memory -> data.
* Whilst we're here, make sure that in EDMA buffer setup (ie,
bzero'ing the descriptor part) is done before the mbuf is
map/synched.
NOTE: there's been a lot of commits besides this one with regards to
tidying up the busdma handling in ath(4). Please check the recent
commit history.
Discussed with and thanks to: scottl
Tested:
* AR5416 (non-EDMA) on i386, with the DMA tag for the driver
set to 2^^30, not 2^^32, STA
* AR9580 (EDMA) on i386, as above, STA
* User - tested AR9380 on amd64 with 32GB RAM.
PR: kern/177530
The normal RX path (ath_rx_pkt()) will sync and unmap the
buffer before passing it up the stack. We only need to do this
if we're flushing the FIFO during reset/shutdown.
"complete RX frames."
The 128 entry RX FIFO is really easy to fill up and miss refilling
when it's done in the ath taskq - as that gets blocked up doing
RX completion, TX completion and other random things.
So the 128 entry RX FIFO now gets emptied and refilled in the ath_intr()
task (and it grabs / releases locks, so now ath_intr() can't just be
a FAST handler yet!) but the locks aren't held for very long. The
completion part is done in the ath taskqueue context.
Details:
* Create a new completed frame list - sc->sc_rx_rxlist;
* Split the EDMA RX process queue into two halves - one that
processes the RX FIFO and refills it with new frames; another
that completes the completed frame list;
* When tearing down the driver, flush whatever is in the deferred
queue as well as what's in the FIFO;
* Create two new RX methods - one that processes all RX queues,
one that processes the given RX queue. When MSI is implemented,
we get told which RX queue the interrupt came in on so we can
specifically schedule that. (And I can do that with the non-MSI
path too; I'll figure that out later.)
* Convert the legacy code over to use these new RX methods;
* Replace all the instances of the RX taskqueue enqueue with a call
to a relevant RX method to enqueue one or all RX queues.
Tested:
* AR9380, STA
* AR9580, STA
* AR5413, STA
events.
This is primarily for the TX EDMA and TX EDMA completion. I haven't yet
tied it into the EDMA RX path or the legacy TX/RX path.
Things that I don't quite like:
* Make the pointer type 'void' in ath_softc and have if_ath_alq*()
return a malloc'ed buffer. That would remove the need to include
if_ath_alq.h in if_athvar.h.
* The sysctl setup needs to be cleaned up.
This should eventually be unified with ATH_DEBUG() so I can get both
from one macro; that may take some time.
Add some new probes for TX and TX completion.
* Introduce TX DMA setup/teardown methods, mirroring what's done in
the RX path.
Although the TX DMA descriptor is setup via ath_desc_alloc() /
ath_desc_free(), there TX status descriptor ring will be allocated
in this path.
* Remove some of the TX EDMA capability probing from the RX path and
push it into the new TX EDMA path.
* wrap the RX proc calls in the RX refcount;
* call the DFS checking, fast frames staging and TX rescheduling if
required.
TODO:
* figure out if I can just make "do TX rescheduling" mean "schedule
TX taskqueue" ?
with fresh descriptors, before handling the frames.
Wrap it all in the RX locks.
Since the FIFO is very shallow (16 for HP, 128 for LP) it needs to be
drained and replenished very quickly. Ideally, I'll eventually move this
RX FIFO drain/fill into the interrupt handler, only deferring the actual
frame completion.
I was setting up the RX EDMA buffer to be 4096 bytes rather than the
RX data buffer portion. The hardware was likely getting very confused
and DMAing descriptor portions into places it shouldn't, leading to
memory corruption and occasional panics.
Whilst here, don't bother allocating descriptors for the RX EDMA case.
We don't use those descriptors. Instead, just allocate ath_buf entries.
the FIFO.
I still see some corner cases where no RX occurs when it should be
occuring. It's quite possible that there's a subtle race condition
somewhere; or maybe I'm not programming the RX queues right.
There's also no locking here yet, so any reset/configuration path
state change (ie, enabling/disabling receive from the ioctl, net80211
taskqueue, etc) could quite possibly confuse things.
* For now, kickpcu should hopefully just do nothing - the PCU doesn't need
'kicking' for Osprey and later NICs. The PCU will just restart once
the next FIFO entry is pushed in.
* Teach "proc" about "dosched", so it can be used to just flush the
FIFO contents without adding new FIFO entries.
* .. and now, implement the RX "flush" routine.
* Re-initialise the FIFO contents if the FIFO is empty (the DP is NULL.)
When PCU RX is disabled (ie, writing RX_D to the RX configuration
register) then the FIFO will be completely emptied. If the software FIFO
is full, then no further descriptors are pushed into the FIFO and
things stall.
This all requires much, much more thorough stress testing.
This is inspired by ath9k and the reference driver, but it's a new
implementation of the RX FIFO handling.
This has some issues - notably the FIFO needs to be reprogrammed when
the chip is reset.
The RX EDMA support requires a modified approach to the RX descriptor
handling.
Specifically:
* There's now two RX queues - high and low priority;
* The RX queues are implemented as FIFOs; they're now an array of pointers
to buffers;
* .. and the RX buffer and descriptor are in the same "buffer", rather than
being separate.
So to that end, this commit abstracts out most of the RX related functions
from the bulk of the driver. Notably, the RX DMA/buffer allocation isn't
updated, primarily because I haven't yet fleshed out what it should look
like.
Whilst I'm here, create a set of matching but mostly unimplemented EDMA
stubs.
Tested:
* AR9280, station mode
TODO:
* Thorough AP and other mode testing for non-EDMA chips;
* Figure out how to allocate RX buffers suitable for RX EDMA, including
correctly setting the mbuf length to compensate for the RX descriptor
and completion status area.