Commit Graph

10 Commits

Author SHA1 Message Date
Adrian Chadd
92e84e43a6 Implement the replacement EDMA FIFO code.
(Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've
actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was
cleared!)

This code implements a whole bunch of sorely needed EDMA TX improvements
along with CABQ TX support.

The specifics:

* When filling/refilling the FIFO, use the new TXQ staging queue
  for FIFO frames

* Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly.
  For now the non-CABQ transmit path pushes one frame into the TXQ
  staging queue without setting up the intermediary link pointers
  to chain them together, so draining frames from the txq staging
  queue to the FIFO queue occurs AMPDU / MPDU at a time.

* In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and
  ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO
  at once.

* Now that frames are in a FIFO pending queue, we can top up the
  FIFO after completing a single frame.  This means we can keep
  it filled rather than waiting for it drain and _then_ adding
  more frames.

* The EDMA restart routine now walks the FIFO queue in the TXQ
  rather than the pending queue and re-initialises the FIFO with
  that.

* When restarting EDMA, we may have partially completed sending
  a list.  So stamp the first frame that we see in a list with
  ATH_BUF_FIFOPTR and push _that_ into the hardware.

* When completing frames, only check those on the FIFO queue.
  We should never ever queue frames from the pending queue
  direct to the hardware, so there's no point in checking.

* Until I figure out what's going on, make sure if the TXSTATUS
  for an empty queue pops up, complain loudly and continue.
  This will stop the panics that people are seeing.  I'll add
  some code later which will assist in ensuring I'm populating
  each descriptor with the correct queue ID.

* When considering whether to queue frames to the hardware queue
  directly or software queue frames, make sure the depth of
  the FIFO is taken into account now.

* When completing frames, tag them with ATH_BUF_BUSY if they're
  not the final frame in a FIFO list.  The same holding descriptor
  behaviour is required when handling descriptors linked together
  with a link pointer as the hardware will re-read the previous
  descriptor to refresh the link pointer before contiuning.

* .. and if we complete the FIFO list (ie, the buffer has
  ATH_BUF_FIFOEND set), then we don't need the holding buffer
  any longer.  Thus, free it.

Tested:

* AR9380/AR9580, STA and hostap
* AR9280, STA/hostap

TODO:

* I don't yet trust that the EDMA restart routine is totally correct
  in all circumstances.  I'll continue to thrash this out under heavy
  multiple-TXQ traffic load and fix whatever pops up.
2013-03-26 20:04:45 +00:00
Adrian Chadd
b6ef0f8ac8 Convert the CABQ queue code over to use the HAL link pointer method
instead of axq_link.

This (among a bunch of uncommitted work) is required for EDMA chips
to correctly transmit frames on the CABQ.

Tested:

* AR9280, hostap mode
* AR9380/AR9580, hostap mode (staggered beacons)

TODO:

* This code only really gets called when burst beacons are used;
  it glues multiple CABQ queues together when sending to the hardware.
* More thorough bursted beacon testing! (first requires some work with
  the beacon queue code for bursted beacons, as that currently uses the
  link pointer and will fail on EDMA chips.)
2013-03-26 04:52:16 +00:00
Adrian Chadd
b837332d0a Overhaul the TXQ locking (again!) as part of some beacon/cabq timing
related issues.

Moving the TX locking under one lock made things easier to progress on
but it had one important side-effect - it increased the latency when
handling CABQ setup when sending beacons.

This commit introduces a bunch of new changes and a few unrelated changs
that are just easier to lump in here.

The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.

The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support.  Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot.  For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.

The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission.  I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.

Major:

* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
  a single lock.

* Leave the software queue side of things under the ATH_TX_LOCK lock,
  (continuing) to serialise things as they are.

* Add a new function which is called whenever there's a beacon miss,
  to print out some debugging.  This is primarily designed to help
  me figure out if the beacon miss events are due to a noisy environment,
  issues with the PHY/MAC, or other.

* Move the CABQ setup/enable to occur _after_ all the VAPs have been
  looked at.  This means that for multiple VAPS in bursted mode, the
  CABQ gets primed once all VAPs are checked, rather than being primed
  on the first VAP and then having frames appended after this.

Minor:

* Add a (disabled) twiddle to let me enable/disable cabq traffic.
  It's primarily there to let me easily debug what's going on with beacon
  and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
  trying to trace down.

* Clear bf_next when flushing frames; it should quieten some warnings
  that show up when a node goes away.

Tested:

* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)

TODO:

* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ hanling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
  ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
  correctly set when chaining descriptors.)
2013-03-24 00:03:12 +00:00
Adrian Chadd
74ea88c379 Add more TODO items. 2013-03-19 17:55:36 +00:00
Adrian Chadd
61cd9692bb Add a quick work-around if ath_beacon_config() to not die if it's called
when an interface is going down.

Right now it's quite possible (but very unlikely!) that ath_reset()
or similar is called, leading to a beacon config call, in parallel with
the last VAP being destroyed.

This likely should be fixed by making sure the bmiss/bstuck/watchdog
taskqueues are canceled whenever the last VAP is destroyed.
2013-01-17 16:26:40 +00:00
Adrian Chadd
375307d411 Delete the per-TXQ locks and replace them with a single TX lock.
I couldn't think of a way to maintain the hardware TXQ locks _and_ layer
on top of that per-TXQ software queuing and any other kind of fine-grained
locks (eg per-TID, or per-node locks.)

So for now, to facilitate some further code refactoring and development
as part of the final push to get software queue ps-poll and u-apsd handling
into this driver, just do away with them entirely.

I may eventually bring them back at some point, when it looks slightly more
architectually cleaner to do so.  But as it stands at the present, it's
not really buying us much:

* in order to properly serialise things and not get bitten by scheduling
  and locking interactions with things higher up in the stack, we need to
  wrap the whole TX path in a long held lock.  Otherwise we can end up
  being pre-empted during frame handling, resulting in some out of order
  frame handling between sequence number allocation and encryption handling
  (ie, the seqno and the CCMP IV get out of sequence);

* .. so whilst that's the case, holding the lock for that long means that
  we're acquiring and releasing the TXQ lock _inside_ that context;

* And we also acquire it per-frame during frame completion, but we currently
  can't hold the lock for the duration of the TX completion as we need
  to call net80211 layer things with the locks _unheld_ to avoid LOR.

* .. the other places were grab that lock are reset/flush, which don't happen
  often.

My eventual aim is to change the TX path so all rejected frame transmissions
and all frame completions result in any ieee80211_free_node() calls to occur
outside of the TX lock; then I can cut back on the amount of locking that
goes on here.

There may be some LORs that occur when ieee80211_free_node() is called when
the TX queue path fails; I'll begin to address these in follow-up commits.
2012-12-02 06:24:08 +00:00
Adrian Chadd
e1252ce1d2 Extend the beacon code slightly to support AP mode beaconing for the
EDMA HAL hardware.

* The EDMA HAL code assumes the nexttbtt and intval values are in TU/8
  units, rather than TU.  For now, just "hack" around that here, at least
  until I code up something to translate it in the HAL.
* Setup some different TXQ flags for EDMA hardware.
* The EDMA HAL doesn't support setting the first rate series via
  ath_hal_setuptxdesc() - instead, a call to ath_hal_set11nratescenario()
  is always required.  So for now, just do an 11n rate series setup
  for EDMA beacon frames.

This allows my AR9380 to successfully transmit beacon frames.

However, CABQ TX and all normal data frame TX and TX completion is
still not functional and will require some more significant code churn
to make work.
2012-08-11 23:26:19 +00:00
Adrian Chadd
46634305f4 Migrate the ath_hal_filltxdesc() API to take a list of buffer/seglen values.
The existing API only exposes 'seglen' (the current buffer (segment) length)
with the data buffer pointer set in 'ds_data'.  This is fine for the legacy
DMA engine but it won't work for the EDMA engines.

The EDMA engine has a significantly different TX descriptor layout.

* The legacy DMA engine had a ds_data pointer at the same offset in the
  descriptor for both TX and RX buffers;
* The EDMA engine has no ds_data for RX - the data is DMAed after the
  descriptor;
* The EDMA engine has support for 4 TX buffer/segment pairs in the TX
  DMA descriptor;
* The EDMA TX completion is in a different FIFO, and the driver will
  'link' the status completion entry to a QCU by a "QCU ID".
  I don't know why it's just not filled in by the hardware, alas.

So given that, here are the changes:

* Instead of directly fondling 'ds_data' in ath_desc, change the
  ath_hal_filltxdesc() to take an array of buffer pointers as well
  as segment len pointers;
* The EDMA TX completion status wants a descriptor and queue id.
  This (for now) uses bf_state.bfs_txq and will extract the hardware QCU
  ID from that.
* .. and this is ugly and wasteful; it should change to just store
  the QCU in the bf_state and save 3/7 bytes in the process.

Now, the weird crap:

* The aggregate TX path was using bf_state->bfs_txq for the TXQ, rather than
  taking a function argument.  I've tidied that up.
* The multicast queue frames get put on a software TXQ and then that is
  appended to the hardware CABQ when appropriate.  So for now, make sure
  that bf_state->bfs_txq points at the CABQ when adding frames to the
  multicast queue.
* .. but the multicast queue TX path for now doesn't use the software
  queue and instead
  (a) directly sets up the descriptor contents at that point;
  (b) the frames on the vap->avp_mcastq are then just appended wholesale
      to the CABQ.
  So for now, I don't have to worry about making the multicast path
  work with aggregation or the per-TID software queue. Phew.

What's left to do:

* I need to modify the 11n ath_hal_chaintxdesc() API to do the same.
  I'll do that in a subsequent commit.
* Remove bf_state.bfs_txq entirely and store the QCU as appropriate.
* .. then do the runtime "is this going on the right HWQ?" checks using
  that, rather than comparing pointer values.

Tested on:

* AR9280 STA/AP
* AR5416 STA/AP
2012-08-05 10:12:27 +00:00
Adrian Chadd
bb06995571 Convert the TX path to use the new HAL methods for accessing the
TX descriptor link pointers.

This is required for the AR93xx and later chipsets.

The RX path is slightly different - the legacy RX path directly
accesses ath_desc->ds_link for now, however this isn't at all done
for EDMA (FIFO) RX.

Now, for those performing a little software archeology here:

This is all a bit sub-optimal. "struct ath_desc" is only really relevant
for the pre-AR93xx NICs - where ds_link and ds_data is always in the
same location.

The AR93xx and later NICs have different descriptor layouts altogether.

Now, for AR93xx and later NICs, you should never directly reference
ds_link and ds_data, as:

* the RX descriptors don't have either - the data is _after_ the RX
  descriptor.  They're just one large buffer.  There's also no need for
  a per-descriptor RX buffer size as they're all fixed sizes.

* the TX descriptors have 4 buffer and 4 length fields _and_ a link
  pointer.  Each frame takes up one TX FIFO pointer, but it can contain
  multiple subframes (either multiple frames in a buffer, and/or
  multiple frames in an aggregate/RIFS burst.)

* .. so, when TX frames are queued to a hardware queue, the link
  pointer is ONLY for buffers in that frame/aggregate.  The next frame
  starts in a new FIFO pointer.

* Finally, descriptor completion status is in a different ring.
  I'll write something up about that when its time to do so.

This was inspired by Linux ath9k and the reference driver but is a
reimplementation.

Obtained from:	Linux ath9k, Qualcomm Atheros
2012-07-19 03:51:16 +00:00
Adrian Chadd
ba5c15d9ba Migrate most of the beacon handling functions out to if_ath_beacon.c.
This is also in preparation for supporting AR9300 and later NICs.
2012-05-20 04:14:29 +00:00