Commit Graph

167 Commits

Author SHA1 Message Date
adrian
f14274ee49 Bring over some initial power save management support, reset path
fixes and beacon programming / debugging into the ath(4) driver.

The basic power save tracking:

* Add some new code to track the current desired powersave state; and
* Add some reference count tracking so we know when the NIC is awake; then
* Add code in all the points where we're about to touch the hardware and
  push it to force-wake.

Then, how things are moved into power save:

* Only move into network-sleep during a RUN->SLEEP transition;
* Force wake the hardware up everywhere that we're about to touch
  the hardware.

The net80211 stack takes care of doing RUN<->SLEEP<->(other) state
transitions so we don't have to do it in the driver.

Next, when to wake things up:

* In short - everywhere we touch the hardware.
* The hardware will take care of staying awake if things are queued
  in the transmit queue(s); it'll then transit down to sleep if
  there's nothing left.  This way we don't have to track the
  software / hardware transmit queue(s) and keep the hardware
  awake for those.

Then, some transmit path fixes that aren't related but useful:

* Force EAPOL frames to go out at the lowest rate.  This improves
  reliability during the encryption handshake after 802.11
  negotiation.

Next, some reset path fixes!

* Fix the overlap between reset and transmit pause so we don't
  transmit frames during a reset.
* Some noisy environments will end up taking a lot longer to reset
  than normal, so extend the reset period and drop the raise the
  reset interval to be more realistic and give the hardware some
  time to finish calibration.
* Skip calibration during the reset path.  Tsk!

Then, beacon fixes in station mode!

* Add a _lot_ more debugging in the station beacon reset path.
  This is all quite fluid right now.
* Modify the STA beacon programming code to try and take
  the TU gap between desired TSF and the target TU into
  account.  (Lifted from QCA.)

Tested:

* AR5210
* AR5211
* AR5212
* AR5413
* AR5416
* AR9280
* AR9285

TODO:

* More AP, IBSS, mesh, TDMA testing
* Thorough AR9380 and later testing!
* AR9160 and AR9287 testing

Obtained from:	QCA
2014-04-30 02:19:41 +00:00
adrian
613ea3ba08 Rewrite the cleanup code to, well, actually work right.
The existing cleanup code was based on the Atheros reference driver
from way back and stuff that was in Linux ath9k.  It turned out to be ..
rather silly.

Specifically:

* The whole method of determining whether there's hardware-queued frames
  was fragile and the BAW would never quite work right afterwards.

* The cleanup path wouldn't correctly pull apart aggregate frames in the
  queue, so frames would not be freed and the BAW wouldn't be correctly
  updated.

So to implement this:

* Pull the aggregate frames apart correctly and handle each separately;
* Make the atid->incomp counter just track the number of hardware queued
  frames rather than try to figure it out from the BAW;
* Modify the aggregate completion path to handle it as a single frame
  (atid->incomp tracks the one frame now, not the subframes) and
  remove the frames from the BAW before completing them as normal frames;
* Make sure bf->bf_next is NULled out correctly;
* Make both aggregate session and non-aggregate path frames now be
  handled via the incompletion path.

TODO:

* kill atid->incomp; the driver tracks the hardware queued frames
  for each TID and so we can just use that.

This is a stability fix that should be merged back to stable/10.

Tested:

* AR5416, STA

MFC after:	7 days
2014-04-21 06:07:08 +00:00
adrian
614c44f1d6 * Modify the debugging output from pause/resume to note the TID and STA
MAC
* Now that the paused < 0 bugs have been identified, make the DPRINTF()
  a device_printf() again.  Anything else that shows up here needs to be
  fixed immediately.

Tested:

* AR5416, STA mode

MFC after:	7 days
2014-04-21 02:09:14 +00:00
adrian
af79794fb4 Make sure bf_next is NULL'ed out when we're completing up an aggregate
frame through the cleanup path.

Whilst here, fix the indenting for something I messed up.

Tested:

* AR5416, STA mode
2014-04-21 02:05:51 +00:00
adrian
35123c2dbb Fix a cleanup hang if cleanup gets called _during_ an active cleanup.
During power save testing I noticed that the cleanup code is being
called during a RUN->RUN state transition.  It's because the net80211
stack is treating that (for reasons I don't quitey know yet) as a
reassociation and this calls the node cleanup code.  The reason it's
seeing a RUN->RUN transition is because during active power save
stuff it's possible that the RUN->SLEEP and SLEEP->RUN transitions
happen so quickly that the deferred net80211 vap state code
"loses" a transition, namely the intermediary SLEEP transition.

So, this was causing the node reassociation code to sometimes be called
twice in quick succession and this would result in ath_tx_tid_cleanup()
to be called again.  The code calling it would always call pause, and
then only call resume if the TID didn't have "cleanup_inprogress" set.
Unfortunately it didn't check if it was already set on entry, so it
would pause but not call resume.  Thus, paused would be called more
than once (once before each entry into ath-tx_tid_cleanup()) but resume
would only be called once when the cleanup state was finished.

This doesn't entirely fix all of the issues seen in the cleanup path
but it's a necessary first step.

Since this is a stability fix, it should be merged to stable/10 at some
point.

Tested:

* AR5416, STA mode

MFC after:	7 days
2014-04-21 01:02:49 +00:00
adrian
04e79987ba Add some debugging and forcing of the BAW to match what the current
tracked BAW actually is.

The net80211 code that completes a BAR will set tid->txa_start (the
BAW start) to whatever value was called when sending the BAR.
Now, in case there's bugs in my driver code that cause the BAW
to slip along, we should make sure that the new BAW we start
at is actually what we currently have it at, not what we've sent.

This totally breaks the specification and so this stays a printf().
If it happens then I need to know and fix it.

Whilst here, add some debugging updates:

* add TID logging to places where it's useful;
* use SEQNO().
2014-04-08 07:14:14 +00:00
adrian
3f138e5f29 Don't do continue inside the scheduler loop; we really need to check
if we've hit the end of the list and cycled around to the first
node again.

Obtained from:	DragonflyBSD
2014-04-08 07:10:52 +00:00
adrian
e4ba7b5fcb Correct the actual definition of ath_tx_tid_filt_comp_single() to
match how it's used.

This is another bug that led to aggregate traffic hanging because
the BAW tracking stopped being accurate.  In this instance, a filtered
frame that exceeded retries would return a non-error, which would
mean the caller would never remove it from the BAW.  But it wouldn't
be added to the filtered list, so it would be lost forever.  There'd
thus be a hole in the BAW that would never get transmitted and
this leads to a traffic hang.

Tested:

* Routerstation Pro, AR9220 AP
2014-04-08 07:08:59 +00:00
adrian
1bf3ee1cf9 Add a comment explaining the obvious. 2014-04-08 07:01:27 +00:00
adrian
c0d54a4c69 Don't resume a TID on each filtered frame completion - only do it if
we did suspend it.

The whole suspend/resume TID queue thing is supposed to be a matched
reference count - a subsystem (eg addba negotiation, BAR transmission,
filtered frames, etc) is supposed to call pause() once and then resume()
once.

ath_tx_tid_filt_comp_complete() is called upon the completion of any
filtered frame, regardless of whether the driver had aleady seen
a filtered frame and called pause().

So only call resume() if tid->isfiltered = 1, which indicates that
we had called pause() once.

This fixes a seemingly whacked and different problem - traffic hangs.

What was actually going on:

* There'd be some marginal link with crappy behaviour, causing filtered
  frames and BAR TXing to occur;
* A BAR TX would occur, setting the new BAW (block-ack window) to seqno n;
* .. and pause() would be called, blocking further transmission;
* A filtered frame completion would occur from the hardware, but with
  tid->isfiltered = 0 which indiciates we haven't actually marked
  the queue yet as filtered;
* ath_tx_tid_filt_comp_complete() would call resume(), continuing
  transmission;
* Some frames would be queued to the hardware, since the TID is now no
  longer paused;
* .. and if some make it out and ACked successfully, the new BAW
  may be seqno n+1 or more;
* .. then the BAR TX completes and sets the new seqno back to n.

At this point the BAW tracking would be loopy because the BAW
start was modified but the BAW ring buffer wasn't updated in lock
step.

Tested:

* Routerstation Pro + AR9220 AP
2014-04-08 07:00:43 +00:00
adrian
00f76c206a Throw the flush messages behind ATH_DEBUG_RESET as well.
These are needed to diagnose TX hangs that I and hiren are seeing.
Without it, the only way we'll see debugging is by having ATH_DEBUG_SW_TX
enabled and that is going to be very, very spammy.

ATH_DEBUG_RESET is fine; it's only going to be done during stuck beacon
situations in AP mode.

Whilst I'm here, and now that it's behind debugging, let's just disable
the "print only one" conditional.  I'll eventually make it more tunable.

Tested:

* AR9220, hostap mode.
2014-03-20 23:16:58 +00:00
rpaulo
8dfc245ef9 Call ieee80211_dump_pkt() based on IFF_DUMPPKTS().
MFC after:	3 days
2014-03-08 19:35:31 +00:00
kevlo
2d30c961db Rename definition of IEEE80211_FC1_WEP to IEEE80211_FC1_PROTECTED.
The origin of WEP comes from IEEE Std 802.11-1997 where it defines
whether the frame body of MAC frame has been encrypted using WEP
algorithm or not.
IEEE Std. 802.11-2007 changes WEP to Protected Frame, indicates
whether the frame is protected by a cryptographic encapsulation
algorithm.

Reviewed by:	adrian, rpaulo
2014-01-08 08:06:56 +00:00
cognet
fc685e71b4 Include <sys/ktr.h>, since we need it if ATH_DEBUG is defined. 2013-10-28 20:26:34 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
rpaulo
47e44e7b7a Add a missing comma. 2013-10-17 05:51:54 +00:00
rpaulo
bcc25db7d5 Move a lot of debugging printf's to DPRINTF.
Approved by:	adrian
MFC after:	2 weeks
2013-10-17 01:53:07 +00:00
adrian
46b70ac9f4 Log the MAC address of the node in question rather than the pointer. 2013-08-17 01:14:28 +00:00
adrian
967895b1af Shuffle around the cleanup unpause calls a bit. 2013-05-29 01:40:13 +00:00
adrian
f70201bb2a Migrate ath(4) to now use if_transmit instead of the legacy if_start
and if queue mechanism; also fix up (non-11n) TX fragment handling.

This may result in a bit of a performance drop for now but I plan on
debugging and resolving this at a later stage.

Whilst here, fix the transmit path so fragment transmission works.

The TX fragmentation handling is a bit more special.  In order to
correctly transmit TX fragments, there's a bunch of corner cases that
need to be handled:

* They must be transmitted back to back, in the same order..
* .. ie, you need to hold the TX lock whilst transmitting this
  set of fragments rather than interleaving it with other MSDUs
  destined to other nodes;
* The length of the next fragment is required when transmitting, in
  order to correctly set the NAV field in the current frame to the
  length of the next frame; which requires ..
* .. that we know the transmit duration of the next frame, which ..
* .. requires us to set the rate of all fragments to the same length,
  or make the decision up-front, etc.

To facilitate this, I've added a new ath_buf field to describe the
length of the next fragment.  This avoids having to keep the mbuf
chain together.  This used to work before my 11n TX path work because
the ath_tx_start() routine would be handed a single mbuf with m_nextpkt
pointing to the next frame, and that would be maintained all the way
up to when the duration calculation was done.  This doesn't hold
true any longer - the actual queuing may occur at any point in the
future (think ath_node TID software queuing) so this information
needs to be maintained.

Right now this does work for non-11n frames but it doesn't at all
enforce the same rate control decision for all frames in the fragment.
I plan on fixing this in a followup commit.

RTS/CTS has the same issue, I'll look at fixing this in a subsequent
commit.

Finaly, 11n fragment support requires the driver to have fully
decided what the rate scenario setup is - including 20/40MHz,
short/long GI, STBC, LDPC, number of streams, etc.  Right now that
decision is (currently) made _after_ the NAV field value is updated.
I'll fix all of this in subsequent commits.

Tested:

* AR5416, STA, transmitting 11abg fragments
* AR5416, STA, 11n fragments work but the NAV field is incorrect for
  the reasons above.

TODO:

* It would be nice to be able to queue mbufs per-node and per-TID so
  we can only queue ath_buf entries when it's time to assemble frames
  to send to the hardware.

  But honestly, we should just do that level of software queue management
  in net80211 rather than ath(4), so I'm going to leave this alone for now.

* More thorough AP, mesh and adhoc testing.

* Ensure that net80211 doesn't hand us fragmented frames when A-MPDU has
  been negotiated, as we can't do software retransmission of fragments.

* .. set CLRDMASK when transmitting fragments, just to ensure.
2013-05-26 22:23:39 +00:00
adrian
e1aea355e7 Implement a separate hardware queue threshold for aggregate and non-aggr
traffic.

When transmitting non-aggregate traffic, we need to keep the hardware
busy whilst transmitting or small bursts in txdone/tx latency will
kill us.

This restores non-aggregate iperf performance, especially when doing
TDMA.

Tested:

* AR5416<->AR5416, TDMA
* AR5416 STA <-> AR9280 AP
2013-05-21 18:13:57 +00:00
adrian
16418fa472 More non-ATH_DEBUG build fixes. 2013-05-19 01:33:17 +00:00
adrian
83ab6e624a Be (very) careful about how to add more TX DMA work.
The list-based DMA engine has the following behaviour:

* When the DMA engine is in the init state, you can write the first
  descriptor address to the QCU TxDP register and it will work.

* Then when it hits the end of the list (ie, it either hits a NULL
  link pointer, OR it hits a descriptor with VEOL set) the QCU
  stops, and the TxDP points to the last descriptor that was transmitted.

* Then when you want to transmit a new frame, you can then either:
  + write the head of the new list into TxDP, or
  + you write the head of the new list into the link pointer of the
    last completed descriptor (ie, where TxDP points), then kick
    TxE to restart transmission on that QCU>

* The hardware then will re-read the descriptor to pick up the link
  pointer and then jump to that.

Now, the quirks:

* If you write a TxDP when there's been no previous TxDP (ie, it's 0),
  it works.

* If you write a TxDP in any other instance, the TxDP write may actually
  fail.  Thus, when you start transmission, it will re-read the last
  transmitted descriptor to get the link pointer, NOT just start a new
  transmission.

So the correct thing to do here is:

* ALWAYS use the holding descriptor (ie, the last transmitted descriptor
  that we've kept safe) and use the link pointer in _THAT_ to transmit
  the next frame.

* NEVER write to the TxDP after you've done the initial write.

* .. also, don't do this whilst you're also resetting the NIC.

With this in mind, the following patch does basically the above.

* Since this encapsulates Sam's issues with the QCU behaviour w/ TDMA,
  kill the TDMA special case and replace it with the above.

* Add a new TXQ flag - PUTRUNNING - which indicates that we've started
  DMA.

* Clear that flag when DMA has been shutdown.

* Ensure that we're not restarting DMA with PUTRUNNING enabled.

* Fix the link pointer logic during TXQ drain - we should always ensure
  the link pointer does point to something if there's a list of frames.
  Having it be NULL as an indication that DMA has finished or during
  a reset causes trouble.

Now, given all of this, i want to nuke axq_link from orbit.  There's now HAL
methods to get and set the link pointer of a descriptor, so what we
should do instead is to update the right link pointer.

* If there's a holding descriptor and an empty TXQ list, set the
  link pointer of said holding descriptor to the new frame.

* If there's a non-empty TXQ list, set the link pointer of the
  last descriptor in the list to the new frame.

* Nuke axq_link from orbit.

Note:

* The AR9380 doesn't need this.  FIFO TX writes are atomic.  As long as
  we don't append to a list of frames that we've already passed to the
  hardware, all of the above doesn't apply.  The holding descriptor stuff
  is still needed to ensure the hardware can re-read a completed
  descriptor to move onto the next one, but we restart DMA by pushing in
  a new FIFO entry into the TX QCU.  That doesn't require any real
  gymnastics.

Tested:

* AR5210, AR5211, AR5212, AR5416, AR9380 - STA mode.
2013-05-18 18:27:53 +00:00
adrian
78ff6ffd7d Add some more debugging printf()s to complain if the ath_buf tx queue
doesn't match the actual hardware queue this frame is queued to.

I'm trying to ensure that the holding buffers are actually being queued
to the same TX queue as the holding buffer that they end up on.
I'm pretty sure this is all correct so if this complains, it'll be due
to some kind of subtle broken-ness that needs fixing.

This is only done for legacy hardware, not EDMA hardware.

Tested:

* AR5416 STA mode, very lightly
2013-05-17 05:16:30 +00:00
adrian
80dd90c797 Tidy up the debugging - don't bother printing out TID pointers; now
that we are printing out the MAC address in these fields, just printing
out the TID is enough.
2013-05-16 17:53:12 +00:00
adrian
f671503c18 Limit the number of software queued frames when doing non-aggregation.
This should prevent the TX queue being filled with non-aggregate frames,
causing starvation and non-fair queue behaviour.
2013-05-16 17:46:32 +00:00
adrian
c059ecd485 Implement my first cut at "correct" node power-save and
PS-POLL support.

This implements PS-POLL awareness i nthe

* Implement frame "leaking", which allows for a software queue
  to be scheduled even though it's asleep
* Track whether a frame has been leaked or not
* Leak out a single non-AMPDU frame when transmitting aggregates
* Queue BAR frames if the node is asleep
* Direct-dispatch the rest of control and management frames.
  This allows for things like re-association to occur (which involves
  sending probe req/resp as well as assoc request/response) when
  the node is asleep and then tries reassociating.
* Limit how many frames can set in the software node queue whilst
  the node is asleep.  net80211 is already buffering frames for us
  so this is mostly just paranoia.
* Add a PS-POLL method which leaks out a frame if there's something
  in the software queue, else it calls net80211's ps-poll routine.
  Since the ath PS-POLL routine marks the node as having a single frame
  to leak, either a software queued frame would leak, OR the next queued
  frame would leak. The next queued frame could be something from the
  net80211 power save queue, OR it could be a NULL frame from net80211.

TODO:

* Don't transmit further BAR frames (eg via a timeout) if the node is
  currently asleep.  Otherwise we may end up exhausting management frames
  due to the lots of queued BAR frames.

  I may just undo this bit later on and direct-dispatch BAR frames
  even if the node is asleep.

* It would be nice to burst out a single A-MPDU frame if both ends
  support this.  I may end adding a FreeBSD IE soon to negotiate
  this power save behaviour.

* I should make STAs timeout of power save mode if they've been in power
  save for more than a handful of seconds.  This way cards that get
  "stuck" in power save mode don't stay there for the "inactivity" timeout
  in net80211.

* Move the queue depth check into the driver layer (ath_start / ath_transmit)
  rather than doing it in the TX path.

* There could be some naughty corner cases with ps-poll leaking.
  Specifically, if net80211 generates a NULL data frame whilst another
  transmitter sends a normal data frame out net80211 output / transmit,
  we need to ensure that the NULL data frame goes out first.
  This is one of those things that should occur inside the VAP/ic TX lock.
  Grr, more investigations to do..

Tested:

* STA: AR5416, AR9280
* AP: AR5416, AR9280, AR9160
2013-05-15 18:33:05 +00:00
adrian
e848dec808 Improve the debugging output - use the MAC address rather than various
pointer values everywhere.
2013-05-13 19:52:35 +00:00
adrian
8f11253cd8 Oops, commit the other half of r250606. 2013-05-13 19:02:22 +00:00
adrian
d391af42eb Simplify this bit of code! 2013-05-07 07:44:07 +00:00
adrian
4501275813 When doing BAW tracking, don't dereference a NULL pointer if the BAW
slot is actually NULL.
2013-04-21 00:41:15 +00:00
adrian
d11d65f442 There's some races (likely in the BAR handling, sigh) which is causing
the pause/resume code to not be called completely symmetrically.

I'll chase down the root cause of that soon; this at least works around
the bug and tells me when it happens.
2013-04-20 22:46:31 +00:00
adrian
6e01debb46 Use the new net80211 method to fetch the node TX power, rather than
directly referencing ni->ni_txpower.

This provides the hardware with a slightly more accurate idea of
the maximum TX power to be using.

This is part of a series to get per-packet TPC to work (better).

Tested:

* AR5416, hostap mode
2013-04-16 21:26:44 +00:00
adrian
0d7093e2eb Mark a couple of places where I think the dmamap isn't being unmapped
before the TX path is being aborted.

Right now it's in the TDMA code and I can live with that; but it really
should get fixed.

I'll do a more thorough audit of this code soon.
2013-04-02 06:25:10 +00:00
adrian
171d3554be Ensure that we only call the busdma unmap/flush routines once, when
the buffer is being freed.

* When buffers are cloned, the original mapping isn't copied but it
  wasn't freeing the mapping until later.  To be safe, free the
  mapping when the buffer is cloned.

* ath_freebuf() now no longer calls the busdma sync/unmap routines.

* ath_tx_freebuf() now calls sync/unmap.

* Call sync first, before calling unmap.

Tested:

* AR5416, STA mode
2013-04-01 20:57:13 +00:00
adrian
57bb44230b Use ATH_MAX_SCATTER rather than ATH_TXDESC.
ATH_MAX_SCATTER is used to size the ath_buf DMA segment array.
We thus should use it when checking sizes of things.
2013-04-01 20:12:21 +00:00
adrian
e84569c7e8 Implement the replacement EDMA FIFO code.
(Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've
actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was
cleared!)

This code implements a whole bunch of sorely needed EDMA TX improvements
along with CABQ TX support.

The specifics:

* When filling/refilling the FIFO, use the new TXQ staging queue
  for FIFO frames

* Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly.
  For now the non-CABQ transmit path pushes one frame into the TXQ
  staging queue without setting up the intermediary link pointers
  to chain them together, so draining frames from the txq staging
  queue to the FIFO queue occurs AMPDU / MPDU at a time.

* In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and
  ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO
  at once.

* Now that frames are in a FIFO pending queue, we can top up the
  FIFO after completing a single frame.  This means we can keep
  it filled rather than waiting for it drain and _then_ adding
  more frames.

* The EDMA restart routine now walks the FIFO queue in the TXQ
  rather than the pending queue and re-initialises the FIFO with
  that.

* When restarting EDMA, we may have partially completed sending
  a list.  So stamp the first frame that we see in a list with
  ATH_BUF_FIFOPTR and push _that_ into the hardware.

* When completing frames, only check those on the FIFO queue.
  We should never ever queue frames from the pending queue
  direct to the hardware, so there's no point in checking.

* Until I figure out what's going on, make sure if the TXSTATUS
  for an empty queue pops up, complain loudly and continue.
  This will stop the panics that people are seeing.  I'll add
  some code later which will assist in ensuring I'm populating
  each descriptor with the correct queue ID.

* When considering whether to queue frames to the hardware queue
  directly or software queue frames, make sure the depth of
  the FIFO is taken into account now.

* When completing frames, tag them with ATH_BUF_BUSY if they're
  not the final frame in a FIFO list.  The same holding descriptor
  behaviour is required when handling descriptors linked together
  with a link pointer as the hardware will re-read the previous
  descriptor to refresh the link pointer before contiuning.

* .. and if we complete the FIFO list (ie, the buffer has
  ATH_BUF_FIFOEND set), then we don't need the holding buffer
  any longer.  Thus, free it.

Tested:

* AR9380/AR9580, STA and hostap
* AR9280, STA/hostap

TODO:

* I don't yet trust that the EDMA restart routine is totally correct
  in all circumstances.  I'll continue to thrash this out under heavy
  multiple-TXQ traffic load and fix whatever pops up.
2013-03-26 20:04:45 +00:00
adrian
02bcc8d29e Remove the mcast path calls to ath_hal_gettxdesclinkptr() for axq_link -
they're no longer needed for the legacy path and they're not wanted
for the EDMA path.

Tested:

* AR9280, hostap + CABQ
* AR9380/AR9580, hostap + CABQ
2013-03-26 04:56:54 +00:00
adrian
5af56a0bad Migrate the multicast queue assembly code to not use the axq_link pointer
and instead use the HAL method to set the link pointer.

Tested:

* AR9280, hostap mode, CABQ frames being queued and transmitted
2013-03-26 04:47:40 +00:00
adrian
e3aae3fa01 Move the TXQ lock earlier in this routine - so to correctly protect the
link pointer check.
2013-03-24 04:09:54 +00:00
adrian
0b04a7a29e Overhaul the TXQ locking (again!) as part of some beacon/cabq timing
related issues.

Moving the TX locking under one lock made things easier to progress on
but it had one important side-effect - it increased the latency when
handling CABQ setup when sending beacons.

This commit introduces a bunch of new changes and a few unrelated changs
that are just easier to lump in here.

The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.

The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support.  Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot.  For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.

The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission.  I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.

Major:

* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
  a single lock.

* Leave the software queue side of things under the ATH_TX_LOCK lock,
  (continuing) to serialise things as they are.

* Add a new function which is called whenever there's a beacon miss,
  to print out some debugging.  This is primarily designed to help
  me figure out if the beacon miss events are due to a noisy environment,
  issues with the PHY/MAC, or other.

* Move the CABQ setup/enable to occur _after_ all the VAPs have been
  looked at.  This means that for multiple VAPS in bursted mode, the
  CABQ gets primed once all VAPs are checked, rather than being primed
  on the first VAP and then having frames appended after this.

Minor:

* Add a (disabled) twiddle to let me enable/disable cabq traffic.
  It's primarily there to let me easily debug what's going on with beacon
  and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
  trying to trace down.

* Clear bf_next when flushing frames; it should quieten some warnings
  that show up when a node goes away.

Tested:

* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)

TODO:

* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ hanling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
  ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
  correctly set when chaining descriptors.)
2013-03-24 00:03:12 +00:00
adrian
83430af282 Now that the tx map field is correctly populated for both edma and
legacy chips, just use that.
2013-03-19 17:54:37 +00:00
adrian
e3bfa0687c Why'd I keep this here? remove it entirely now. 2013-03-15 20:22:20 +00:00
adrian
90b1ba3bc4 Fix two bugs:
* when pulling frames off of the TID queue, the ATH_TID_REMOVE()
  macro decrements the axq_depth field.  So don't do it twice.

* in ath_tx_comp_cleanup_aggr(), bf wasn't being reset to bf_first
  before walking the buffer list to complete buffers; so those buffers
  will leak.
2013-03-15 20:00:08 +00:00
adrian
ad70392418 Remove a now incorrect comment.
This comment dates back to my initial stab at TX aggregation completion,
where I didn't even bother trying to do software retries.
2013-03-15 04:43:27 +00:00
adrian
027fc2306a Disable the hw TID != buffer TID check.
I can 100% reliably trigger this on TID 1 traffic by using iperf -S 32
<client fields> to create traffic that maps to TID 1.

The reference driver doesn't do this check.
2013-03-09 08:50:17 +00:00
adrian
1fef9d2b80 Disable debugging entries about BAW issues. I haven't seen any issues
to do with BAW tracking in the last 9 months or so.
2013-02-21 21:47:35 +00:00
adrian
9d7988b307 A couple of quick tidyups:
* Delete this debugging print - I used it when debugging the initial
  TX descriptor chaining code.  It now works, so let's toss it.
  It just confuses people if they enable TX descriptor debugging as they
  get two slightly different versions of the same descriptor.

* Indenting.
2013-02-20 11:22:44 +00:00
adrian
af7893e0d4 Pull out the if_transmit() work and revert back to ath_start().
My changed had some rather significant behavioural changes to throughput.
The two issues I noticed:

* With if_start and the ifnet mbuf queue, any temporary latency
  would get eaten up by some mbufs being queued.  With ath_transmit()
  queuing things to ath_buf's, I'd only get 512 TX buffers before I
  couldn't queue any further frames.

* There's also some non-zero latency involved with TX being pushed
  into a taskqueue via direct dispatch.  Any time the scheduler didn't
  immediately schedule the ath TX task would cause extra latency.
  Various 1ge/10ge drivers implement both direct dispatch (if the TX
  lock can be acquired) and deferred task transmission (if the TX lock
  can't be acquired), with frames being pushed into a drbd queue.
  I'll have to do this at some point, but until I figure out how to
  deal with 802.11 fragments, I'll have to wait a while longer.

So what I saw:

* lots of extra latency, specially under load - if the taskqueue
  wasn't immediately scheduled, things went pear shaped;

* any extra latency would result in TX ath_buf's taking their sweet time
  being replenished, so any further calls to ath_transmit() would drop
  mbufs.

* .. yes, there's no explicit backpressure here - things are just dropped.
  Eek.

With this, the general performance has gone up, but those subtle if_start()
related race conditions are back.  For some reason, this is doubly-obvious
with the AR5416 NIC and I don't quite understand why yet.

There's an unrelated issue with AR5416 performance in STA mode (it's
fine in AP mode when bridging frames, weirdly..) that requires a little
further investigation.  Specifically - it works fine on a Lenovo T40
(single core CPU) running a March 2012 9-STABLE kernel, but a Lenovo T60
(dual core) running an early November 2012 kernel behaves very poorly.
The same hardware with an AR9160 or AR9280 behaves perfectly.
2013-02-13 05:32:19 +00:00
adrian
c05e12cd73 Methodize the process of adding the software TX queue to the taskqueue.
Move it (for now) to the TX taskqueue.
2013-02-07 02:15:25 +00:00