I'm not sure why this is failing. The holding descriptor should be
re-read when starting DMA of the next frame. Obviously something here
isn't totally correct.
I'll review the TX queue handling and see if I can figure out why this
is failing. I'll then re-revert this patch and use the holding
descriptor again.
but partly to just tidy up things.
The problem here - there are too many TX buffers in the queue! By the
time one needs to transmit an EAPOL frame (for this PR, it's the response
to the group rekey notification from the AP) there are no ath_buf entries
free and the EAPOL frame doesn't go out.
Now, the problem:
* Enforcing the TX buffer limitation _before_ we dequeue the frame is
a bad idea, because..
* .. it means I can't check whether the mbuf has M_EAPOL set.
The solution(s):
* De-queue the frame first
* Don't bother doing the TX buffer minimum free check until after
we know whether it's an EAPOL frame or not.
* If it's an EAPOL frame, allocate the buffer from the mgmt pool
rather than the default pool.
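Roughly, the dequeue path ends up ordered like this (an illustrative
fragment only - ath_getbuf(), the ATH_BUFTYPE_* constants and M_EAPOL
are the existing names, but the surrounding loop is simplified):
    /* Dequeue first, so we can look at the mbuf flags. */
    IFQ_DRV_DEQUEUE(&ifp->if_snd, m);
    if (m == NULL)
            break;
    /*
     * EAPOL frames must not starve; take them from the mgmt pool.
     * The minimum-free check for normal frames happens inside
     * ath_getbuf() based on the buffer type.
     */
    if (m->m_flags & M_EAPOL)
            bf = ath_getbuf(sc, ATH_BUFTYPE_MGMT);
    else
            bf = ath_getbuf(sc, ATH_BUFTYPE_NORMAL);
    if (bf == NULL) {
            /* Out of buffers; put the frame back and stop for now. */
            IFQ_DRV_PREPEND(&ifp->if_snd, m);
            break;
    }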
Whilst I'm here:
* Add a tweak to limit how many buffers a single node can acquire.
* Don't enforce that for EAPOL frames.
* .. set that to default to 1/4 of the available buffers, or 32,
whichever is more sane.
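Something like the following is what's intended for the default
(reading "whichever is more sane" as the larger of the two; the
variable names here are illustrative):
    /*
     * Illustrative: per-node TX buffer clamp defaults to 1/4 of the
     * total ath_buf pool, but never less than 32 buffers.
     */
    sc->sc_txq_node_maxdepth = MAX(32, ath_txbuf / 4);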
This doesn't fix issues due to a sleeping node or a very poorly performing
node; but it doesn't make things worse.
Tested:
* AR5416 STA, TX'ing 100+ mbit UDP to an AP, but only 50mbit being received
(thus the TX queue fills up.)
* .. with CCMP / WPA2 encryption configured
* .. and the group rekey time set to 10 seconds, just to elicit the
behaviour very quickly.
PR: kern/138379
just "when the queue is busy."
After talking with the MAC team, it turns out that the linked list
implementation sometimes will not accept a TxDP update and will
instead re-read the link pointer. So even if the hardware has
finished transmitting a chain and has hit EOL/VEOL, it may still
re-read the link pointer to begin transmitting again.
So, always set ATH_BUF_BUSY on the last buffer in the chain (to
mark the last descriptor as the holding descriptor) and never
blank the axq_link pointer.
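In the completion path that works out to roughly (sketch):
    /*
     * Sketch: keep the last completed buffer around as the holding
     * descriptor, since the hardware may re-read its link pointer
     * even after EOL/VEOL.
     */
    bf_last->bf_flags |= ATH_BUF_BUSY;
    /* .. and do NOT do "txq->axq_link = NULL" here. */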
Tested:
* AR5416, STA mode
TODO:
* much more thorough testing with the pre-11n NICs, just to verify
that they behave the same way.
* test TDMA on the 11n and non-11n hardware.
The QCA9565 is a 1x1 2.4GHz 11n chip with integrated on-chip bluetooth.
The AR9300 HAL already has support for this chip; it just wasn't
included in the probe/attach path.
Tested:
* This commit brought to you over a QCA9565 wifi connection from
FreeBSD.
* .. ie, basic STA, pings, no iperf or antenna diversity checking just yet.
* That lock isn't actually held during reset - just the whole TX/RX path
is paused. So, remove the assertion.
* Log the TX queue status - how many hardware frames are active in the
MAC and whether the queue is active.
the pause/resume code to not be called completely symmetrically.
I'll chase down the root cause of that soon; this at least works around
the bug and tells me when it happens.
is compiled in or not.
This fixes issues for people running -HEAD who build modules
without doing a "make buildkernel KERNCONF=XXX", thus picking up stale
opt_*.h files. The resulting module wouldn't have 11n enabled and the
chainmask configuration would just be plain wrong.
* Add ah_ratesArray[] to the ar5416 HAL state - this stores the maximum
values permissible per rate.
* Since different chip EEPROM formats store this value in a different place,
store the HT40 power detector increment value in the ar5416 HAL state.
* Modify the target power setup code to store the maximum values in the
ar5416 HAL state rather than using a local variable.
* Add ar5416RateToRateTable() - to convert a hardware rate code to the
ratesArray enum / index.
* Add ar5416GetTxRatePower() - which goes through the gymnastics required
to correctly calculate the target TX power:
+ Add the power detector increment for ht40;
+ Take the power offset into account for AR9280 and later;
+ Offset the TX power correctly when doing open-loop TX power control;
+ Enforce the per-rate maximum value allowable.
Note - setting a TPC value of 0x0 in the TX descriptor on (at least)
the AR9160 resulted in the TX power being very high indeed. This didn't
happen on the AR9220. I'm guessing it's a chip bug that was fixed at
some point. So for now, just assume the AR5416/AR5418 and AR9130 are
also suspect and clamp the minimum value here at 1.
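The clamp itself is just (sketch):
    /* Never hand the chip a per-packet target TX power of 0. */
    if (txpower < 1)
            txpower = 1;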
Tested:
* AR5416, AR9160, AR9220 hostap, verified using (2GHz) spectrum analyser
* Looked at target TX power in TX descriptor (using athalq) as well as TX
power on the spectrum analyser.
TODO:
* The TX descriptor code sets the target TX power to 0 for AR9285 chips.
I'm not yet sure why. Disable this for TPC and ensure that the TPC
TX power is set.
* AR9280, AR9285, AR9227, AR9287 testing!
* 5GHz testing!
Quirks:
* The per-packet TPC code is only exercised when the tpc sysctl is set
to 1. (dev.ath.X.tpc=1.) This needs to be done before you bring the
interface up.
* When TPC is enabled, setting the TX power doesn't end up with a call
through to the HAL to update the maximum TX power. So ensure that
you set the TPC sysctl before you bring the interface up and configure
a lower TX power, or the hardware will stay clamped at the lower TX
power (at least until the next channel change.)
Thanks to Qualcomm Atheros for all the hardware, and Sam Leffler for use
of his spectrum analyser to verify the TX channel power.
ath_tx_rate_fill_rcflags(). Include setting up the TX power cap in the
rate scenario setup that is passed to the HAL.
Other things:
* Add a TX power cap field in ath_rc.
* Add a three-stream flag in ath_rc.
* Delete the LDPC flag from ath_rc - it's not a per-rate flag, it's a
global flag for the transmission.
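For reference, a rate series entry ends up shaped roughly like this
(illustrative only - the field names approximate ath_rc and shouldn't
be treated as authoritative):
    struct ath_rc_series {
            uint8_t         rix;            /* rate table index */
            uint8_t         ratecode;       /* hardware rate code */
            uint8_t         tries;          /* tries for this series */
            uint8_t         tx_power_cap;   /* new: per-series TX power cap */
            uint16_t        flags;          /* CW40/SGI/dual- and
                                             * three-stream/RTS-CTS etc;
                                             * LDPC is no longer here */
            uint16_t        max4msframelen;
    };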
directly referencing ni->ni_txpower.
This gives the hardware a slightly more accurate idea of
the maximum TX power to use.
This is part of a series to get per-packet TPC to work (better).
Tested:
* AR5416, hostap mode
is configured for higher rates (lower than max) but higher TX power
is configured for the lower rates, above the configured cap, to improve
long distance behaviour.
* Add the rest of the missing GPIO output mux types;
* Add in a new debug category;
* And a new MCI btcoex configuration option in ath_hal.ah_config
Obtained from: Qualcomm Atheros
buffers (ie, >4GB on amd64.)
The underlying problem was that PREREAD doesn't sync the mbuf
with the DMA memory (ie, bounce buffer), so the bounce buffer may
have had stale information. Thus it was always considering the
buffer completed and things just went off the rails.
This change does the following:
* Make ath_rx_pkt() always consume the mbuf somehow; it no longer
passes error mbufs (eg CRC errors, crypt errors, etc) back up
to the RX path to recycle. This means that a new mbuf is always
allocated each time, but it's cleaner.
* Push the RX buffer map/unmap to occur in the RX path, not
ath_rx_pkt(). Thus, ath_rx_pkt() now assumes (a) it has to consume
the mbuf somehow, and (b) that it's already been unmapped and
synced.
* For the legacy path, the descriptor isn't mapped; it comes out of
coherent DMA memory anyway. So leave it there.
* For the EDMA path, the RX descriptor has to be cleared before
it's passed to the hardware, so that when we check with
a POSTREAD sync, we actually get either a blank (not finished)
or a filled out descriptor (finished.) Otherwise we get stale
data in the DMA memory.
* .. so, for EDMA RX path, we need PREREAD|PREWRITE to sync the
data -> DMA memory, then POSTREAD|POSTWRITE to finish syncing
the DMA memory -> data.
* Whilst we're here, make sure the EDMA buffer setup (ie,
bzero'ing the descriptor part) is done before the mbuf is
mapped/synced.
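A condensed sketch of that ordering for the EDMA RX path (fragment;
the bus_dmamap_sync() flags are the point, the load/refill details
are elided):
    /* Buffer setup: clear the status/descriptor area _before_ the mbuf
     * is mapped, then push it to DMA memory in both directions. */
    memset(mtod(m, char *), 0, sc->sc_rx_statuslen);
    /* .. bus_dmamap_load_mbuf_sg() happens here .. */
    bus_dmamap_sync(sc->sc_dmat, bf->bf_dmamap,
        BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
    /* Completion check: sync both directions back before looking, so we
     * see either a still-blank or a filled out descriptor. */
    bus_dmamap_sync(sc->sc_dmat, bf->bf_dmamap,
        BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);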
NOTE: there's been a lot of commits besides this one with regards to
tidying up the busdma handling in ath(4). Please check the recent
commit history.
Discussed with and thanks to: scottl
Tested:
* AR5416 (non-EDMA) on i386, with the DMA tag for the driver
set to 2^30, not 2^32, STA
* AR9580 (EDMA) on i386, as above, STA
* User - tested AR9380 on amd64 with 32GB RAM.
PR: kern/177530
before the TX path is being aborted.
Right now it's in the TDMA code and I can live with that; but it really
should get fixed.
I'll do a more thorough audit of this code soon.
* Don't use BUS_DMA_ALLOCNOW for descriptor DMA maps; we never use
bounce buffers for the descriptors themselves.
* Add some XXX's to mark where the ath_buf has its mbuf ripped from
underneath it without actually cleaning up the dmamap. I haven't
audited those particular code paths to see if the DMA map is guaranteed
to be setup there; I'll do that later.
* Print out a warning if the descdma tidyup code is given some descriptors
w/ maps to free. Ideally the owner will free the mbufs and unmap
the descriptors before freeing the descriptor/ath_buf pairs, but
right now that's not guaranteed to be done.
Reviewed by: scottl (BUS_DMA_ALLOCNOW tag)
the buffer is being freed.
* When buffers are cloned, the original mapping isn't copied, but the
mapping wasn't being freed until later. To be safe, free the
mapping when the buffer is cloned.
* ath_freebuf() now no longer calls the busdma sync/unmap routines.
* ath_tx_freebuf() now calls sync/unmap.
* Call sync first, before calling unmap.
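i.e. the free path is now ordered (sketch):
    /* Sync first, then unload the DMA map, then free the mbuf. */
    bus_dmamap_sync(sc->sc_dmat, bf->bf_dmamap, BUS_DMASYNC_POSTWRITE);
    bus_dmamap_unload(sc->sc_dmat, bf->bf_dmamap);
    m_freem(bf->bf_m);
    bf->bf_m = NULL;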
Tested:
* AR5416, STA mode
The normal RX path (ath_rx_pkt()) will sync and unmap the
buffer before passing it up the stack. We only need to do this
if we're flushing the FIFO during reset/shutdown.
(Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've
actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was
cleared!)
This code implements a whole bunch of sorely needed EDMA TX improvements
along with CABQ TX support.
The specifics:
* When filling/refilling the FIFO, use the new TXQ staging queue
for FIFO frames
* Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly.
For now the non-CABQ transmit path pushes one frame into the TXQ
staging queue without setting up the intermediary link pointers
to chain them together, so draining frames from the txq staging
queue to the FIFO queue occurs one A-MPDU / MPDU at a time.
* In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and
ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO
at once.
* Now that frames are in a FIFO pending queue, we can top up the
FIFO after completing a single frame. This means we can keep
it filled rather than waiting for it to drain and _then_ adding
more frames.
* The EDMA restart routine now walks the FIFO queue in the TXQ
rather than the pending queue and re-initialises the FIFO with
that.
* When restarting EDMA, we may have partially completed sending
a list. So stamp the first frame that we see in a list with
ATH_BUF_FIFOPTR and push _that_ into the hardware.
* When completing frames, only check those on the FIFO queue.
We should never ever queue frames from the pending queue
direct to the hardware, so there's no point in checking.
* Until I figure out what's going on, make sure that if a TXSTATUS
for an empty queue pops up, we complain loudly and continue.
This will stop the panics that people are seeing. I'll add
some code later which will assist in ensuring I'm populating
each descriptor with the correct queue ID.
* When considering whether to queue frames to the hardware queue
directly or software queue frames, make sure the depth of
the FIFO is taken into account now.
* When completing frames, tag them with ATH_BUF_BUSY if they're
not the final frame in a FIFO list. The same holding descriptor
behaviour is required when handling descriptors linked together
with a link pointer as the hardware will re-read the previous
descriptor to refresh the link pointer before continuing.
* .. and if we complete the FIFO list (ie, the buffer has
ATH_BUF_FIFOEND set), then we don't need the holding buffer
any longer. Thus, free it.
Tested:
* AR9380/AR9580, STA and hostap
* AR9280, STA/hostap
TODO:
* I don't yet trust that the EDMA restart routine is totally correct
in all circumstances. I'll continue to thrash this out under heavy
multiple-TXQ traffic load and fix whatever pops up.
Each set of frames pushed into a FIFO is represented by a list of
ath_bufs - the first ath_buf in the FIFO list is marked with
ATH_BUF_FIFOPTR; the last ath_buf in the FIFO list is marked with
ATH_BUF_FIFOEND.
Multiple lists of frames are just glued together in the TAILQ as per
normal - except that at the end of a FIFO list, the descriptor link
pointer will be NULL and it'll be tagged with ATH_BUF_FIFOEND.
For non-EDMA chipsets this is a no-op - the ath_txq frame list (axq_q)
stays the same and is treated the same.
For EDMA chipsets the frames are pushed into axq_q and then when
the FIFO is to be (re) filled, frames will be moved onto the FIFO
queue and then pushed into the FIFO.
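Pushing one FIFO entry then conceptually looks like this (fragment;
only the first/last tagging and the NULL link at the end of the list
matter here):
    /* Tag the endpoints of this FIFO entry's buffer list. */
    bf_first->bf_flags |= ATH_BUF_FIFOPTR;
    bf_last->bf_flags |= ATH_BUF_FIFOEND;
    /* Terminate the descriptor chain for this FIFO entry. */
    ath_hal_settxdesclink(sc->sc_ah, bf_last->bf_lastds, 0);
    /* Push the head of the list into the FIFO. */
    ath_hal_puttxbuf(sc->sc_ah, txq->axq_qnum, bf_first->bf_daddr);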
So:
* Add a new queue in each hardware TXQ (ath_txq) for staging FIFO frame
lists. It's a TAILQ (like the normal hardware frame queue) rather than
the ath9k list-of-lists to represent FIFO entries.
* Add new ath_buf flags - ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND.
* When allocating ath_buf entries, clear out the flag value before
returning it or it'll end up having stale flags.
* When cloning ath_buf entries, only clone ATH_BUF_MGMT. Don't clone
the FIFO related flags.
* Extend ath_tx_draintxq() to first drain the FIFO staging queue, _then_
drain the normal hardware queue.
Tested:
* AR9280, hostap
* AR9280, STA
* AR9380/AR9580 - hostap
TODO:
* Test on other chipsets, just to be thorough.
instead of axq_link.
This (among a bunch of uncommitted work) is required for EDMA chips
to correctly transmit frames on the CABQ.
Tested:
* AR9280, hostap mode
* AR9380/AR9580, hostap mode (staggered beacons)
TODO:
* This code only really gets called when burst beacons are used;
it glues multiple CABQ queues together when sending to the hardware.
* More thorough bursted beacon testing! (first requires some work with
the beacon queue code for bursted beacons, as that currently uses the
link pointer and will fail on EDMA chips.)
the descriptor link pointer, rather than directly.
This is needed on AR9380 and later (ie, EDMA) NICs so the multicast queue
has a chance in hell of being put together right.
Tested:
* AR9380, AR9580 in hostap mode, CABQ traffic (but with other patches..)
related issues.
Moving the TX locking under one lock made things easier to progress on,
but it had one important side-effect - it increased the latency of
CABQ setup when sending beacons.
This commit introduces a bunch of new changes and a few unrelated changes
that are just easier to lump in here.
The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.
The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support. Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot. For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.
The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission. I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.
Major:
* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
a single lock.
* Leave the software queue side of things under the ATH_TX_LOCK lock,
(continuing) to serialise things as they are.
* Add a new function which is called whenever there's a beacon miss,
to print out some debugging. This is primarily designed to help
me figure out if the beacon miss events are due to a noisy environment,
issues with the PHY/MAC, or other.
* Move the CABQ setup/enable to occur _after_ all the VAPs have been
looked at. This means that for multiple VAPS in bursted mode, the
CABQ gets primed once all VAPs are checked, rather than being primed
on the first VAP and then having frames appended after this.
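With the first of those changes, the beacon/CABQ path only needs the
per-queue lock, roughly (fragment; ATH_TXQ_LOCK and
ATH_TXQ_INSERT_TAIL are the existing macros):
    /* Queue CABQ frames under the per-TXQ lock only; no ATH_TX_LOCK,
     * so the beacon path doesn't contend with the data TX path. */
    ATH_TXQ_LOCK(cabq);
    ATH_TXQ_INSERT_TAIL(cabq, bf, bf_list);
    ATH_TXQ_UNLOCK(cabq);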
Minor:
* Add a (disabled) twiddle to let me enable/disable cabq traffic.
It's primarily there to let me easily debug what's going on with beacon
and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
trying to trace down.
* Clear bf_next when flushing frames; it should quieten some warnings
that show up when a node goes away.
Tested:
* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)
TODO:
* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ handling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
correctly set when chaining descriptors.)
to stuck beacons.
* Set the cabq readytime (ie, how long to burst for) to 50% of the total
beacon interval time
* Fix the cabq adjustment calculation based on how the beacon offset is
calculated (the SWBA/DBA time offset.)
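For reference, the readytime ends up derived from the beacon interval
roughly like this (sketch; ic_bintval is in TUs, the HAL wants
microseconds, and the 50% figure is the value mentioned above):
    /* Burst the CABQ for at most 50% of the beacon interval. */
    qi.tqi_readyTime = (ic->ic_bintval * 50 / 100) * 1024;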
This is all still a bit magic voodoo but it does seem to have further
quietened issues with missed/stuck beacons under my local testing.
In any case, it better matches what the reference HAL implements.
Obtained from: Qualcomm Atheros