When I initially did this 11n TX work in days of yonder, my 802.11 standards
clue was ... not as finely tuned. One of the things in 802.11-2012 (which
I guess technically was after I did this work, but I'm sure it was like this
in the previous rev?) is that among other traffic classes, three things are
important:
* group addressed frames should be default non-QoS, even if they're QoS frames, and
* group addressed frames should have a seqno out of a different space than the
per-TID QoS one; and because of this
* group addressed frames, being non-QoS, should never be in the Block-ACK window
for TX.
Now, net80211 and now this code cheats by using the non-QOS TID, but ideally
we'd introduce a separate seqno space just for multicast/group traffic for
TX and RX comparison.
Later extensions (eg reliable multicast / multimedia) express what one should do
when doing multicast traffic in a TID. Now, technically we /could/ do group traffic
as QoS traffic and throw it into a per-TID seqno space, but this definitely
introduces ordering issues when you take into account things like CABQ behaviour.
(Ie, if some traffic in the TID goes into the CABQ and some doesn't, because
it's doing a split of multicast and non-multicast traffic, then you have
seqno ordering issues.)
So, until someone implements 802.11vv reliable multicast / multimedia extensions,
group traffic is non-QoS.
Next, software/hardware queue TID mapping. In the past I believed the WME tagging
of frames because well, net80211 had a habit of tagging things like management
traffic with it. But, then we also map QoS traffic categories to TIDs as well.
So, we should obey the TID! But! then it put some management traffic into higher
WME categories too, as those frames don't have QoS TIDs. But! It'd do things like
put things like QoS action frames into higher WME categories, when they should
be kept in-order with the rest of the traffic for that TID. So! Given all of this,
the ath(4) driver does overrides to not trust the WME category.
I .. am undoing some of this. Now, the TID has a 1:1 mapping to the hardware
queue. The TID is the primary source of truth now for all QoS traffic.
The WME is only used for non-QoS traffic. This now means that any TID traffic
queued should be consistently queued regardless of WME, so things like the
"TX finished, do more TX" that is occuring right now for transmit handling
should be "better".
The consistent {TID, WME} -> hardware queue mapping is important for
transmit completion. It's used to schedule more traffic for that
particular TID, because that {many TID}:{1 TXQ} mapping in ath_tx_tid_sched()
is used for driving completion. Ie, when the hardware queue completes,
it'll walk that list of scheduled TIDs attached to that TXQ.
The eventual aim is to get ready for some other features around putting
some data into other hardware queues (eg for better PS-POLL support,
uAPSD, support, correct-er TDMA support, etc) which requires that
I tidy all of this up in preparation for then introducing further
TID scheduling that isn't linked to a hardware TXQ (likely a per-WME, per-TID
driver queue, and a per-node driver queue) to enable that.
Tested:
* AR9380, STA mode
* AR9380, AR9580, AP mode
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
This implements hardware assisted quiet IE support. Quiet time is
an optional interval on DFS channels (but doesn't have to be DFS
only channels! sigh) where the station and AP can be quiet in order
to allow for channel utilisation measurements. Typically that's
stuff like radar detection, spectral scan, other-BSS frame sniffing,
checking how busy the air is, etc.
The hardware implements it as one of the generic timers, which is
supplied a period, offset from the trigger period and duration
to stay quiet. The AP can announce quiet time configurations which
change, and so this code also tracks that.
Implementation details:
* track the current quiet time IE
* compare the new one against the previous one - if only the TBTT
counter changes, don't update things
* If tbttcount=1 then program it into the hardware - that is when
it is easiest to program the correct starting offset (one TBTT +
configured offset).
* .. later on check to see if it can be done on any tbttcount
* If the IE goes away then remove the quiet timer and clear the
config
* Upon reset, state change, new beacon - clear quiet time IE
and just let it resync from the next beacon.
History:
This was work done initially by sibridgetech.com in 2011/2012/2013
as part of some FreeBSD wifi DFS contracting work they had for a
third party. They implemented the net80211 quiet time IE pieces
and had some test code for the station side which didn't entirely
use the timers correctly.
I figured out how to use the timers correctly without stopping/starting
the transmit DMA engine each time. When done correctly, the timer
just needs to be programmed once and left alone until the next
configuration change.
So, thanks to Himali Patel and Parthiv Shah for their work way
back then. I finally figured it out and finished it!
TODO:
* Now, I'd rather net80211 did the quiet time IE tracking and parsing,
pushing configurations into the driver is needed. I'll look at
doing that in a subsequent update.
* This doesn't handle multiple quiet time IEs, which will currently
just mess things up. I'll look into supporting that in the future
(at least by only obeying "one" of them, and then ignoring
subsequent IEs in a beacon/probe frame.)
* This also implements the STA side and not the AP side - the AP
side will come later, and involves taking various other intervals
into account (eg the beacon offset for multi-VAP modes, the
SWBA time, etc, etc) as well as obtaining the configuration when
a beacon is configured/generated rather than "hearing" an IE.
* .. investigate supporting quiet IE in mesh, tdma, ibss modes
* .. investigate supporting quiet IE for non-DFS channels
(so this can be done for say, 2GHz channels.)
* Chances are i should commit NULL methods for the ar5210, ar5211 HALs..
Tested:
* AR9380, STA mode - announcing quiet, removing quiet, changing quite
time config, whilst doing iperf testing;
* AR9380, AP mode.
* Although the hardware is awake, the power state handling doesn't think so.
So just explicitly wake it up early in setup so ath_hal calls don't complain.
* We shouldn't be transmitting or ACKing frames during DFS CAC or on passive
channels before we hear a beacon. So, start laying down comments in the
places where this work has to be done.
Note:
* The main bit missing from finishing this particular bit of work is a state
call to transition a VAP from passive to non-passive when a beacon is heard.
CAC is easy, it's an interface state. So, I'll go and add a method to control
that soon.
* limit cabq to 64 - in practice if this stays at ath_txbuf then
all buffers can be tied up by a very busy broadcast domain (eg ARP
storm, way too much MDNS/NETBIOS). It's been like this in the
freebsd-wifi-build AP project for the longest time.
* Now that I figured out the hilarity inherent in aggregate forming
and AR9380 EDMA work, change the per-node to 64 frames by default.
I'll do some more work to shorten the queue latency introduced when
doing data so TCP isn't so terrible, but it's now no longer /always/
tens of milliseconds of extra latency when doing active iperf tests.
Notes:
The reason for the extra latency is partly tx/rx taskqueue handling and
scheduling, and partly due to a lack of airtime/QoS awareness of per-node
traffic. Ideally we'd have different limits/priorities on the QoS/TID
levels per node so say, voice/video data got a better share of buffer
allocations over best effort/bulk data, but we currently don't implement
that. It's not /hard/ to do, I just need to do it.
Tested:
* AR9380 (STA), AR9580 (hostap) - both with the relevant changes.
TCP is now at around 180mbit with rate control and RTS protection
enabled. UDP stays at 355mbit at MCS23, no HT protection.
This is two fixes, which establishes what I /think/ is pretty close to the
theoretical PHY maximum speed on the AR9380 devices.
* When doing A-MPDU on a TID, don't queue to the hardware directly if
the hardware queue is busy. This gives us time to get more packets
queued up (and the hardware is busy, so there's no point in queuing
more to the hardware right now) to potentially form an A-MPDU.
This fixes up the throughput issue I was seeing where a couple hundred
single frames were being sent a second interspersed between A-MPDU
frames. It just happened that the software queue had exactly one
frame in it at that point. Queuing it until the hardware finishes
transmitting isn't exactly costly.
* When determining whether to dequeue from a software node/TID queue into
the hardware queue, fix up the checks to work right for EDMA chips
(ar9380 and later.) Before it was not dispatching anything until
the FIFO was empty. Now we allow it to dispatch another aggregate
up to the hardware aggregate limit, like I intended with the earlier
work.
This allows a 5GHz HT40, short-GI, "htprotmode off" test at MCS23
to achieve 357 Mbit/sec in a one-way UDP test. The stars have to be
aligned /just right/ so there are no retries but it can happen.
Just don't expect it to work in an OTA test if your 2yo is running
around the room - MCS23 is very very sensitive to channel conditions.
Tested:
* AR9380 STA (test) -> AR9580 hostap
TODO:
* More thorough testing on pre-AR9380 chips (AR5416, AR9160, AR9280)
* (Finally) teach ath_rate_sample about throughput/latency rather than
air time, so I can get good transmit rates with a 2yo running around.
When investigating performance on UDP TX on the AR9380 I found that the
following sequence was occuring:
* INTR
* EINPROGRESS - nothing yet
* INTR
* TXSTATUS - process a TX completion for an aggregate
* INTR, INTR
* TXSTATUS - process a TX completion for an aggregate
* TXD, TXD ... populate frames from the hardware queue and submit
What should be happening is a completed TXSTATUS fires off more packets
that are queued on active TIDs.
What /was/ happening was after that first TXSTATUS the TX queue hardware queue
was still empty, so it didn't push anything into the FIFO. Only after the
second TXSTATUS did any progress get made.
This is one of two commits - it ensures that the software TX queue scheduler
is called /after/ TX completion, otherwise no frames from the software staging
queues will be processed into the hardware queues.
The second commit will fix it so it populates aggregate frames correctly
when the above occurs - right now ath_txq_sched() is called, but it doesn't
populate anything because its pre-check conditions are wrong.
Whilst here, add/tweak debugging.
Tested:
* AR9380 STA (testing device) -> AR9580 hostap
This is supposed to only be applied to the first subframe and only if
RTS/CTS is being done. I'm still not yet checking RTS/CTS exchange status
so it's just happening for all subframes on AR9380 and later.
This gets MCS23 throughput up from around 250mbit to 303mbit with RTS/CTS
protection enabled, and around 330mbit with no HT protection enabled.
Now, MCS23 has a PHY rate of 450mbit and we should be seeing closer to
400mbit for a straight one-way UDP test, but this beats the previous
maximum throughput.
Tested:
* AR9380 (STA) -> AR9580 (AP) - STA with the modifications, doing UDP TX
test using iperf.
Set both IEEE80211_HTCAP_LDPC and IEEE80211_HTC_TXLDPC capability flags
if LDPC is supported + set 'do_ldpc = 1' only when it is not disabled,
not just supported.
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D9277
A recent change enforced the VAP limit as well as the peer limit.
I now need to actually set iv_ampdu_limit or we don't transmit more
than 8K sized aggregates.
This restores the expected (suboptimal, but still much faster) behaviour.
Tested:
* AR9380, STA mode
The HT40 channel population logic was "just" doing pairs of channels starting with
the band entry frequency. Trouble is, a lot of the rules start way off at 5120MHz,
which isn't a valid 5GHz channel. Then, eg for HT40U, it would populate:
* (5120,5140)
* (5160,5180)
* (5200,5220)
* (5240,5260)
.. as the HT40U pairs, with the first being the primary channel. Channel 36
is 5180MHz, and since it's not a primary channel here, it wouldn't populate it.
Then, the next HT40U would be 5200/5220, which is highly wrong.
HT40D had the same problem.
So, this just forces that 5GHz HT40 channels start at channel 36 (5180),
no matter what the band edge says. This includes eg doing 4.9GHz channels.
This erm, meant that the HT40 channels for the low band was always wrong.
Oops!
Tested:
* AR9380, STA mode
* AR9344 SoC, AP mode
MFC after: 1 week
This is important in hostap, ibss, (11s at some magical future date, etc)
where different nodes may have smaller limits.
Oops!
MFC after: 1 week
Relnotes: Yes
This adds a workaround to incorrectly behaving APs (ie, FreeBSD APs) which
don't beacon out exactly when they should (at TBTT multiples of beacon
intervals.)
It forces the hardware awake (but leaves it in network-sleep so self
generated frames still state that the hardware is asleep!) and will
remain awake until the next sleep transition driven by net80211.
That way if the beacons are just at the wrong interval, we get a much
better chance of hearing more consecutive beacons before we go to sleep,
thus not constantly disconnecting.
Tested:
* AR9485, STA mode, against a misbehaving FreeBSD AP.
The 802.11-2012 spec talks about this - section 10.1.3.2 - Beacon Generation
in Infrastructure Networks. So yes, we should be expecting beacons to be
going out in multiples of intval.
Silly adrian.
So:
* fix the FreeBSD APs that are sending beacons at incorrect TBTTs (target
beacon transmit time); and
* yes indeed we will have to wake up out of network sleep until we sync
a beacon.
This was being done in the pre-AR9380 case, but not for AR9380 and later.
When powersave in STA mode is enabled, this may have lead to the transmit
completion code doing this:
* call the task, which doesn't wake up the hardware
* complete the frames, which doesn't touch the hardware
* schedule pending frames on the hardware queue, which DOES touch the
hardware, and this will be ignored
This would show up in the logs like this:
(with debugging enabled):
Nov 27 23:03:56 lovelace kernel: Q1[ 0] (nseg=1) (DS.V:0xfffffe011bd57300 DS.P:0x49b57300) I: 168cc117 L:00000000 F:0005
...
(in general, doesn't require debugging enabled):
Nov 27 23:03:56 lovelace kernel: ath_hal_reg_write: reg=0x00000804, val=0x49b57300, pm=2
That register is a EDMA TX FIFO register (queue 1), and the val is the descriptor
being written.
Whilst here, make sure the software queue gets kicked here.
Tested;
* AR9485, STA mode + powersave
This bug has been bugging me for quite some time. I finally sat down
with enough coffee to figure it out.
The short of it - rounding up to the next intval multiple of the TSF value
only works if the AP is transmitting all its beacons on an interval of
the TSF. If it isn't - for example, doing staggered beacons on a multi-VAP
setup with a single hardware TSF - then weird things occur.
The long of it -
When powersave is enabled, the MAC and PHY are partially powered off.
They can't receive any packets (or transmit, for that matter.)
The target beacon timer programming will wake up the MAC/PHY just before
the beacon is supposed to be received (well, strictly speaking, at DTIM
so it can see the TIM - traffic information map - telling the STA whether
any traffic is there for it) and it happens automatically.
However, this relies on the target beacon time being programmed correctly.
If it isn't then the hardware will wake up and not hear any beacons -
and then it'll be asleep for said beacons. After enough of this, net80211
will give up and assume the AP went away.
This should fix both TSFOOR interrupts and disconnects from APs with powersave
enabled.
The annoying bit is that it only happens if APs stagger things or start
on a non-zero TSF. So, this would sometimes be fine and sometimes not be
fine.
What:
* I don't know (yet) why the code rounds up to the next intval.
For now, just disable rounding it and trust the value we get.
TODO:
* If we do see a beacon miss in STA mode then we should transition
out of sleep for a while so we can hear beacons to resync against.
I'd love a patch from someone to enable that particular behaviour.
Note - that doesn't require that net80211 brings the chip out of
sleep state - only that we wake the chip up through to full-on and
then let it go to sleep again when we've seen a beacon. The wifi
stack and AP can still completely just stay believing we're in sleep
mode.
Tested:
* AR9485, STA mode, powersave enabled
MFC after: 1 week
Relnotes: Yes
* Obey the peer A-MPDU density if it's larger than the currently configured
one.
* Pay attention to the peer A-MPDU max-size and don't assume we can transmit
a full A-MPDU (64k!) if the peer announces smaller values.
Relnotes: ath(4): Fix A-MPDU transmit; obey A-MPDU density and max size.
This is a long time coming. The general pieces have been floating around
in a local repo since circa 2012 when I dropped the net80211 support
into the tree.
This allows the per-chain RSSI and NF to show up in 'ifconfig wlanX list sta'.
I haven't yet implemented the EVM hookups so that'll show up; that'll come
later.
Thanks to Susie Hellings <susie@susie.id.au> who did the original work
on this a looong time ago for a company we both worked at.
* Don't do RTS/CTS - experiments show that we get ACK frames for each of them
and this ends up causing the timestamps to look all funny.
* Set the HAL_TXDESC_POS bit, so the AR9300 HAL sets up the hardware to return
location and CSI information.
* change the HT_RC_2_MCS to do MCS0..23
* Use it when looking up the ht20/ht40 array for bits-per-symbol
* add a clk_to_psec (picoseconds) routine, so we can get sub-microsecond
accuracy for the math
* .. and make that + clk_to_usec public, so higher layer code that is
returning clocks (eg the ANI diag routines, some upcoming locationing
experiments) can be converted to microseconds.
Whilst here, add a comment in ar5416 so i or someone else can revisit the
latency values.
Uses of commas instead of a semicolons can easily go undetected. The comma
can serve as a statement separator but this shouldn't be abused when
statements are meant to be standalone.
Detected with devel/coccinelle following a hint from DragonFlyBSD.
MFC after: 1 month
The 11n duration calculation function in net80211 and the HAL round /up/
the duration calculation for short-gi, so we can't use that.
The 11n duration calculation doesn't know about the extra symbol time
needed for STBC, nor the LDPC encoding duration, so we can't use
that.
This (along with other, local hacks) allow the locationing services to
get down to around 200nS (yes, nanoseconds) of variance when speaking
to a "good" AP.
Tested:
* AR9380, STA mode, local locationing frame hacks
The pre-11n calculations include SIFS, but the 11n ones don't.
The reason is that (mostly) the 11n hardware is doing the SIFS calculation
for us but the pre-11n hardware isn't. This means that we're over-shooting
the times in the duration field for non-11n frames on 11n hardware, which
is OK, if not a little inefficient.
Now, this is all fine for what the hardware needs for doing duration math
for ACK, RTS/CTS, frame length, etc, but it isn't useful for doing PHY
duration calculations. Ie, given a frame to TX and its timestamp, what
would the end of the actual transmission time be; and similar for an
RX timestamp and figuring out its original length.
So, this adds a new field to the duration routines which requests
SIFS or no SIFS to be included. All the callers currently will call
it requesting SIFS, so this /should/ be a glorious no-op. I'm however
planning some future work around airtime fairness and positioning which
requires these routines to have SIFS be optional.
Notably though, the 11n version doesn't do any SIFS addition at the moment.
I'll go and tweak and verify all of the packet durations before I go and
flip that part on.
Tested:
* AR9330, STA mode
* AR9330, AP mode
* AR9380, STA mode
* the code already stored the length of the RX desc, which I never used.
So, use that and retire the new flag I introduced a while ago.
* Introduce a TX timestamp length field and capability.
* extend the TX timestamp to 32 bits, as the AR5416 and later does a full
32 bit TX timestamp instead of 15 or 16 bits.
* add RX descriptor fields for PHY uploaded information (coming soon)
* add flags for RX/TX fast timestamp, hardware upload, etc
* add a flag for TX to request ToD/ToA location information.
I keep asking myself "what do these fields mean" and so now I've clarified
it for myself.
Tested:
* Reading the comments, going "a-ha!" a couple times.
Approved by: re (gjb)