I couldn't think of a way to maintain the hardware TXQ locks _and_ layer
on top of that per-TXQ software queuing and any other kind of fine-grained
locks (eg per-TID, or per-node locks.)
So for now, to facilitate some further code refactoring and development
as part of the final push to get software queue ps-poll and u-apsd handling
into this driver, just do away with them entirely.
I may eventually bring them back at some point, when it looks slightly more
architectually cleaner to do so. But as it stands at the present, it's
not really buying us much:
* in order to properly serialise things and not get bitten by scheduling
and locking interactions with things higher up in the stack, we need to
wrap the whole TX path in a long held lock. Otherwise we can end up
being pre-empted during frame handling, resulting in some out of order
frame handling between sequence number allocation and encryption handling
(ie, the seqno and the CCMP IV get out of sequence);
* .. so whilst that's the case, holding the lock for that long means that
we're acquiring and releasing the TXQ lock _inside_ that context;
* And we also acquire it per-frame during frame completion, but we currently
can't hold the lock for the duration of the TX completion as we need
to call net80211 layer things with the locks _unheld_ to avoid LOR.
* .. the other places were grab that lock are reset/flush, which don't happen
often.
My eventual aim is to change the TX path so all rejected frame transmissions
and all frame completions result in any ieee80211_free_node() calls to occur
outside of the TX lock; then I can cut back on the amount of locking that
goes on here.
There may be some LORs that occur when ieee80211_free_node() is called when
the TX queue path fails; I'll begin to address these in follow-up commits.
enforcing the TXOP and TBTT limits:
* Frames which will overlap with TBTT will not TX;
* Frames which will exceed TXOP will be filtered.
This is not enabled by default; it's intended to be enabled by the
TDMA code on 802.11n capable chipsets.
now this works for non-debug and debug builds.
* Add a comment reminding me (or someone) to audit all of the relevant
math to ensure there's no weird wrapping issues still lurking about.
But yes, this does seem to be mostly working.
Pointy-hat-to: adrian, yet again
* add some further debugging prints, which are quite nice to have
* add in ALQ hooks (optional!) to allow for the TDMA information to be
logged in-line with the TX and RX descriptor information.
The existing logic wrapped programming nexttbtt at 65535 TU.
This is not good enough for the 11n chips, whose nexttbtt register
(GENERIC_TIMER_0) has an initial value from 0..2^31-1 TSF.
So converting the TU to TSF had the counter wrap at (65535 << 10) TSF.
Once this wrap occured, the nexttbtt value was very very low, much
lower than the current TSF value. At this point, the nexttbtt timer
would constantly fire, leading to the TX queue being constantly gated
open.. and when this occured, the sender was not correctly transmitting
in its slot but just able to continuously transmit. The master would
then delay transmitting its beacon until after the air became free
(which I guess would be after the burst interval, before the next burst
interval would quickly follow) and that big delta in master beacon TX
would start causing big swings in the slot timing adjustment.
With this change, the nexttbtt value is allowed to go all the way up
to the maximum value permissable by the 32 bit representation.
I haven't yet tested it to that point; I really should. The AR5212
HAL now filters out values above 65535 TU for the beacon configuration
(and the relevant legal values for SWBA, DBA and NEXTATIM) and the
AR5416 HAL just dutifully programs in what it should.
With this, TDMA is now useful on the 802.11n chips.
Tested:
* AR5416, AR9280 TDMA slave
* AR5413 TDMA slave
what the maximum legal values are.
The current beacon timer configuration from TDMA wraps things at
HAL_BEACON_PERIOD-1 TU. For the 11a chips this is fine, but for
the 11n chips it's not enough resolution. Since the 11a chips have a
limit on what's "valid", just enforce this so when I do write larger
values in, they get suitably wrapped before programming.
Tested:
* AR5413, TDMA slave
Todo:
* Run it for a (lot) longer on a clear channel, ensure that no strange
slippages occur.
* Re-validate this on STA configurations, just to be sure.
After chatting with the MAC team, the TSF writes (at least on the 11n
MACs, I don't know about pre-11n MACs) are done as 64 bit writes that
can take some time. So, doing a 32 bit TSF write is definitely not
supported. Leave a comment here which explains that.
Whilst here, add a comment which outlines that after a reset or TSF
write, the TSF write may take a while (up to 50uS) to update.
A write or reset shouldn't be done whilst the previous one is in
flight. Also (and this isn't currently done) a read shouldn't
occur until the SLEEP32_TSF_WRITE_STAT is clear. Right now we're
not doing that, mostly because we haven't been doing lots of TSF
resets/writes until recently.
TSF write.
The TSF_L32 update is fine for the AR5413 (and later, I guess) 11abg NICs
however on the 11n NICs this didn't work. The TSF writes were causing
a much larger time to be skipped, leading to the timing to never
converge.
I've tested this 64 bit TSF read, adjust and write on both the
11n NICs and the AR5413 NIC I've been using for testing. It works
fine on each.
This patch allows the AR5416/AR9280 to be used as a TDMA member.
I don't yet know why the AR9280 is ~7uS accurate rather than ~3uS;
I'll look into it soon.
Tested:
* AR5413, TDMA slave (~ 3us accuracy)
* AR5416, TDMA slave (~ 3us accuracy)
* AR9280, TDMA slave (~ 7us accuracy)
on the 802.11n NICs.
The 802.11n NICs return a TBTT value that continues far past the 16 bit
HAL_BEACON_PERIOD time (in TU.) The code would constrain nextslot to
HAL_BEACON_PERIOD, but it wasn't constraining nexttbtt - the pre-11n
NICs would only return TU values from 0 -> HAL_BEACON_PERIOD. Thus,
when nexttbtt exceeded 64 milliseconds, it would not wrap (but nextslot
did) which lead to a huge tsfdelta.
So until the slot calculation is converted to work in TSF rather than
a mix of TSF and TU, "make" the nexttbtt values match the TU assumptions
for pre-11n NICs.
This fixes the crazy deltatsf calculations but it doesn't fix the
non-convergent tsfdelta issue. That'll be fixed in a subsequent commit.
encryption types.
The AR5210 only has four WEP key slots, in contrast to what the
later MACs have (ie, the keycache.) So there's no way to store a "clear"
key.
Even if the driver is taught to not allocate CLR key entries for
the AR5210, the hardware will actually attempt to decode the encrypted
frames with the (likely all 0!) WEP keys.
So for now, disable the hardware encryption entirely and just so it
all in software. That allows both WEP -and- WPA to actually work.
If someone wishes to try and make hardware WEP _but_ software WPA work,
they'll have to create a HAL capability to enable/disable hardware
encryption based on the current STA/Hostap mode. However, making
multi-vap work with one WEP and one WPA VAP will require hardware
encryption to be disabled anyway.
* For CABQ traffic, I -can- chain them together using the next pointer
and just push that particular chain head to the CABQ. However, this
doesn't magically make EDMA TX CABQ work - I have to do some further
hoop jumping.
* upon setup, tell the alq code what the chip information is.
* add TX/RX path logging for legacy chips.
* populate the tx/rx descriptor length fields with a best-estimate.
It's overly big (96 bytes when AH_SUPPORT_AR5416 is enabled)
but it'll do for now.
Whilst I'm here, add CURVNET_RESTORE() here during probe/attach as a
partial solution to fixing crashes during attach when the attach fails.
There are other attach failures that I have to deal with; those'll come
later.
* Add a new method which allows the driver to push the MAC/phy/hal info
into the logging stream.
* Add a new ALQ logging entry which logs the mac/phy/hal information.
* Modify the ALQ startup path to log the MAC/phy/hal information
so the decoder knows which HAL/chip is generating this information.
* Convert the header and mac/phy/hal information to use be32, rather than
host order. I'd like to make this stuff endian-agnostic so I can
decode MIPS generated logs on a PC.
This requires some further driver modifications to correctly log the
right initial chip information.
Also - although noone bar me is currently using this, I've shifted the
debug bitmask around a bit. Consider yourself warned!
This was broken by me when merging the 802.11n aggregate descriptor chain
setup with the default descriptor chain setup, in preparation for supporting
AR9380 NICs.
The corner case here is quite specific - if you queue an aggregate frame
with >1 frames in it, and the last subframe has only one descriptor making
it up, then that descriptor won't have the rate control information
copied into it. Look at what happens inside ar5416FillTxDesc() if
both firstSeg and lastSeg are set to 1.
Then when ar5416ProcTxDesc() goes to fill out ts_rate based on the
transmit index, it looks at the rate control fields in that descriptor
and dutifully sets it to be 0.
It doesn't happen for non-aggregate frames - if they have one descriptor,
the first descriptor already has rate control info.
I removed the call to ath_hal_setuplasttxdesc() when I migrated the
code to use the "new" style aggregate chain routines from the HAL.
But I missed this particular corner case.
This is a bit inefficient with MIPS boards as it involves a few redundant
writes into non-cachable memory. I'll chase that up when it matters.
Tested:
* AR9280 STA mode, TCP iperf traffic
* Rui Paulo <rpaulo@> first reported this and has verified it on
his AR9160 based AP.
PR: kern/173636
This happens during a scan in STA mode; any queued data frames will
be power save queued but as there's no TIM in STA mode, it panics.
This was introduced by me when I disabled my driver-aware power save
handling support.
actual traffic with an AR9380/AR9382/AR9485.
The sample rate control stats would show impossibly large numbers for
"successful packets transmitted." The number was a tad under 2^^64-1.
So after a bit of digging, I found that the sample rate control code
was making 'tries' turn into a negative number.. and this was because
ts_longretry was too small.
The hardware returns "ts_longretry" at the current rate selection,
not overall for that TX descriptor. So if you setup four TX rate
scenarios and the second one works, ts_longretry is only set for
the number of attempts at that second rate scenario. The FreeBSD HAL
code does the correction in ath_hal_proctxdesc() - however, this isn't
possible with EDMA.
EDMA TX completion is done separate from the original TX descriptor.
So the real solution is to split out "find ts_rate and ts_longretry"
from "complete TX descriptor". Until that's done, put a hack in
the EDMA TX path that uses the rate scenario information in the ath_buf.
Tested: AR9380, AR9382, AR9485 STA mode
events.
This is primarily for the TX EDMA and TX EDMA completion. I haven't yet
tied it into the EDMA RX path or the legacy TX/RX path.
Things that I don't quite like:
* Make the pointer type 'void' in ath_softc and have if_ath_alq*()
return a malloc'ed buffer. That would remove the need to include
if_ath_alq.h in if_athvar.h.
* The sysctl setup needs to be cleaned up.
I'm using this to debug EDMA TX and RX descriptors and it's really helpful
to have a non-printf() way to decode frames.
I won't link this into the build until I've tidied it up a little more.
This will eventually be behind ATH_DEBUG_ALQ.