The hardware can generate its own frames (eg RTS/CTS exchanges, other
kinds of 802.11 management stuff, especially when it comes to 802.11n)
and these also have PWRMGT flags. So if the VAP is asleep but the
NIC is in force-awake for some reason, ensure that the self-generated
frames have PWRMGT set to 1.
Now, this (like basically everything to do with powersave) is still
racy - the only way to guarantee that it's all actually consistent
is to pause transmit and let it finish before transitioning the VAP
to sleep, but this at least gets the basic method of tracking and
updating the state debugged.
Tested:
* AR5416, STA mode
* AR9380, STA mode
fixes and beacon programming / debugging into the ath(4) driver.
The basic power save tracking:
* Add some new code to track the current desired powersave state; and
* Add some reference count tracking so we know when the NIC is awake; then
* Add code in all the points where we're about to touch the hardware and
push it to force-wake.
Then, how things are moved into power save:
* Only move into network-sleep during a RUN->SLEEP transition;
* Force wake the hardware up everywhere that we're about to touch
the hardware.
The net80211 stack takes care of doing RUN<->SLEEP<->(other) state
transitions so we don't have to do it in the driver.
Next, when to wake things up:
* In short - everywhere we touch the hardware.
* The hardware will take care of staying awake if things are queued
in the transmit queue(s); it'll then transit down to sleep if
there's nothing left. This way we don't have to track the
software / hardware transmit queue(s) and keep the hardware
awake for those.
Then, some transmit path fixes that aren't related but useful:
* Force EAPOL frames to go out at the lowest rate. This improves
reliability during the encryption handshake after 802.11
negotiation.
Next, some reset path fixes!
* Fix the overlap between reset and transmit pause so we don't
transmit frames during a reset.
* Some noisy environments will end up taking a lot longer to reset
than normal, so extend the reset period and drop the raise the
reset interval to be more realistic and give the hardware some
time to finish calibration.
* Skip calibration during the reset path. Tsk!
Then, beacon fixes in station mode!
* Add a _lot_ more debugging in the station beacon reset path.
This is all quite fluid right now.
* Modify the STA beacon programming code to try and take
the TU gap between desired TSF and the target TU into
account. (Lifted from QCA.)
Tested:
* AR5210
* AR5211
* AR5212
* AR5413
* AR5416
* AR9280
* AR9285
TODO:
* More AP, IBSS, mesh, TDMA testing
* Thorough AR9380 and later testing!
* AR9160 and AR9287 testing
Obtained from: QCA
Some code will appear soon that is actually setting the chip powerstate
separate from the self-generated frames power state.
* Allow the AR5416 family chips to actually have the power state changed
from the self generated state change.
Tested (STA mode):
* AR5210
* AR5211
* AR5412
* AR5413
* AR5416
* AR9285
the MYBEACON RX filter (only receive beacons which match the BSSID)
or all beacons on the current channel.
* Add the relevant RX filter entry for MYBEACON.
Tested:
* AR5416, STA
* AR9285, STA
TODO:
* once the code is in -HEAD, just make sure that the code which uses it
correctly sets BEACON for pre-AR5416 chips.
Obtained from: QCA, Linux ath9k
the QCA HAL.
This fires off an interrupt if the TSF from the AP / IBSS peer is
wildly out of range. I'll add some code to the ath(4) driver soon
which makes use of this.
TODO:
* verify this didn't break TDMA!
to the hardware.
The QCA HAL has a comment noting that if this isn't done, modifications
to AR_IMR_S2 before AR_IMR is flushed may produce spurious interrupts.
Obtained from: QCA
concurrent updates from any completing transmits in other threads.
This was exposed when doing power save work - net80211 is constantly
doing reassociations and it's causing the rate control state to get
blanked out. This could cause the rate control code to assert.
This should be MFCed to stable/10 as it's a stability fix.
Tested:
* AR5416, STA
MFC after: 7 days
The existing cleanup code was based on the Atheros reference driver
from way back and stuff that was in Linux ath9k. It turned out to be ..
rather silly.
Specifically:
* The whole method of determining whether there's hardware-queued frames
was fragile and the BAW would never quite work right afterwards.
* The cleanup path wouldn't correctly pull apart aggregate frames in the
queue, so frames would not be freed and the BAW wouldn't be correctly
updated.
So to implement this:
* Pull the aggregate frames apart correctly and handle each separately;
* Make the atid->incomp counter just track the number of hardware queued
frames rather than try to figure it out from the BAW;
* Modify the aggregate completion path to handle it as a single frame
(atid->incomp tracks the one frame now, not the subframes) and
remove the frames from the BAW before completing them as normal frames;
* Make sure bf->bf_next is NULled out correctly;
* Make both aggregate session and non-aggregate path frames now be
handled via the incompletion path.
TODO:
* kill atid->incomp; the driver tracks the hardware queued frames
for each TID and so we can just use that.
This is a stability fix that should be merged back to stable/10.
Tested:
* AR5416, STA
MFC after: 7 days
MAC
* Now that the paused < 0 bugs have been identified, make the DPRINTF()
a device_printf() again. Anything else that shows up here needs to be
fixed immediately.
Tested:
* AR5416, STA mode
MFC after: 7 days
During power save testing I noticed that the cleanup code is being
called during a RUN->RUN state transition. It's because the net80211
stack is treating that (for reasons I don't quitey know yet) as a
reassociation and this calls the node cleanup code. The reason it's
seeing a RUN->RUN transition is because during active power save
stuff it's possible that the RUN->SLEEP and SLEEP->RUN transitions
happen so quickly that the deferred net80211 vap state code
"loses" a transition, namely the intermediary SLEEP transition.
So, this was causing the node reassociation code to sometimes be called
twice in quick succession and this would result in ath_tx_tid_cleanup()
to be called again. The code calling it would always call pause, and
then only call resume if the TID didn't have "cleanup_inprogress" set.
Unfortunately it didn't check if it was already set on entry, so it
would pause but not call resume. Thus, paused would be called more
than once (once before each entry into ath-tx_tid_cleanup()) but resume
would only be called once when the cleanup state was finished.
This doesn't entirely fix all of the issues seen in the cleanup path
but it's a necessary first step.
Since this is a stability fix, it should be merged to stable/10 at some
point.
Tested:
* AR5416, STA mode
MFC after: 7 days
tracked BAW actually is.
The net80211 code that completes a BAR will set tid->txa_start (the
BAW start) to whatever value was called when sending the BAR.
Now, in case there's bugs in my driver code that cause the BAW
to slip along, we should make sure that the new BAW we start
at is actually what we currently have it at, not what we've sent.
This totally breaks the specification and so this stays a printf().
If it happens then I need to know and fix it.
Whilst here, add some debugging updates:
* add TID logging to places where it's useful;
* use SEQNO().
match how it's used.
This is another bug that led to aggregate traffic hanging because
the BAW tracking stopped being accurate. In this instance, a filtered
frame that exceeded retries would return a non-error, which would
mean the caller would never remove it from the BAW. But it wouldn't
be added to the filtered list, so it would be lost forever. There'd
thus be a hole in the BAW that would never get transmitted and
this leads to a traffic hang.
Tested:
* Routerstation Pro, AR9220 AP
we did suspend it.
The whole suspend/resume TID queue thing is supposed to be a matched
reference count - a subsystem (eg addba negotiation, BAR transmission,
filtered frames, etc) is supposed to call pause() once and then resume()
once.
ath_tx_tid_filt_comp_complete() is called upon the completion of any
filtered frame, regardless of whether the driver had aleady seen
a filtered frame and called pause().
So only call resume() if tid->isfiltered = 1, which indicates that
we had called pause() once.
This fixes a seemingly whacked and different problem - traffic hangs.
What was actually going on:
* There'd be some marginal link with crappy behaviour, causing filtered
frames and BAR TXing to occur;
* A BAR TX would occur, setting the new BAW (block-ack window) to seqno n;
* .. and pause() would be called, blocking further transmission;
* A filtered frame completion would occur from the hardware, but with
tid->isfiltered = 0 which indiciates we haven't actually marked
the queue yet as filtered;
* ath_tx_tid_filt_comp_complete() would call resume(), continuing
transmission;
* Some frames would be queued to the hardware, since the TID is now no
longer paused;
* .. and if some make it out and ACked successfully, the new BAW
may be seqno n+1 or more;
* .. then the BAR TX completes and sets the new seqno back to n.
At this point the BAW tracking would be loopy because the BAW
start was modified but the BAW ring buffer wasn't updated in lock
step.
Tested:
* Routerstation Pro + AR9220 AP
These are needed to diagnose TX hangs that I and hiren are seeing.
Without it, the only way we'll see debugging is by having ATH_DEBUG_SW_TX
enabled and that is going to be very, very spammy.
ATH_DEBUG_RESET is fine; it's only going to be done during stuck beacon
situations in AP mode.
Whilst I'm here, and now that it's behind debugging, let's just disable
the "print only one" conditional. I'll eventually make it more tunable.
Tested:
* AR9220, hostap mode.
device is asleep.
This doesn't avoid logging errors for things that are actually OK to
access whilst the chip is asleep (eg, the RTC registers (0x7000->0x70ff
on the AR5416 and later.)
But, this is a pretty good indicator if things are accessed incorrectly.
Tested:
* AR5416, STA
This way the state changes from sleep->awake before the registers are poked
and from awake->sleep after the registers are poked.
This way spurious warnings aren't printed by my (to be committed)
debugging code.
Tested:
* AR5416, STA
Yes, this means that sc_invalid is slightly racy, but there are other
issues here which need fixing.
This fixes a source of eventual LORs - ath_init() grabs ATH_LOCK to do
work and releases it before it calls ieee80211_start_all().
ieee80211_start_all() will grab the net80211 comlock to iterate over
the VAPs.
TODO:
* .. I should just migrate the ieee80211_start_all() work to a
deferred task so it can be done later; it doesn't have to be
immediately done.
Tested:
* AR5416, STA mode
private per-chip HAL.
This allows the ah_osdep.[ch] code to check whether the power state is
valid for doing chip programming.
It should be a no-op for normal driver work but it does require a
clean kernel/module rebuild, as the size of HAL structures have changed.
Now, this doesn't track whether the hardware is ACTUALLY awake,
as NETWORK_SLEEP wakes the chip up for a short period when traffic
is received. This doesn't actually set the power mode to AWAKE, so
we have to be careful about how we touch things.
But it's enough to start down the path of implementing station mode
chipset power savings, as a large part of the silliness is making
sure the chip is awake during periodic calibration / ANI and
random places where transmit may be occuring. I'd rather not a repeat
of debugging power save on ath9k, where races with calibration
and transmit path stuff took a couple years to shake out.
Tested:
* AR5416, STA mode
The origin of WEP comes from IEEE Std 802.11-1997 where it defines
whether the frame body of MAC frame has been encrypted using WEP
algorithm or not.
IEEE Std. 802.11-2007 changes WEP to Protected Frame, indicates
whether the frame is protected by a cryptographic encapsulation
algorithm.
Reviewed by: adrian, rpaulo
freeing them.
The current code would walk the list and call the buffer free, which
didn't remove it from any lists before pushing it back on the free list.
Tested: AR9485, STA mode
Noticed by: dillon@apollo.dragonflybsd.org
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
The AR5212 series of MACs implement the same channel counters as the
later 11n chips - except, of course, the 11n specific counter (extension
channel busy.)
This allows users of these NICs to use 'athsurvey' to see how busy their
current channel is.
Tested:
* AR5212, AR2413 NICs, STA mode
Approved by: re@ (gleb)
This occurs at RX DMA start, even though the RX FIFO has plenty of
space. I'll go figure out why, but this shouldn't cause people to
be spammed by these messages.
The AHB code:
* hard coded the AR9130 device id;
* assumes a 4k flash calibration space.
This code now extends this:
* hint.ath.X.eepromsize now overrides the eeprom range, instead of 4k
* hint.ath.X.device_id and hint.ath.X.vendor_id can now be overridden.
Tested:
* AR9330 board (Carambola 2)
This hasn't yet been tested as unfortunately the AR3012 I have doesn't
have the "real" firmware on it; it shipped with the cut down HCI firmware
that only understands enough to accept a new firmware image.
* Linux ath9k (GPIO constants)
* Add the LNA configuration table entries for AR933x/AR9485
* Add a chip-dependent LNA signal level delta in the startup path
* Add a TODO list for the stuff I haven't yet ported over but
I haven't.
Tested:
* AR9462 with LNA diversity enabled
The reference HAL pushes a config group parameter to the driver layer
to inform it which particular chip behaviour to implement.
This particular value tags it as an AR9285.
The AR9485 chip and AR933x SoC both implement LNA diversity.
There are a few extra things that need to happen before this can be
flipped on for those chips (mostly to do with setting up the different
bias values and LNA1/LNA2 RSSI differences) but the first stage is
putting this code into the driver layer so it can be reused.
This has the added benefit of making it easier to expose configuration
options and diagnostic information via the ioctl API. That's not yet
being done but it sure would be nice to do so.
Tested:
* AR9285, with LNA diversity enabled
* AR9285, with LNA diversity disabled in EEPROM