spurious (and fatal) interrupt errors.
One user reported seeing this:
Apr 22 18:04:24 ceres kernel: ar5416GetPendingInterrupts: fatal error,
ISR_RAC 0x0 SYNC_CAUSE 0x2000
SYNC_CAUSE of 0x2000 is AR_INTR_SYNC_LOCAL_TIMEOUT which is a bus timeout;
this shouldn't cause HAL_INT_FATAL to be set.
After checking out ath9k, ath9k_ar9002_hw_get_isr() clears (*masked)
before continuing, regardless of whether any bits in the ISR registers
are set. So if AR_INTR_SYNC_CAUSE is set to something that isn't
treated as fatal, and AR_ISR isn't read or is read and is 0, then
(*masked) wouldn't be cleared. Thus any of the existing bits set
that were passed in would be preserved in the output.
The caller in if_ath - ath_intr() - wasn't setting the masked value
to 0 before calling ath_hal_getisr(), so anything that was present
in that uninitialised variable would be preserved in the case above
of AR_ISR=0, AR_INTR_SYNC_CAUSE != 0; and if the HAL_INT_FATAL bit
was set, a fatal condition would be interpreted and the chip was
reset.
This patch does the following:
* ath_intr() - set masked to 0 before calling ath_hal_getisr();
* ar5416GetPendingInterrupts() - clear (*masked) before processing
continues; so if the interrupt source is AR_INTR_SYNC_CAUSE
and it isn't fatal, the hardware isn't reset via returning
HAL_INT_FATAL.
This doesn't fix any underlying errors which trigger
AR_INTR_SYNC_LOCAL_TIMEOUT - which is a bus timeout of some
sort - so that likely should be further investigated.
diversity.
This is bit dirty and likely should be revised at a later date,
with an eye to unifying/tidying up the whole diversity setup
and allowing developers to do "tricky stuff" as they desire.
For now, this works.
From the ath9k source:
==
11N: we can no longer afford to self link the last descriptor.
MAC acknowledges BA status as long as it copies frames to host
buffer (or rx fifo). This can incorrectly acknowledge packets
to a sender if last desc is self-linked.
==
Since this is useful for pre-AR5416 chips that communicate PHY errors
via error frames rather than by on-chip counters, leave the support
in there, but disable it for AR5416 and later.
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.
Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
in the RX path when doing 11n and block-ack'ed frames. Apparently, the MAC
will loop over that self-linked descriptor and treat it as "good enough"
for (incorrectly!) ACKing the frames in the block-ack.
Until I figure out how to work around this issue in the future, this counter
will tell me if packet RX processing ever gets to the point where it's
touching the self-linked descriptor. If there's ever enough packets to get
to that point, BA's will be invalid and likely very unhappy.
by default.
Adventourous souls with an AR9220/AR9280 or later and who have a device
that sends PS-POLL frames may wish to try tinkering with this option and
get back to me.
It's still not ready for prime-time - there's some TX niggles with these 11n
cards that I'm still trying to wrap my head around, and AMPDU-TX is just not
implemented so things will come to a crashing halt if you're not careful.
There's still a lot of random issues to sort out with the radio side of
things and AMPDU RX handling (and completely missing AMPDU TX handling!)
but if people wish to give this a go and assist in debugging the
issues, they can define ATH_DO_11N to enable it.
I'm just re-iterating - this is here to allow people to assist in
further 11n development; it is not any indication that the 11n support
is complete and functional.
Important notes:
* This doesn't support 1-stream cards yet - (eg AR9285) - the various bits
that negotiate TX/RX MCS don't know not to try >1 stream TX or negotiate
1-stream RX; so don't enable 11n unless you've first taught the rate
control module and the net80211 stack to negotiate 1-stream stuff;
* The only rate control module minimally 11n aware is ath_rate_sample;
* ath_rate_sample doesn't know about HT/40; so airtime will be incorrectly
calculated;
* The AR9160 and AR9280 radio code is unreliable at the higher MCS rates for
some reason; this will definitely impact 11n performance;
* AMPDU-TX isn't yet implemented;
* AMPDU-RX may be a bit buggy still and will definitely suffer from the
radio unreliability mentioned above (ie, don't expect 150/300mbit
RX just yet.)
The correct bit to set is 0x1 in the high MAC address byte, not 0x80.
The hardware isn't programmed with that bit (which is the multicast
adress bit.)
The linux ath9k keycache code uses that bit in the MAC as a "this is
a multicast key!" and doesn't set the AR_KEYTABLE_VALID bit.
This tells the hardware the MAC isn't to be used for unicast destination
matching but it can be used for multicast bssid traffic.
This fixes some encryption problems in station mode.
PR: kern/154598
Revert back to the previous method of doing it for where a node can be
identified and it's an 11n node.
I'll have to do some further research into exactly what is being messed up
with the sequence number matching and I'll then revisit this.
This fixes two problems -
* All packets need to be processed here, not just aggregate ones - as any
received frames (AMPDU or otherwise) in the given TID (traffic class id)
will update the sequence number and, implied with that, update the window;
* It seems there's situations where packets aren't matching a current node but
somehow need to be tracked. Thus just tag them all for now; I'll figure out
the why later.
Whilst I'm here, bump the stats counters whilst I'm at it.
This fixes AMPDU RX in my tests; the main problems now stem from what look
like PHY level error/retransmits which are impeding general throughput, incl.
AMPDU.
A-MPDU RX interferes with packet retransmission/reordering.
In local testing, I was seeing A-MPDU being negotiated and then
not used by the AP sending frames to the STA; the STA would then
treat non A-MPDU frames that are retransmits as out of the window
and get plain confused.
The hardware RX status descriptor has a "I'm part of an aggregate"
bit; so this should eventually be tested and then punted to the
A-MPDU reorder handling only if it has this bit set.
There's two reasons for this:
* the raw and non-raw TX path shares a lot of duplicate code which should be
refactored;
* the 11n-ready chip TX path needs a little reworking.
The rxmonitor hook is called on each received packet. This can get very,
very busy as the tx/rx/chanbusy registers are thus read each time a packet
is received.
Instead, shuffle out the true per-packet processing which is needed and move
the rest of the ANI processing into a periodic event which runs every 100ms
by default.
The AR9100 at least doesn't have an external serial EEPROM
attached to the MAC; it instead stores the calibration data
in the normal system flash.
I believe earlier parts can do something similar but I haven't
experienced it first-hand.
This commit introduces an eepromdata pointer into the API but
doesn't at all commit to using it. A future commit will
include the glue needed to allow the AR9100 support code
to use this data pointer as the EEPROM.
Since we now have the source code, there's no reason to hide the diag codes
from other areas.
They live in the HAL as they form part of the HAL API and should still be treate
as "potentially flexible; don't publish as a public API." But since they're
already used as a public API (see follow-up commit), we may as well use
them in place of magic constants.
flag immediately so it's only set once per longcal interval.
Without this, the current AR5416 code will continuously spam NF
calibrations during a periodic calibration if the longcal flag
is set. The longcal flag wouldn't be cleared until the calibration
method indicates that calibrations are "complete".
This drops the rate of NF calibration updates down from "once every
shortcal" (ie, every 100ms) during a periodic calibration, to only
once per "longcal" interval. Spamming NF calibrations every 100ms
caused some potentially horrific issues in noisy environments as
NF calibrations can take longer than 100ms and this spamming can
cause invalid NF calibration results to be read back - leading to
missed beacons, and thus leading to a stuck beacon situation.
Stuck beacons cause interface resets, which restart calibrations.
This means that the longcal calibration runs every 100ms (shortcal)
until all initial calibrations are completed. This spamming can then
cause the above issues which leads to stuck beacons, leading to
interface resets, etc, etc. Quite annoying.
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.
MFC after: 1 month
* WPA-None requires ap_scan=2:
The major difference between ap_scan=1 (default) and 2 is, that no
IEEE80211_IOC_SCAN* ioctls/functions are called, though, there is a
dependency on those. For example the call to wpa_driver_bsd_scan()
sets the interface UP, this never happens, therefore the interface
must be marked up in wpa_driver_bsd_associate(). IEEE80211_IOC_SSID
also is not called, which means that the SSID has not been set prior
to the IEEE80211_MLME_ASSOC call.
* WPA-None has no support for sequence number updates, it doesn't make
sense to check for replay violations..
* I had some crashes right after the switch to RUN state, issue is
that sc->sc_lastrs was not yet defined.
Approved by: rpaulo (mentor)
MFC after: 3 weeks