The hardware can optionally "filter" frames if successive transmissions
to a given node (ie, "entry in the keycache") fail. That way the hardware
can implement a kind of early abort of all the other frames queued to
that destination, rather than simply trying to TX each frame to that
destination (and failing.)
The background:
* If a frame comes back as being filtered, the hardware didn't try to
TX it (or it was outside the TX burst opportunity.) So, take it as a hint
that some (but not all, see below) frames to the destination may be
filtered.
* If the CLRDMASK bit is set in a TX descriptor, the "filter to this
destination" bit in the keycache entry is cleared and TX to that host
will be unconditionally retried.
* Right now everything has the CLRDMASK bit set, so filtered frames
tend to be aggregates and frames that fall outside of the WME burst
window. It was a bit worse in the past as I had messed up the TX
flags and CLRDMASK wasn't being set on aggregate frames.
The annoying bits:
* It's easy (ish) to do for aggregate session frames - firstly, they
can be retried in any order as long as they're within the BAW, and
there's already a bunch of infrastructure tracking how many frames
the TID has queued to the hardware (tid->hwq_depth.) However, for
frames that bypassed the software queue, hwq_depth doesn't get
incremented. I'll fix that in a subsequent commit.
* For non-aggregate session frames, the only retries that can occur
are ones for sequence numbers that hvaen't successfully been TXed yet.
Since there's no re-ordering going on in non-aggregate sessions, if any
subsequent seqno frames make it out, any filtered frames before that
seqno need to be dropped.
Hence why this initially is just for aggregate session frames.
* Since there may be intermediary frames to the destination that
have CLRDMASK set - for example, any directly dispatched management
frames to that destination - it's possible that there will be some
filtered frames followed up by some non filtered frames. Thus,
it can't be assumed that once you see a filtered frame for the given
destination node, all subsequent frames for all TIDs will be filtered.
Ok, with that in mind:
* Create a per-TID filtered frame queue for frames that the hardware
returns as filtered.
* Track filtered frames per-tid, rather than per-node. It just makes
the locking much easier.
* When a filtered frame appears in the completion function, the node
transitions to "filtered", and all subsequent completed error frames
(filtered or otherwise) are put on the filtered frame queue. The TID
is paused once (during the transition from non-filtered to filtered).
* If a filtered frame retry count exceeds SWMAX_RETRIES, a BAR should be
sent.
* Once all the frames queued to the hardware for the given filtered frame
TID, transition back from filtered frame to non-filtered frame, which
means pre-pending all the filtered frames onto the head of the software
queue, clearing the filtered frame state and unpausing the TID.
Things get quite hairy around handling completion (aggr, non-aggr, norm,
direct-dispatched frames to a hardware queue); whether it's an "error",
"cleanup" or "BAR" state as well as filtered, which order to do things
in (eg do filtered BEFORE checking for BAR, as the filter completion
may be needed to actually transmit a BAR frame.)
This work has definitely reminded me that I have to tidy up all the locking
and remove some of the ridiculous lock/unlock/lock/unlock going on in the
completion functions.
It's also reminded me that I should really split out TID versus hardware TXQ
locking, even if the underlying locking is still the destination hardware TXQ.
Finally, this is all pre-requisite for working on AP mode power save support
(PS-POLL, uAPSD) as well as improving performance to misbehaving nodes (as
they can transition into filter mode, stopping any TX until everything has
caught up.)
Finally (ish) - this should also be done for non-aggregate sessions as
there are still plenty of laptops and mobile devices that don't speak
802.11n but do wish for stable, useful power save AP support where packets
aren't simply dropped. This requires software retransmission for
non-aggregate sessions to be implemented, which includes the caveats I've
mentioned above.
Finally finally - this doesn't yet do anything about the CLRDMASK bit in the
TX descriptor. That's still unconditionally set to 1. I'll debug the
current work (mostly ensuring I haven't busted up the hairy transitions
between BAR, filtered, error (all frames in an aggregate failing) and
cleanup (when transitioning from aggregation -> non-aggregation.))
Finally finally finally - this is all original work by yours truely, rather
than ported from the Atheros internal driver codebase or Linux ath9k.
Tested:
* AR9280, AR5416 in STA mode
* AR9280, AR9130 in hostap mode
* Lots and lots of iperf testing in very marginal and non-marginal conditions,
complete with inducing filtered frames + BAR TX conditions.
The AR9003 series NICs implement a separate RX error to signal that a
Keycache miss occured. The earlier NICs would not set the key index
valid bit.
I'll dig into the difference between "no key index bit set" and "keycache
miss".
This includes a few new fields in each RXed frame:
* per chain RX RSSI (ctl and ext);
* current RX chainmask;
* EVM information;
* PHY error code;
* basic RX status bits (CRC error, PHY error, etc).
This is primarily to allow me to do some userland PHY error processing
for radar and spectral scan data. However since EVM and per-chain RSSI
is provided, others may find it useful for a variety of tasks.
The default is to not compile in the radiotap vendor extensions, primarily
because tcpdump doesn't seem to handle the particular vendor extension
layout I'm using, and I'd rather not break existing code out there that
may be (badly) parsing the radiotap data.
Instead, add the option 'ATH_ENABLE_RADIOTAP_VENDOR_EXT' to your kernel
configuration file to enable these options.
call these after rate control selection is done.
The duration/protection code wasn't working - it expected the rix to
be valid. Unfortunately after I moved the rate control selection into
late in the process, the rix value isn't valid and thus the protection/
duration code would get things wrong.
HT frames are now correctly protected with an RTS and for the AR5416,
this involves having the aggregate frames be limited to 8K.
TODO:
* Fix up the DMA sync to occur just before the frame is queued to the
hardware. I'm adjusting the duration here but not doing the DMA
flush.
* Doubly/triply ensure that the aggregate frames are being limited to
the correct size, or the AR5416 will get unhappy when TXing RTS-protected
aggregates.
In a very noisy 2.4GHz environment (with HT/40 enabled, making it worse)
I saw the following occur:
* the air was considered "busy" a lot of the time;
* the cabq time is quite short due to staggered beacons being enabled;
* it just wasn't able to keep up TX'ing CABQ frames;
* .. and the cabq would swallow up all the TX ath_buf's.
This patch introduces a twiddle which allows the maximum cabq depth to be
set, forcing further frames to be dropped.
It defaults to the TX buffer count at the moment, so the default behaviour
isn't changed.
I've also started fleshing out a similar setup for the data path, so
it doesn't swallow up all the available TX buffers and preventing management
frames (such as ADDBA) out.
PR: kern/165895
* Failall is now named just that.
* Add TX ok and TX fail, for aggregate frame sub-frames.
This will break athstats; a followup commit wil resolve this.
Sponsored by: Hobnob, Inc.
ioctl interface for DFS modules to use.
Since there's no open source dfs code yet, this doesn't introduce any
operational changes.
Approved by: re (kib)
in the RX path when doing 11n and block-ack'ed frames. Apparently, the MAC
will loop over that self-linked descriptor and treat it as "good enough"
for (incorrectly!) ACKing the frames in the block-ack.
Until I figure out how to work around this issue in the future, this counter
will tell me if packet RX processing ever gets to the point where it's
touching the self-linked descriptor. If there's ever enough packets to get
to that point, BA's will be invalid and likely very unhappy.
The rxmonitor hook is called on each received packet. This can get very,
very busy as the tx/rx/chanbusy registers are thus read each time a packet
is received.
Instead, shuffle out the true per-packet processing which is needed and move
the rest of the ANI processing into a periodic event which runs every 100ms
by default.
o change tdma packet drop msg when ack required to ATH_DEBUG_TDMA
(ATH_DEBUG_XMIT is too noisy)
o add a debug msg for raw packet drop due to interface down/invalid
o add stats for these two cases
o explain how another drop case is handled
o add net80211 support for a tdma vap that is built on top of the
existing adhoc-demo support
o add tdma scheduling of frame transmission to the ath driver; it's
conceivable other devices might be capable of this too in which case
they can make use of the 802.11 protocol additions etc.
o add minor bits to user tools that need to know: ifconfig to setup and
configure, new statistics in athstats, and new debug mask bits
While the architecture can support >2 slots in a TDMA BSS the current
design is intended (and tested) for only 2 slots.
Sponsored by: Intel
Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral). Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.
Supported by: Hobnob and Marvell
Reviewed by: many
Obtained from: Atheros (some bits)
o major overhaul of the way channels are handled: channels are now
fully enumerated and uniquely identify the operating characteristics;
these changes are visible to user applications which require changes
o make scanning support independent of the state machine to enable
background scanning and roaming
o move scanning support into loadable modules based on the operating
mode to enable different policies and reduce the memory footprint
on systems w/ constrained resources
o add background scanning in station mode (no support for adhoc/ibss
mode yet)
o significantly speedup sta mode scanning with a variety of techniques
o add roaming support when background scanning is supported; for now
we use a simple algorithm to trigger a roam: we threshold the rssi
and tx rate, if either drops too low we try to roam to a new ap
o add tx fragmentation support
o add first cut at 802.11n support: this code works with forthcoming
drivers but is incomplete; it's included now to establish a baseline
for other drivers to be developed and for user applications
o adjust max_linkhdr et. al. to reflect 802.11 requirements; this eliminates
prepending mbufs for traffic generated locally
o add support for Atheros protocol extensions; mainly the fast frames
encapsulation (note this can be used with any card that can tx+rx
large frames correctly)
o add sta support for ap's that beacon both WPA1+2 support
o change all data types from bsd-style to posix-style
o propagate noise floor data from drivers to net80211 and on to user apps
o correct various issues in the sta mode state machine related to handling
authentication and association failures
o enable the addition of sta mode power save support for drivers that need
net80211 support (not in this commit)
o remove old WI compatibility ioctls (wicontrol is officially dead)
o change the data structures returned for get sta info and get scan
results so future additions will not break user apps
o fixed tx rate is now maintained internally as an ieee rate and not an
index into the rate set; this needs to be extended to deal with
multi-mode operation
o add extended channel specifications to radiotap to enable 11n sniffing
Drivers:
o ath: add support for bg scanning, tx fragmentation, fast frames,
dynamic turbo (lightly tested), 11n (sniffing only and needs
new hal)
o awi: compile tested only
o ndis: lightly tested
o ipw: lightly tested
o iwi: add support for bg scanning (well tested but may have some
rough edges)
o ral, ural, rum: add suppoort for bg scanning, calibrate rssi data
o wi: lightly tested
This work is based on contributions by Atheros, kmacy, sephe, thompsa,
mlaier, kevlo, and others. Much of the scanning work was supported by
Atheros. The 11n work was supported by Marvell.
o include current tx rate in stats so athstats gets a consistent
snapshot and doesn't have to make an extra ioctl
o record tx rate for raw frames
MFC after: 3 weeks
frame and if we get a beacon miss interrupt ignore it if we've received
a frame within the beacon miss interval. This should never trigger
and the handling at the net80211 layer should likewise deal with this
but it doesn't hurt and can suppress extranous probe request frames.
Note that we can legtimately get a bmiss when under heavy load.
MFC after: 2 weeks
o record tsf in tx+rx frames
o switch from raw rssi to dbm for signal data and record both
signal and noise floor data (hacked for now to assume a fixed
noise floor; is correct with new hal)
o add monpass sysctl to control which rx'd frames are passed
up with errors; especially useful to see frames with CRC errors
o mark 'd packets w/ a CRC error with radiotap's BADFCS flag
Also add placeholder code for calibrating the noise floor when
using newer hals.
Reviewed by: avatar
MFC after: 1 week
o move tx taps from ath_start to ath_tx_start so lots more
state is available to tap
o add tx flags
o add tx rate
o add tx power (constant for the moment)
o add tx antenna state
o fix race condition when processing rx descriptors: because we use
a self-linked descriptor at the end of the rx descriptor list to
avoid rx overruns (which can easily happen for 5212 parts that enable
PHY errors) we must carefully check that a descriptor is "done" by
looking ahead to the next descriptor before believing the done bit
in the current descriptor (this is all handled in the HAL since the
rx descriptor format is chip-specific so we need to pass in two
additional parameters--the physical address of the current descriptor
and the virtual address of the next descriptor in the list)
o check copyout return status for SIOCGATHSTATS ioctl
Approved by: re (scottl)
quietly discard them; this just permits them to be collected with bpf)
o add a counter for the number of rate control frames discarded when not in
monitor mode
o move the rx "too short" statistic in the stat structure so non-error rx stats
are together (NB: ABI change to apps that collect stats via driver ioctl)