Commit Graph

190 Commits

Author SHA1 Message Date
Adrian Chadd
9a2de0c3d6 [ath] reset hardware if this particular mac bug is seen.
I have to dig into why I'm seeing it on chips as late as the AR9380 era
stuff (as it's marked as an AR5416 bug, but who knows!) but i'm seeing
aggregate TX frames complete with no blockack bit set.  So, everything
should be treated as a failure and do a hardware reset for good measure.

Tested:

* AR9380, STA mode
* AR9580 (5GHz), AP mode
2020-05-21 04:26:20 +00:00
Adrian Chadd
051ea90c43 [ath_rate_sample] Limit the tx schedules for A-MPDU ; don't take short retries
into account and remove the requirement that the MCS rate is "higher" if we're
 considering a new rate.

Ok, another fun one.

* In order for reliable non-software retried higher MCS rates, the TX schedules
  (inconsistently!) use hard-coded lower rates at the end of the schedule.
  Now, hard-coded is a problem because (a) it means that aggregate formation
  is limited by the SLOWEST rate, so I never formed large AMDU frames for
  3 stream rates, and (b) if the AP disables lower rates as base rates, it
  complains about "unknown rix" every frame you transmit at that rate.

  So, for now just disable the third and fourth schedule entry for AMPDUs.
  Now I'm forming 32k and 64k aggregates for the higher density MCS rates
  much more reliably.

  It would be much nicer if the rate schedule stuff wasn't fixed but instead
  I'd just populate ath_rc_series[] when I fetch the rates.  This is all a
  holdover of ye olde pre-11n stuff and I really just need to nuke it.

  But for now, ye hack.

* The check for "is this MCS rate better" based on MCS itself is just garbage.
  It meant things like going MCS0->7 would be fine, and say 0->8->16 is fine,
  (as they're equivalent encoding but 1,2,3 spatial streams), BUT it meant
  going something like MCS7->11 would fail even though it's likely that
  MCS11 would just be better, both for EWMA/BER and throughput.

  So for now just use the average tx time.  The "right" way for this comparison
  would be to compare PHY bitrates rather than MCS / rate indexes, but I'm not
  yet there.  The bit rates ARE available in the PHY index, but honestly
  I have a lot of other cleaning up to here before I think about that.

* Don't include the RTS/CTS retry count (and thus time) into the average tx time
  caluation.  It just makes temporarily failures make the rate look bad by
  QUITE A LOT, as RTS/CTS exchanges are (a) long, and (b) mostly irrelevant
  to the actual rate being tried.  If we keep hitting RTS/CTS failures then
  there's something ELSE wrong on the channel, not our selected rate.
2020-05-16 05:07:45 +00:00
Adrian Chadd
cce6344402 [ath] [ath_rate] Extend ath_rate_sample to better handle 11n rates and aggregates.
My initial rate control code was .. suboptimal.  I wanted to at least get MCS
rates sent, but it didn't do anywhere near enough to handle low signal level links
or remotely keep accurate statistics.

So, 8 years later, here's what I should've done back then.

* Firstly, I wasn't at all tracking packet sizes other than the two buckets
  (250 and 1600 bytes.)  So, extend it to include 4096, 8192, 16384, 32768 and
  65536.  I may go add 2048 at some point if I find it's useful.

  This is important for a few reasons.  First, when forming A-MPDU or AMSDU
  aggregates the frame sizes are larger, and thus the TX time calculation
  is woefully, increasingly wrong.  Secondly, the behaviour of 802.11 channels
  isn't some fixed thing, both due to channel conditions and radios themselves.
  Notably, there was some observations done a few years ago on 11n chipsets
  which noticed longer aggregates showed an increase in failed A-MPDU sub-frame
  reception as you got further along in the transmit time.  It could be due to
  a variety of things - transmitter linearity, channel conditions changing,
  frequency/phase drift, etc - but the observation was to potentially form
  shorter aggregates to improve BER.

* .. and then modify the ath TX path to report the length of the aggregate sent,
  so as the statistics kept would line up with the correct bucket.

* Then on the rate control look-up side - i was also only using the first frame
  length for an A-MPDU rate control lookup which isn't good enough here.
  So, add a new method that walks the TID software queue for that node to
  find out what the likely length of data available is.  It isn't ALL of the
  data in the queue because we'll only ever send enough data to fit inside the
  block-ack window, so limit how many bytes we return to roughly what ath_tx_form_aggr()
  would do.

* .. and cache that in the first ath_buf in the aggregate so it and the eventual
  AMPDU length can be returned to the rate control code.

* THEN, modify the rate control code to look at them both when deciding which bucket
  to attribute the sent frame on.  I'm erring on the side of caution and using the
  size bucket that the lookup is based on.

Ok, so now the rate lookups and statistics are "more correct".  However, MCS rates
are not the same as 11abg rates in that they're not a monotonically incrementing
set of faster rates and you can't assume that just because a given MCS rate fails,
the next higher one wouldn't work better or be a lower average tx time.

So, I had to do a bunch of surgery to the best rate and sample rate math.
This is the bit that's a WIP.

* First, simplify the statistics updates (update_stats()) to do a single pass on
  all rates.
* Next, make sure that each rate average tx time is updated based on /its/ failure/success.
  Eg if you sent a frame with { MCS15, MCS12, MCS8 } and MCS8 succeeded, MCS15 and MCS
  12 would have their average tx time updated for /their/ part of the transmission,
  not the whole transmission.
* Next, EWMA wasn't being fully calculated based on the /failures/ in each of the
  rate attempts.  So, if MCS15, MCS12 failed above but MCS8 didn't, then ensure
  that the statistics noted that /all/ subframes failed at those rates, rather than
  the eventual set of transmitted/sent frames.   This ensures the EWMA /and/ average
  TX time are updated correctly.
* When picking a sample rate and initial rate, probe rates aroud the current MCS
  but limit it to MCS0..7 /for all spatial streams/, rather than doing crazy things
  like hitting MCS7 and then probing MCS8 - MCS8 is basically MCS0 but two spatial
  streams.  It's a /lot/ slower than MCS7.  Also, the reverse is true - if we're at
  MCS8 then don't probe MCS7 as part of it, it's not likely to succeed.
* Fix bugs in pick_best_rate() where I was /immediately/ choosing the highest MCS
  rate if there weren't any frames yet transmitted.  I was defaulting to 25% EWMA and
  .. then each comparison would accept the higher rate.  Just skip those; sampling
  will fill in the details.

So, this seems to work a lot better.  It's not perfect; I'm still seeing a lot of
instability around higher MCS rates because there are bursts of loss/retransmissions
that aren't /too/ bad.  But i'll keep iterating over this and tidying up my hacks.

Ok, so why this still something I'm poking at? rather than porting minstrel_ht?

ath_rate_sample tries to minimise airtime, not maximise throughput.  I have
extended it with an EWMA based on sub-frame success/failures - high MCS rates
that have partially successful receptions still show super short average frame
times, but a /lot/ of retransmits have to happen for that to work.
So for MCS rates I also track this EWMA and ensure that the rates I'm choosing
don't have super crappy packet failures.  I don't mind not getting lower
peak throughput versus minstrel_ht; instead I want to see if I can make "minimise
airtime" work well.

Tested:

* AR9380, STA mode
* AR9344, STA mode
* AR9580, STA/AP mode
2020-05-15 18:51:20 +00:00
Adrian Chadd
84f950a54d [ath] [ath_rate] Add some extra data into the rate control lookup.
Right now (well, since I did this in 2011/2012) the rate control code
makes some super bad choices for 11n aggregates/rates, and it tracks
statistics even more questionably.

It's been long enough and I'm now trying to use it again daily, so let's
start by:

* telling the rate control code if it's an aggregate or not;
* being clearer about the TID - yes it can be extracted from the
  ath_buf but this way it can be overridden by the caller without
  changing the TID itself.

  (This is for doing experiments with voice/video QoS at some point..)

* Return an optional field to limit how long the aggregate is in
  microseconds.  Right now the rate control code supplies a rate table
  and the ath aggr form code will look at the rate table and limit
  the aggregate size to 4ms at the slowest rate.  Yeah, this is pretty
  terrible.

* Add some more TODO comments around handling txpower, rate and
  handling filtered frames status so if I continue to have spoons for
  this I can go poke at it.
2020-05-13 00:05:11 +00:00
Adrian Chadd
9fbe631a1a [net80211] convert all of the WME use over to a temporary copy of WME info.
This removes the direct WME info access in the ieee80211com struct and instead
provides a method of fetching the data.  Right now it's a no-op but eventually
it'll turn into a per-VAP method for drivers that support it (eg iwn, iwm,
upcoming ath10k work) as things like p2p support require this kind of behaviour.

Tested:

* ath(4), STA and AP mode

TODO:

* yes, this is slightly stack size-y, but it is an important first step
  to get drivers migrated over to a sensible WME API.  A lot of per-phy things
  need to be converted to per-VAP before P2P, 11ac firmware, etc stuff shows up.
2018-01-02 00:07:28 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Adrian Chadd
15e58d4d26 [ath] prepare for "correct" group (bcast/mcast) address frame handling and software/hardware queue TID mapping.
When I initially did this 11n TX work in days of yonder, my 802.11 standards
clue was ... not as finely tuned.  One of the things in 802.11-2012 (which
I guess technically was after I did this work, but I'm sure it was like this
in the previous rev?) is that among other traffic classes, three things are
important:

* group addressed frames should be default non-QoS, even if they're QoS frames, and
* group addressed frames should have a seqno out of a different space than the
  per-TID QoS one; and because of this
* group addressed frames, being non-QoS, should never be in the Block-ACK window
  for TX.

Now, net80211 and now this code cheats by using the non-QOS TID, but ideally
we'd introduce a separate seqno space just for multicast/group traffic for
TX and RX comparison.

Later extensions (eg reliable multicast / multimedia) express what one should do
when doing multicast traffic in a TID.  Now, technically we /could/ do group traffic
as QoS traffic and throw it into a per-TID seqno space, but this definitely
introduces ordering issues when you take into account things like CABQ behaviour.
(Ie, if some traffic in the TID goes into the CABQ and some doesn't, because
it's doing a split of multicast and non-multicast traffic, then you have
seqno ordering issues.)

So, until someone implements 802.11vv reliable multicast / multimedia extensions,
group traffic is non-QoS.

Next, software/hardware queue TID mapping.  In the past I believed the WME tagging
of frames because well, net80211 had a habit of tagging things like management
traffic with it.  But, then we also map QoS traffic categories to TIDs as well.
So, we should obey the TID!  But! then it put some management traffic into higher
WME categories too, as those frames don't have QoS TIDs.  But! It'd do things like
put things like QoS action frames into higher WME categories, when they should
be kept in-order with the rest of the traffic for that TID.  So! Given all of this,
the ath(4) driver does overrides to not trust the WME category.

I .. am undoing some of this.  Now, the TID has a 1:1 mapping to the hardware
queue.  The TID is the primary source of truth now for all QoS traffic.
The WME is only used for non-QoS traffic.  This now means that any TID traffic
queued should be consistently queued regardless of WME, so things like the
"TX finished, do more TX" that is occuring right now for transmit handling
should be "better".

The consistent {TID, WME} -> hardware queue mapping is important for
transmit completion.  It's used to schedule more traffic for that
particular TID, because that {many TID}:{1 TXQ} mapping in ath_tx_tid_sched()
is used for driving completion.  Ie, when the hardware queue completes,
it'll walk that list of scheduled TIDs attached to that TXQ.

The eventual aim is to get ready for some other features around putting
some data into other hardware queues (eg for better PS-POLL support,
uAPSD, support, correct-er TDMA support, etc) which requires that
I tidy all of this up in preparation for then introducing further
TID scheduling that isn't linked to a hardware TXQ (likely a per-WME, per-TID
driver queue, and a per-node driver queue) to enable that.

Tested:

* AR9380, STA mode
* AR9380, AR9580, AP mode
2017-03-19 05:00:14 +00:00
Adrian Chadd
39d5467677 [ath] log seqno, type and subtype when assigning sequence numbers for A-MPDU.
This is just to improve adrian-debugging.
2017-01-31 20:57:40 +00:00
Adrian Chadd
57af292d36 [ath] fix thresholds for deciding to queue to the software queue and populate hardware frames
This is two fixes, which establishes what I /think/ is pretty close to the
theoretical PHY maximum speed on the AR9380 devices.

* When doing A-MPDU on a TID, don't queue to the hardware directly if
  the hardware queue is busy.  This gives us time to get more packets
  queued up (and the hardware is busy, so there's no point in queuing
  more to the hardware right now) to potentially form an A-MPDU.

  This fixes up the throughput issue I was seeing where a couple hundred
  single frames were being sent a second interspersed between A-MPDU
  frames.  It just happened that the software queue had exactly one
  frame in it at that point.  Queuing it until the hardware finishes
  transmitting isn't exactly costly.

* When determining whether to dequeue from a software node/TID queue into
  the hardware queue, fix up the checks to work right for EDMA chips
  (ar9380 and later.)   Before it was not dispatching anything until
  the FIFO was empty.  Now we allow it to dispatch another aggregate
  up to the hardware aggregate limit, like I intended with the earlier
  work.

This allows a 5GHz HT40, short-GI, "htprotmode off" test at MCS23
to achieve 357 Mbit/sec in a one-way UDP test.  The stars have to be
aligned /just right/ so there are no retries but it can happen.
Just don't expect it to work in an OTA test if your 2yo is running
around the room - MCS23 is very very sensitive to channel conditions.

Tested:

* AR9380 STA (test) -> AR9580 hostap

TODO:

* More thorough testing on pre-AR9380 chips (AR5416, AR9160, AR9280)
* (Finally) teach ath_rate_sample about throughput/latency rather than
  air time, so I can get good transmit rates with a 2yo running around.
2017-01-23 04:30:08 +00:00
Andriy Voskoboinyk
887a63246c net80211: remove IEEE80211_RADIOTAP_TSFT field from transmit definitions.
This field may be used for received frames only.

Differential Revision:	https://reviews.freebsd.org/D3826
Differential Revision:	https://reviews.freebsd.org/D3827
2016-09-20 18:53:42 +00:00
Adrian Chadd
5abc0b2590 [ath] set the relevant TOA/TOD locationing bits when trying to do locationing.
* Don't do RTS/CTS - experiments show that we get ACK frames for each of them
  and this ends up causing the timestamps to look all funny.
* Set the HAL_TXDESC_POS bit, so the AR9300 HAL sets up the hardware to return
  location and CSI information.
2016-09-12 04:55:13 +00:00
Adrian Chadd
7ff1939db0 [ath] [ath_hal] break out the duration calculation to optionally include SIFS.
The pre-11n calculations include SIFS, but the 11n ones don't.

The reason is that (mostly) the 11n hardware is doing the SIFS calculation
for us but the pre-11n hardware isn't.  This means that we're over-shooting
the times in the duration field for non-11n frames on 11n hardware, which
is OK, if not a little inefficient.

Now, this is all fine for what the hardware needs for doing duration math
for ACK, RTS/CTS, frame length, etc, but it isn't useful for doing PHY
duration calculations.  Ie, given a frame to TX and its timestamp, what
would the end of the actual transmission time be; and similar for an
RX timestamp and figuring out its original length.

So, this adds a new field to the duration routines which requests
SIFS or no SIFS to be included.  All the callers currently will call
it requesting SIFS, so this /should/ be a glorious no-op.  I'm however
planning some future work around airtime fairness and positioning which
requires these routines to have SIFS be optional.

Notably though, the 11n version doesn't do any SIFS addition at the moment.
I'll go and tweak and verify all of the packet durations before I go and
flip that part on.

Tested:

* AR9330, STA mode
* AR9330, AP mode
* AR9380, STA mode
2016-07-15 06:39:35 +00:00
Adrian Chadd
bcf5fc498a [ath] commit initial bluetooth coexistence support for the MCI NICs.
This is the initial framework to call into the MCI HAL routines and drive
the basic state engine.

The MCI bluetooth coex model uses a command channel between wlan and
bluetooth, rather than a 2-wire or 3-wire signaling protocol to control things.
This means the wlan and bluetooth chip exchange a lot more information and
signaling, even at the per-packet level.  The NICs in question can share
the input LNA and output PA on the die, so they absolutely can't stomp
on each other in a silly fashion.  It also allows for the bluetooth side
to signal when profiles come and go, so the driver can take appropriate
control.  There's also the possibility of dynamic bluetooth/wlan duty cycle
control which I haven't yet really played with.

It configures things up with a static "wlan wins everything" coexistence,
configures up the available 2GHz channel map for bluetooth, sets a static
duty cycle for bluetooth/wifi traffic priority and drives the basics needed to
keep the MCI HAL code happy.

It doesn't do any actual coexistence except to default to "wlan wins everything",
which at least demonstrates that things do indeed work.  Bluetooth inquiry frames
still trump wifi (including beacons), so that demonstrates things really do
indeed seem to work.

Tested:

* AR9462 (WB222), STA mode + bt
* QCA9565 (WB335), STA mode + bt

TODO:

* .. the rest of coexistence.  yes, bluetooth, not people.  That stuff's hard.
* It doesn't do the initial BT side calibration, which requires a WLAN chip
  reset.  I'll fix up the reset path a bit more first before I enable that.
* The 1-ant and 2-ant configuration bits aren't being set correctly in
  if_ath_btcoex.c - I'll dig into that and fix it in a subsequent commit.
* It's not enabled by default for WB222/WB225 even though I believe it now
  can be - I'll chase that up in a subsequent commit.

Obtained from:	Qualcomm Atheros, Linux ath9k
2016-06-02 00:51:36 +00:00
Pedro F. Giffuni
f6b6084b8e dev/ath: minor spelling fixes in comments.
No functional change.

Reviewed by:	adrian
2016-05-02 19:56:48 +00:00
Adrian Chadd
82525db1eb [ath] turn the BA hardware bug back into a printf().
I saw this happen a couple of times and all I saw was a dump of the
transmit descriptors.  Log the message for now so I can see whta happened.
2016-04-29 01:52:06 +00:00
Adrian Chadd
d957a93abe net80211: move ieee80211_free_node() call on error from ic_raw_xmit() to ieee80211_raw_output().
This doesn't free the mbuf upon error; the driver ic_raw_xmit method is still
doing that.

Submitted by:	<s3erios@gmail.com>
Differential Revision:	https://reviews.freebsd.org/D3774
2015-10-12 04:55:20 +00:00
Adrian Chadd
d07be335a0 net80211: separate mbuf cleanup from ieee80211_fragment()
* Create ieee80211_free_mbuf() which frees a list of mbufs.
* Use it in the fragment transmit path and ath / uath transmit paths.
* Call it in xmit_pkt() if the transmission fails; otherwise fragments
  may be leaked.

This should be a big no-op.

Submitted by:	<s3erios@gmail.com>
Differential Revision:	https://reviews.freebsd.org/D3769
2015-10-12 03:27:08 +00:00
Gleb Smirnoff
7a79cebfba Replay r286410. Change KPI of how device drivers that provide wireless
connectivity interact with the net80211 stack.

Historical background: originally wireless devices created an interface,
just like Ethernet devices do. Name of an interface matched the name of
the driver that created. Later, wlan(4) layer was introduced, and the
wlanX interfaces become the actual interface, leaving original ones as
"a parent interface" of wlanX. Kernelwise, the KPI between net80211 layer
and a driver became a mix of methods that pass a pointer to struct ifnet
as identifier and methods that pass pointer to struct ieee80211com. From
user point of view, the parent interface just hangs on in the ifconfig
list, and user can't do anything useful with it.

Now, the struct ifnet goes away. The struct ieee80211com is the only
KPI between a device driver and net80211. Details:

- The struct ieee80211com is embedded into drivers softc.
- Packets are sent via new ic_transmit method, which is very much like
  the previous if_transmit.
- Bringing parent up/down is done via new ic_parent method, which notifies
  driver about any changes: number of wlan(4) interfaces, number of them
  in promisc or allmulti state.
- Device specific ioctls (if any) are received on new ic_ioctl method.
- Packets/errors accounting are done by the stack. In certain cases, when
  driver experiences errors and can not attribute them to any specific
  interface, driver updates ic_oerrors or ic_ierrors counters.

Details on interface configuration with new world order:
- A sequence of commands needed to bring up wireless DOESN"T change.
- /etc/rc.conf parameters DON'T change.
- List of devices that can be used to create wlan(4) interfaces is
  now provided by net.wlan.devices sysctl.

Most drivers in this change were converted by me, except of wpi(4),
that was done by Andriy Voskoboinyk. Big thanks to Kevin Lo for testing
changes to at least 8 drivers. Thanks to pluknet@, Oliver Hartmann,
Olivier Cochard, gjb@, mmoll@, op@ and lev@, who also participated in
testing.

Reviewed by:	adrian
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-08-27 08:56:39 +00:00
Adrian Chadd
3797bf0896 Remove most of the references of ifp->if_softc and replace with
references to ic->ic_softc.

This is in preparation for gleb's ifnet work.

Tested:

* ath(4), STA mode
* ath(4), hostap mode
* make universe
2015-08-17 02:04:11 +00:00
Adrian Chadd
ba2c1fbc03 Revert the wifi ifnet changes until things are more baked and tested.
* 286410
* 286413
* 286416

The initial commit broke a variety of debug and features that aren't
in the GENERIC kernels but are enabled in other platforms.
2015-08-08 01:10:17 +00:00
Gleb Smirnoff
79d2c5e857 Change KPI of how device drivers that provide wireless connectivity interact
with the net80211 stack.

Historical background: originally wireless devices created an interface,
just like Ethernet devices do. Name of an interface matched the name of
the driver that created. Later, wlan(4) layer was introduced, and the
wlanX interfaces become the actual interface, leaving original ones as
"a parent interface" of wlanX. Kernelwise, the KPI between net80211 layer
and a driver became a mix of methods that pass a pointer to struct ifnet
as identifier and methods that pass pointer to struct ieee80211com. From
user point of view, the parent interface just hangs on in the ifconfig
list, and user can't do anything useful with it.

Now, the struct ifnet goes away. The struct ieee80211com is the only
KPI between a device driver and net80211. Details:

- The struct ieee80211com is embedded into drivers softc.
- Packets are sent via new ic_transmit method, which is very much like
  the previous if_transmit.
- Bringing parent up/down is done via new ic_parent method, which notifies
  driver about any changes: number of wlan(4) interfaces, number of them
  in promisc or allmulti state.
- Device specific ioctls (if any) are received on new ic_ioctl method.
- Packets/errors accounting are done by the stack. In certain cases, when
  driver experiences errors and can not attribute them to any specific
  interface, driver updates ic_oerrors or ic_ierrors counters.

Details on interface configuration with new world order:
- A sequence of commands needed to bring up wireless DOESN"T change.
- /etc/rc.conf parameters DON'T change.
- List of devices that can be used to create wlan(4) interfaces is
  now provided by net.wlan.devices sysctl.

Most drivers in this change were converted by me, except of wpi(4),
that was done by Andriy Voskoboinyk. Big thanks to Kevin Lo for testing
changes to at least 8 drivers. Thanks to Olivier Cochard, gjb@, mmoll@,
op@ and lev@, who also participated in testing. Details here:

https://wiki.freebsd.org/projects/ifnet/net80211

Still, drivers: ndis, wtap, mwl, ipw, bwn, wi, upgt, uath were not
tested. Changes to mwl, ipw, bwn, wi, upgt are trivial and chances
of problems are low. The wtap wasn't compilable even before this change.
But the ndis driver is complex, and it is likely to be broken with this
commit. Help with testing and debugging it is appreciated.

Differential Revision:	D2655, D2740
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-08-07 11:43:14 +00:00
Gleb Smirnoff
76e6fd5d6c Use device_printf() instead of if_printf(). No functional changes. 2015-05-29 14:35:16 +00:00
Gleb Smirnoff
2127b2e232 Mechanically convert to if_inc_counter(). 2014-09-18 20:47:39 +00:00
Adrian Chadd
f5c30c4e8d Bring over some initial power save management support, reset path
fixes and beacon programming / debugging into the ath(4) driver.

The basic power save tracking:

* Add some new code to track the current desired powersave state; and
* Add some reference count tracking so we know when the NIC is awake; then
* Add code in all the points where we're about to touch the hardware and
  push it to force-wake.

Then, how things are moved into power save:

* Only move into network-sleep during a RUN->SLEEP transition;
* Force wake the hardware up everywhere that we're about to touch
  the hardware.

The net80211 stack takes care of doing RUN<->SLEEP<->(other) state
transitions so we don't have to do it in the driver.

Next, when to wake things up:

* In short - everywhere we touch the hardware.
* The hardware will take care of staying awake if things are queued
  in the transmit queue(s); it'll then transit down to sleep if
  there's nothing left.  This way we don't have to track the
  software / hardware transmit queue(s) and keep the hardware
  awake for those.

Then, some transmit path fixes that aren't related but useful:

* Force EAPOL frames to go out at the lowest rate.  This improves
  reliability during the encryption handshake after 802.11
  negotiation.

Next, some reset path fixes!

* Fix the overlap between reset and transmit pause so we don't
  transmit frames during a reset.
* Some noisy environments will end up taking a lot longer to reset
  than normal, so extend the reset period and drop the raise the
  reset interval to be more realistic and give the hardware some
  time to finish calibration.
* Skip calibration during the reset path.  Tsk!

Then, beacon fixes in station mode!

* Add a _lot_ more debugging in the station beacon reset path.
  This is all quite fluid right now.
* Modify the STA beacon programming code to try and take
  the TU gap between desired TSF and the target TU into
  account.  (Lifted from QCA.)

Tested:

* AR5210
* AR5211
* AR5212
* AR5413
* AR5416
* AR9280
* AR9285

TODO:

* More AP, IBSS, mesh, TDMA testing
* Thorough AR9380 and later testing!
* AR9160 and AR9287 testing

Obtained from:	QCA
2014-04-30 02:19:41 +00:00
Adrian Chadd
f172ef758e Rewrite the cleanup code to, well, actually work right.
The existing cleanup code was based on the Atheros reference driver
from way back and stuff that was in Linux ath9k.  It turned out to be ..
rather silly.

Specifically:

* The whole method of determining whether there's hardware-queued frames
  was fragile and the BAW would never quite work right afterwards.

* The cleanup path wouldn't correctly pull apart aggregate frames in the
  queue, so frames would not be freed and the BAW wouldn't be correctly
  updated.

So to implement this:

* Pull the aggregate frames apart correctly and handle each separately;
* Make the atid->incomp counter just track the number of hardware queued
  frames rather than try to figure it out from the BAW;
* Modify the aggregate completion path to handle it as a single frame
  (atid->incomp tracks the one frame now, not the subframes) and
  remove the frames from the BAW before completing them as normal frames;
* Make sure bf->bf_next is NULled out correctly;
* Make both aggregate session and non-aggregate path frames now be
  handled via the incompletion path.

TODO:

* kill atid->incomp; the driver tracks the hardware queued frames
  for each TID and so we can just use that.

This is a stability fix that should be merged back to stable/10.

Tested:

* AR5416, STA

MFC after:	7 days
2014-04-21 06:07:08 +00:00
Adrian Chadd
1771c64935 * Modify the debugging output from pause/resume to note the TID and STA
MAC
* Now that the paused < 0 bugs have been identified, make the DPRINTF()
  a device_printf() again.  Anything else that shows up here needs to be
  fixed immediately.

Tested:

* AR5416, STA mode

MFC after:	7 days
2014-04-21 02:09:14 +00:00
Adrian Chadd
706bb44485 Make sure bf_next is NULL'ed out when we're completing up an aggregate
frame through the cleanup path.

Whilst here, fix the indenting for something I messed up.

Tested:

* AR5416, STA mode
2014-04-21 02:05:51 +00:00
Adrian Chadd
59fbb5304d Fix a cleanup hang if cleanup gets called _during_ an active cleanup.
During power save testing I noticed that the cleanup code is being
called during a RUN->RUN state transition.  It's because the net80211
stack is treating that (for reasons I don't quitey know yet) as a
reassociation and this calls the node cleanup code.  The reason it's
seeing a RUN->RUN transition is because during active power save
stuff it's possible that the RUN->SLEEP and SLEEP->RUN transitions
happen so quickly that the deferred net80211 vap state code
"loses" a transition, namely the intermediary SLEEP transition.

So, this was causing the node reassociation code to sometimes be called
twice in quick succession and this would result in ath_tx_tid_cleanup()
to be called again.  The code calling it would always call pause, and
then only call resume if the TID didn't have "cleanup_inprogress" set.
Unfortunately it didn't check if it was already set on entry, so it
would pause but not call resume.  Thus, paused would be called more
than once (once before each entry into ath-tx_tid_cleanup()) but resume
would only be called once when the cleanup state was finished.

This doesn't entirely fix all of the issues seen in the cleanup path
but it's a necessary first step.

Since this is a stability fix, it should be merged to stable/10 at some
point.

Tested:

* AR5416, STA mode

MFC after:	7 days
2014-04-21 01:02:49 +00:00
Adrian Chadd
42fdd8e726 Add some debugging and forcing of the BAW to match what the current
tracked BAW actually is.

The net80211 code that completes a BAR will set tid->txa_start (the
BAW start) to whatever value was called when sending the BAR.
Now, in case there's bugs in my driver code that cause the BAW
to slip along, we should make sure that the new BAW we start
at is actually what we currently have it at, not what we've sent.

This totally breaks the specification and so this stays a printf().
If it happens then I need to know and fix it.

Whilst here, add some debugging updates:

* add TID logging to places where it's useful;
* use SEQNO().
2014-04-08 07:14:14 +00:00
Adrian Chadd
8ec9220e81 Don't do continue inside the scheduler loop; we really need to check
if we've hit the end of the list and cycled around to the first
node again.

Obtained from:	DragonflyBSD
2014-04-08 07:10:52 +00:00
Adrian Chadd
1f7373066f Correct the actual definition of ath_tx_tid_filt_comp_single() to
match how it's used.

This is another bug that led to aggregate traffic hanging because
the BAW tracking stopped being accurate.  In this instance, a filtered
frame that exceeded retries would return a non-error, which would
mean the caller would never remove it from the BAW.  But it wouldn't
be added to the filtered list, so it would be lost forever.  There'd
thus be a hole in the BAW that would never get transmitted and
this leads to a traffic hang.

Tested:

* Routerstation Pro, AR9220 AP
2014-04-08 07:08:59 +00:00
Adrian Chadd
c5d230ab42 Add a comment explaining the obvious. 2014-04-08 07:01:27 +00:00
Adrian Chadd
a3fd3b1429 Don't resume a TID on each filtered frame completion - only do it if
we did suspend it.

The whole suspend/resume TID queue thing is supposed to be a matched
reference count - a subsystem (eg addba negotiation, BAR transmission,
filtered frames, etc) is supposed to call pause() once and then resume()
once.

ath_tx_tid_filt_comp_complete() is called upon the completion of any
filtered frame, regardless of whether the driver had aleady seen
a filtered frame and called pause().

So only call resume() if tid->isfiltered = 1, which indicates that
we had called pause() once.

This fixes a seemingly whacked and different problem - traffic hangs.

What was actually going on:

* There'd be some marginal link with crappy behaviour, causing filtered
  frames and BAR TXing to occur;
* A BAR TX would occur, setting the new BAW (block-ack window) to seqno n;
* .. and pause() would be called, blocking further transmission;
* A filtered frame completion would occur from the hardware, but with
  tid->isfiltered = 0 which indiciates we haven't actually marked
  the queue yet as filtered;
* ath_tx_tid_filt_comp_complete() would call resume(), continuing
  transmission;
* Some frames would be queued to the hardware, since the TID is now no
  longer paused;
* .. and if some make it out and ACked successfully, the new BAW
  may be seqno n+1 or more;
* .. then the BAR TX completes and sets the new seqno back to n.

At this point the BAW tracking would be loopy because the BAW
start was modified but the BAW ring buffer wasn't updated in lock
step.

Tested:

* Routerstation Pro + AR9220 AP
2014-04-08 07:00:43 +00:00
Adrian Chadd
6fc621c22c Throw the flush messages behind ATH_DEBUG_RESET as well.
These are needed to diagnose TX hangs that I and hiren are seeing.
Without it, the only way we'll see debugging is by having ATH_DEBUG_SW_TX
enabled and that is going to be very, very spammy.

ATH_DEBUG_RESET is fine; it's only going to be done during stuck beacon
situations in AP mode.

Whilst I'm here, and now that it's behind debugging, let's just disable
the "print only one" conditional.  I'll eventually make it more tunable.

Tested:

* AR9220, hostap mode.
2014-03-20 23:16:58 +00:00
Rui Paulo
a2be2710b4 Call ieee80211_dump_pkt() based on IFF_DUMPPKTS().
MFC after:	3 days
2014-03-08 19:35:31 +00:00
Kevin Lo
5945b5f5ab Rename definition of IEEE80211_FC1_WEP to IEEE80211_FC1_PROTECTED.
The origin of WEP comes from IEEE Std 802.11-1997 where it defines
whether the frame body of MAC frame has been encrypted using WEP
algorithm or not.
IEEE Std. 802.11-2007 changes WEP to Protected Frame, indicates
whether the frame is protected by a cryptographic encapsulation
algorithm.

Reviewed by:	adrian, rpaulo
2014-01-08 08:06:56 +00:00
Olivier Houchard
f431664c05 Include <sys/ktr.h>, since we need it if ATH_DEBUG is defined. 2013-10-28 20:26:34 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Rui Paulo
b372f122ab Add a missing comma. 2013-10-17 05:51:54 +00:00
Rui Paulo
83bbd5ebf9 Move a lot of debugging printf's to DPRINTF.
Approved by:	adrian
MFC after:	2 weeks
2013-10-17 01:53:07 +00:00
Adrian Chadd
272a8ab68a Log the MAC address of the node in question rather than the pointer. 2013-08-17 01:14:28 +00:00
Adrian Chadd
5da3fc1048 Shuffle around the cleanup unpause calls a bit. 2013-05-29 01:40:13 +00:00
Adrian Chadd
cd7dffd058 Migrate ath(4) to now use if_transmit instead of the legacy if_start
and if queue mechanism; also fix up (non-11n) TX fragment handling.

This may result in a bit of a performance drop for now but I plan on
debugging and resolving this at a later stage.

Whilst here, fix the transmit path so fragment transmission works.

The TX fragmentation handling is a bit more special.  In order to
correctly transmit TX fragments, there's a bunch of corner cases that
need to be handled:

* They must be transmitted back to back, in the same order..
* .. ie, you need to hold the TX lock whilst transmitting this
  set of fragments rather than interleaving it with other MSDUs
  destined to other nodes;
* The length of the next fragment is required when transmitting, in
  order to correctly set the NAV field in the current frame to the
  length of the next frame; which requires ..
* .. that we know the transmit duration of the next frame, which ..
* .. requires us to set the rate of all fragments to the same length,
  or make the decision up-front, etc.

To facilitate this, I've added a new ath_buf field to describe the
length of the next fragment.  This avoids having to keep the mbuf
chain together.  This used to work before my 11n TX path work because
the ath_tx_start() routine would be handed a single mbuf with m_nextpkt
pointing to the next frame, and that would be maintained all the way
up to when the duration calculation was done.  This doesn't hold
true any longer - the actual queuing may occur at any point in the
future (think ath_node TID software queuing) so this information
needs to be maintained.

Right now this does work for non-11n frames but it doesn't at all
enforce the same rate control decision for all frames in the fragment.
I plan on fixing this in a followup commit.

RTS/CTS has the same issue, I'll look at fixing this in a subsequent
commit.

Finaly, 11n fragment support requires the driver to have fully
decided what the rate scenario setup is - including 20/40MHz,
short/long GI, STBC, LDPC, number of streams, etc.  Right now that
decision is (currently) made _after_ the NAV field value is updated.
I'll fix all of this in subsequent commits.

Tested:

* AR5416, STA, transmitting 11abg fragments
* AR5416, STA, 11n fragments work but the NAV field is incorrect for
  the reasons above.

TODO:

* It would be nice to be able to queue mbufs per-node and per-TID so
  we can only queue ath_buf entries when it's time to assemble frames
  to send to the hardware.

  But honestly, we should just do that level of software queue management
  in net80211 rather than ath(4), so I'm going to leave this alone for now.

* More thorough AP, mesh and adhoc testing.

* Ensure that net80211 doesn't hand us fragmented frames when A-MPDU has
  been negotiated, as we can't do software retransmission of fragments.

* .. set CLRDMASK when transmitting fragments, just to ensure.
2013-05-26 22:23:39 +00:00
Adrian Chadd
72910f03e5 Implement a separate hardware queue threshold for aggregate and non-aggr
traffic.

When transmitting non-aggregate traffic, we need to keep the hardware
busy whilst transmitting or small bursts in txdone/tx latency will
kill us.

This restores non-aggregate iperf performance, especially when doing
TDMA.

Tested:

* AR5416<->AR5416, TDMA
* AR5416 STA <-> AR9280 AP
2013-05-21 18:13:57 +00:00
Adrian Chadd
6112d22c3f More non-ATH_DEBUG build fixes. 2013-05-19 01:33:17 +00:00
Adrian Chadd
9be82a4209 Be (very) careful about how to add more TX DMA work.
The list-based DMA engine has the following behaviour:

* When the DMA engine is in the init state, you can write the first
  descriptor address to the QCU TxDP register and it will work.

* Then when it hits the end of the list (ie, it either hits a NULL
  link pointer, OR it hits a descriptor with VEOL set) the QCU
  stops, and the TxDP points to the last descriptor that was transmitted.

* Then when you want to transmit a new frame, you can then either:
  + write the head of the new list into TxDP, or
  + you write the head of the new list into the link pointer of the
    last completed descriptor (ie, where TxDP points), then kick
    TxE to restart transmission on that QCU>

* The hardware then will re-read the descriptor to pick up the link
  pointer and then jump to that.

Now, the quirks:

* If you write a TxDP when there's been no previous TxDP (ie, it's 0),
  it works.

* If you write a TxDP in any other instance, the TxDP write may actually
  fail.  Thus, when you start transmission, it will re-read the last
  transmitted descriptor to get the link pointer, NOT just start a new
  transmission.

So the correct thing to do here is:

* ALWAYS use the holding descriptor (ie, the last transmitted descriptor
  that we've kept safe) and use the link pointer in _THAT_ to transmit
  the next frame.

* NEVER write to the TxDP after you've done the initial write.

* .. also, don't do this whilst you're also resetting the NIC.

With this in mind, the following patch does basically the above.

* Since this encapsulates Sam's issues with the QCU behaviour w/ TDMA,
  kill the TDMA special case and replace it with the above.

* Add a new TXQ flag - PUTRUNNING - which indicates that we've started
  DMA.

* Clear that flag when DMA has been shutdown.

* Ensure that we're not restarting DMA with PUTRUNNING enabled.

* Fix the link pointer logic during TXQ drain - we should always ensure
  the link pointer does point to something if there's a list of frames.
  Having it be NULL as an indication that DMA has finished or during
  a reset causes trouble.

Now, given all of this, i want to nuke axq_link from orbit.  There's now HAL
methods to get and set the link pointer of a descriptor, so what we
should do instead is to update the right link pointer.

* If there's a holding descriptor and an empty TXQ list, set the
  link pointer of said holding descriptor to the new frame.

* If there's a non-empty TXQ list, set the link pointer of the
  last descriptor in the list to the new frame.

* Nuke axq_link from orbit.

Note:

* The AR9380 doesn't need this.  FIFO TX writes are atomic.  As long as
  we don't append to a list of frames that we've already passed to the
  hardware, all of the above doesn't apply.  The holding descriptor stuff
  is still needed to ensure the hardware can re-read a completed
  descriptor to move onto the next one, but we restart DMA by pushing in
  a new FIFO entry into the TX QCU.  That doesn't require any real
  gymnastics.

Tested:

* AR5210, AR5211, AR5212, AR5416, AR9380 - STA mode.
2013-05-18 18:27:53 +00:00
Adrian Chadd
97c9a8e806 Add some more debugging printf()s to complain if the ath_buf tx queue
doesn't match the actual hardware queue this frame is queued to.

I'm trying to ensure that the holding buffers are actually being queued
to the same TX queue as the holding buffer that they end up on.
I'm pretty sure this is all correct so if this complains, it'll be due
to some kind of subtle broken-ness that needs fixing.

This is only done for legacy hardware, not EDMA hardware.

Tested:

* AR5416 STA mode, very lightly
2013-05-17 05:16:30 +00:00
Adrian Chadd
6d07d3e014 Tidy up the debugging - don't bother printing out TID pointers; now
that we are printing out the MAC address in these fields, just printing
out the TID is enough.
2013-05-16 17:53:12 +00:00
Adrian Chadd
b45a991e92 Limit the number of software queued frames when doing non-aggregation.
This should prevent the TX queue being filled with non-aggregate frames,
causing starvation and non-fair queue behaviour.
2013-05-16 17:46:32 +00:00
Adrian Chadd
22a3aee637 Implement my first cut at "correct" node power-save and
PS-POLL support.

This implements PS-POLL awareness i nthe

* Implement frame "leaking", which allows for a software queue
  to be scheduled even though it's asleep
* Track whether a frame has been leaked or not
* Leak out a single non-AMPDU frame when transmitting aggregates
* Queue BAR frames if the node is asleep
* Direct-dispatch the rest of control and management frames.
  This allows for things like re-association to occur (which involves
  sending probe req/resp as well as assoc request/response) when
  the node is asleep and then tries reassociating.
* Limit how many frames can set in the software node queue whilst
  the node is asleep.  net80211 is already buffering frames for us
  so this is mostly just paranoia.
* Add a PS-POLL method which leaks out a frame if there's something
  in the software queue, else it calls net80211's ps-poll routine.
  Since the ath PS-POLL routine marks the node as having a single frame
  to leak, either a software queued frame would leak, OR the next queued
  frame would leak. The next queued frame could be something from the
  net80211 power save queue, OR it could be a NULL frame from net80211.

TODO:

* Don't transmit further BAR frames (eg via a timeout) if the node is
  currently asleep.  Otherwise we may end up exhausting management frames
  due to the lots of queued BAR frames.

  I may just undo this bit later on and direct-dispatch BAR frames
  even if the node is asleep.

* It would be nice to burst out a single A-MPDU frame if both ends
  support this.  I may end adding a FreeBSD IE soon to negotiate
  this power save behaviour.

* I should make STAs timeout of power save mode if they've been in power
  save for more than a handful of seconds.  This way cards that get
  "stuck" in power save mode don't stay there for the "inactivity" timeout
  in net80211.

* Move the queue depth check into the driver layer (ath_start / ath_transmit)
  rather than doing it in the TX path.

* There could be some naughty corner cases with ps-poll leaking.
  Specifically, if net80211 generates a NULL data frame whilst another
  transmitter sends a normal data frame out net80211 output / transmit,
  we need to ensure that the NULL data frame goes out first.
  This is one of those things that should occur inside the VAP/ic TX lock.
  Grr, more investigations to do..

Tested:

* STA: AR5416, AR9280
* AP: AR5416, AR9280, AR9160
2013-05-15 18:33:05 +00:00