The pre-11n calculations include SIFS, but the 11n ones don't.
The reason is that (mostly) the 11n hardware is doing the SIFS calculation
for us but the pre-11n hardware isn't. This means that we're over-shooting
the times in the duration field for non-11n frames on 11n hardware, which
is OK, if not a little inefficient.
Now, this is all fine for what the hardware needs for doing duration math
for ACK, RTS/CTS, frame length, etc, but it isn't useful for doing PHY
duration calculations. Ie, given a frame to TX and its timestamp, what
would the end of the actual transmission time be; and similar for an
RX timestamp and figuring out its original length.
So, this adds a new field to the duration routines which requests
SIFS or no SIFS to be included. All the callers currently will call
it requesting SIFS, so this /should/ be a glorious no-op. I'm however
planning some future work around airtime fairness and positioning which
requires these routines to have SIFS be optional.
Notably though, the 11n version doesn't do any SIFS addition at the moment.
I'll go and tweak and verify all of the packet durations before I go and
flip that part on.
Tested:
* AR9330, STA mode
* AR9330, AP mode
* AR9380, STA mode
* the code already stored the length of the RX desc, which I never used.
So, use that and retire the new flag I introduced a while ago.
* Introduce a TX timestamp length field and capability.
* extend the TX timestamp to 32 bits, as the AR5416 and later does a full
32 bit TX timestamp instead of 15 or 16 bits.
* add RX descriptor fields for PHY uploaded information (coming soon)
* add flags for RX/TX fast timestamp, hardware upload, etc
* add a flag for TX to request ToD/ToA location information.
I keep asking myself "what do these fields mean" and so now I've clarified
it for myself.
Tested:
* Reading the comments, going "a-ha!" a couple times.
Approved by: re (gjb)
It turns out that getting decent performance requires stacking the TX
FIFO a little more aggressively.
* Ensure that when we complete a frame, we attempt to push a new frame
into the FIFO so TX is kept as active as it needs to be
* Be more aggressive about batching non-aggregate frames into a single
TX FIFO slot. This "fixes" TDMA performance (since we only get one
TX FIFO slot ungated per DMA beacon alert) but it does this by pushing
a whole lot of work into the TX FIFO slot.
I'm not /entirely/ pleased by this solution, but it does fix a whole bunch
of corner case issues in the transmit side and fix TDMA whilst I'm at it.
I'll go revisit transmit packet scheduling in ath(4) post 11.
Tested:
* AR9380, STA mode
* AR9580, hostap mode
* AR9380, TDMA client mode
Approved by: re (hrs)
This started showing up when doing lots of aggregate traffic. For TDMA it's
always no-ACK traffic and I didn't notice this, and I didn't notice it
when doing 11abg traffic as it didn't fail enough in a bad way to trigger
this.
This showed up as the fifo depth being < 0.
Eg:
Jun 19 09:23:07 gertrude kernel: ath0: ath_tx_edma_push_staging_list: queued 2 packets; depth=2, fifo depth=1
Jun 19 09:23:07 gertrude kernel: ath0: ath_edma_tx_processq: Q1, bf=0xfffffe000385f068, start=1, end=1
Jun 19 09:23:07 gertrude kernel: ath0: ath_edma_tx_processq: Q1: FIFO depth is now 0 (1)
Jun 19 09:23:07 gertrude kernel: ath0: ath_edma_tx_processq: Q1, bf=0xfffffe0003866fe8, start=0, end=1
Jun 19 09:23:07 gertrude kernel: ath0: ath_edma_tx_processq: Q1: FIFO depth is now -1 (0)
So, clear the flags before adding them to a TX queue, so if they're
re-added for the retransmit path it'll clear whatever they were and
not double-account the FIFOEND flag. Oops.
Tested:
* AR9380, STA mode, 11n iperf testing (~130mbit)
Approved by: re (delphij)
It turns out the frame scheduling policies (eg DBA_GATED) operate on
a single TX FIFO entry. ASAP scheduling is fine; those frames always
go out.
DBA-gated sets the TX queue ready when the DBA timer fires, which triggers
a beacon transmit. Normally this is used for content-after-beacon queue
(CABQ) work, which needs to burst out immediately after a beacon.
(eg broadcast, multicast, etc frames.) This is a general policy that you
can use for any queue, and Sam's TDMA code uses it.
When DBA_GATED is used and something like say, an 11e TX burst window,
it only operates on a single TX FIFO entry. If you have a single frame
per TX FIFO entry and say, a 2.5ms long burst window (eg TDMA!) then it'll
only burst a single frame every 2.5ms. If there's no gating (eg ASAP) then
the burst window is fine, and multiple TX FIFO slots get used.
The CABQ code does pack in a list of frames (ie, the whole cabq) but
up until this commit, the normal TX queues didn't. It showed up when
I started to debug TDMA on the AR9380 and later.
This commit doesn't fix the TDMA case - that's still broken here, because
all I'm doing here is allowing 'some' frames to be bursting, but I'm
certainly not filling the whole TX FIFO slot entry with frames.
Doing that 'properly' kind of requires me to take into account how long
packets should take to transmit and say, doing 1.5 or something times that
per TX FIFO slot, as if you partially transmit a slot, when it's next
gated it'll just finish that TX FIFO slot, then not advance to the next
one.
Now, I /also/ think queuing a new packet restarts DMA, but you have to
push new frames into the TX FIFO. I need to experiment some more with
this because if it's really the case, I will be able to do TDMA support
without the egregious hacks I have in my local tree. Sam's TDMA code
for previous chips would just kick the TXE bit to push along DMA
again, but we can't do that for EDMA chips - we /have/ to push a new
frame into the TX FIFO to restart DMA. Ugh.
Tested:
* AR9380, STA mode
* AR9380, hostap mode
* AR9580, hostap mode
Approved by: re (gjb)
Some later code I'll commit pushes lists of frames into the EDMA TX
FIFO, rather than a single frame at a time. The CABQ code already
pushes frame lists, but it turns out we should actually be doing it
in general or performance tanks. :(
This is the initial framework to call into the MCI HAL routines and drive
the basic state engine.
The MCI bluetooth coex model uses a command channel between wlan and
bluetooth, rather than a 2-wire or 3-wire signaling protocol to control things.
This means the wlan and bluetooth chip exchange a lot more information and
signaling, even at the per-packet level. The NICs in question can share
the input LNA and output PA on the die, so they absolutely can't stomp
on each other in a silly fashion. It also allows for the bluetooth side
to signal when profiles come and go, so the driver can take appropriate
control. There's also the possibility of dynamic bluetooth/wlan duty cycle
control which I haven't yet really played with.
It configures things up with a static "wlan wins everything" coexistence,
configures up the available 2GHz channel map for bluetooth, sets a static
duty cycle for bluetooth/wifi traffic priority and drives the basics needed to
keep the MCI HAL code happy.
It doesn't do any actual coexistence except to default to "wlan wins everything",
which at least demonstrates that things do indeed work. Bluetooth inquiry frames
still trump wifi (including beacons), so that demonstrates things really do
indeed seem to work.
Tested:
* AR9462 (WB222), STA mode + bt
* QCA9565 (WB335), STA mode + bt
TODO:
* .. the rest of coexistence. yes, bluetooth, not people. That stuff's hard.
* It doesn't do the initial BT side calibration, which requires a WLAN chip
reset. I'll fix up the reset path a bit more first before I enable that.
* The 1-ant and 2-ant configuration bits aren't being set correctly in
if_ath_btcoex.c - I'll dig into that and fix it in a subsequent commit.
* It's not enabled by default for WB222/WB225 even though I believe it now
can be - I'll chase that up in a subsequent commit.
Obtained from: Qualcomm Atheros, Linux ath9k
The legacy bits are just from ah.h; the MCI bits are from the ar9300
HAL "freebsd" extras.
A subsequent commit will include ah_btcoex.h into ah.h and remove
the older defintions.
This is like the WB222 coexistence (ie, "MCI", a message bus inside the
chip), and it's currently a cut/paste so I can start using it to flesh
out the differences with WB222.
It doesn't completely /do/ bluetooth coexistence, because it turns out
I need to add some contigmalloc'ed buffers to the btcoex path for this
type of hardware. I'm putting this work in the "people would like
to see functioning-ish btcoex before FreeBSD-11" bucket because I see
this as "broken".
Tested:
* QCA9535 (WB335) NIC, BT + 2GHz STA
Split getchannels() method in ath_hal/ah_regdomain.c into a subset
of functions for better readability.
Note: due to different internal structure, it cannot use
ieee80211_add_channel*() (however, some parts are done in a
similar manner).
Differential Revision: https://reviews.freebsd.org/D6139
I .. can't believe I missed this.
This showed up because the AP was TX'ing LDPC to an iwm(4) chipset,
which didn't advertise LDPC and doesn't /accept/ LDPC. Amusingly, all
the two other FreeBSD 11n parts I had tested with (AR9380, Intel 7260)
and I completely forgot to test on ye olde hardware.
That'll teach me.
Tested:
* AR9580 (AP) - Intel 7260 (STA), AR9380 (STA), Intel 6205 (STA)
LDPC adds better transmit reliability if both ends support it.
You in theory can do both STBC and LDPC at the same time.
If I see issues I'll disable it.
* Only enable it if both ends of a connection negotiate it.
* Disable it if any rate is non-11n.
* Count both LDPC TX and STBC TX.
Tested:
* AR9380, STA mode
This enables LDPC receive support for the AR9300 chips that support it.
It'll announce LDPC support via net80211.
Tested:
* AR9380, STA mode
* AR9331, (to verify the HAL didn't attach it to a chip which
doesn't support LDPC.)
TODO:
* Add in net80211 machinery to make this configurable at runtime.
Add support for the FHT_STBC_TX flag in iv_flags_ht, so it'll now obey
the per-vap ifconfig stbctx flag.
This means that we can do STBC TX on one vap and not another VAP.
(As well as STBC RX on said vap; that changes the HTCAP announcement.)
le*dec / le*enc functions.
Replace net80211 specific macros with system-wide bytestream
encoding/decoding functions:
- LE_READ_2 -> le16dec
- LE_READ_4 -> le32dec
- LE_WRITE_2 -> le16enc
- LE_WRITE_4 -> le32enc
+ drop ieee80211_input.h include, where it was included for these
operations only.
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D6030
* Don't use arbitrary frames for the average RX RSSI - only frames
from the current BSSID
* Don't log / do the syncbeacon logic for another BSSID and definitely
don't do the syncbeacon call if we miss beacons outside of STA mode.
* Don't do the IBSS merge bits if the current node plainly won't ever
match our current BSS (ie, the IBSS doesn't have to match, but all
the same bits that we check in ieee80211_ibss_merge() have to match.)
Tested:
* ath(4), AR9380, IBSS mode, surrounded by a lot of IBSS 11ac networks.
Sponsored by: Eva Automation, Inc.
It turns out that these will clash very annoyingly with the linux
macros in the linuxkpi layer, so let the wookie^Wlinux win.
The only user that I can find is ath(4), so fix it there too.
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167. It has been a compat shim ever
since. It's time for the compat shim to go.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: sephe
Differential Revision: https://reviews.freebsd.org/D5131
These are going to be much more efficient on low end embedded systems
but unfortunately they make it .. less convenient to implement correct
bus barriers and debugging. They also didn't implement the register
serialisation workaround required for Owl (AR5416.)
So, just remove them for now. Later on I'll just inline the routines
from ah_osdep.c.
The ath hal and driver code all assume the world is an x86 or the
bus layer does an explicit bus flush after each operation (eg netbsd.)
However, we don't do that.
So, to be "correct" on platforms like sparc64, mips and ppc (and maybe
ARM, I am not sure), just do explicit barriers after each operation.
Now, this does slow things down a tad on embedded platforms but I'd
rather things be "correct" versus "fast." At some later point if someone
wishes it to be fast then we should add the barrier calls to the HAL and
driver.
Tested:
* carambola 2 (AR9331.)
The synth programming here requires the real centre frequency,
which for HT20 channels is the normal channel, but HT40 is
/not/ the primary channel. Everything else was using 'freq',
which is the correct centre frequency, but the hornet config
was using 'ichan' to do the lookup which was also the primary
channel.
So, modify the HAL call that does the mapping to take a frequency
in MHz and return the channel number.
Tested:
* Carambola 2, AR9331, tested both HT/20 and HT/40 operation.
This probe/attaches correctly in my local branch and now displays
a useful message:
ath0: <Qualcomm Atheros QCA953x> at mem 0x18100000-0x1811ffff irq 0 on nexus0
...
ath0: AR9530 mac 1280.0 RF5110 phy 0.0