Commit Graph

465 Commits

Author SHA1 Message Date
adrian
eaac94fbfe Disable my software queue TIM and PS handling for now.
ps-poll is totally broken in its current form.

This should unbreak things enough to let people use PS-POLL devices,
but leave it in place for me to finish PS-POLL handling.
2012-11-07 06:29:45 +00:00
adrian
7a72be7da2 Add a new HAL call to extract out the HAL enterprise bits from the
AR9300 HAL.
2012-11-03 22:12:35 +00:00
adrian
2ef1c4bedf I give up - introduce a TX lock to serialise TX operations.
I've tried serialising TX using queues and such but unfortunately
due to how this interacts with the locking going on elsewhere in the
networking stack, the TX task gets delayed, resulting in quite a
noticable throughput loss:

* baseline TCP for 2x2 11n HT40 is ~ 170mbit/sec;
* TCP for TX task in the ath taskq, with the RX also going on - 80mbit/sec;
* TCP for TX task in a separate, second taskq - 100mbit/sec.

So for now I'm going with the Linux wireless stack approach - lock tx
early.  The linux code does in the wireless stack, before the 802.11
state stuff happens and before it's punted to the driver.
But TX locking needs to also occur at the driver layer as the TX
completion code _also_ begins to drain the ifnet TX queue.

Whilst I'm here, add some KTR traces for the TX path.

Note:

* This really should be done at the net80211 layer (as well, at least.)
  But that'll have to wait for a little more thought to happen.
2012-10-31 06:27:58 +00:00
adrian
25baa69114 Begin fleshing out some software queue awareness for TIM handling with
the power save queue.

* introduce some new ATH_NODE lock protected fields, tracking the
  net80211 psq and TIM state;
* when doing buffer transitions - ie, when sending and completing
  buffers - check the state of the SWQ and update the TIM appropriately.
* when clearing the TIM bit, if the SWQ is not empty then delay clearing
  it.

This is racy, but it's no less racy than the current net80211 power
save queue management code.  Specifically, with multiple TX threads,
it's quite plausible that parallel state updates will race and the
TIM will be left in an inconsistent state.  I'll address that in
a follow-up commit.
2012-10-28 21:13:12 +00:00
adrian
94c194db6e Add a temporary (for values of "temporary") work around for hotplug
support with ath(4) and VIMAGE.

Right now the VIMAGE code doesn't supply a default vnet context during:

* hotplug attach;
* any device detach.

It special cases kldload/boot time probing (by setting the context to
vnet0) but that doesn't occur when probing devices during a bus rescan -
eg, adding a cardbus card.

These will eventually go away when the VIMAGE support extends to providing
default contexts to hotplug attach/detach.
2012-10-28 18:46:06 +00:00
adrian
b75b22519f Push the actual TX processing into the ath taskqueue, rather than having
it run out of multiple concurrent contexts.

Right now the ath(4) TX processing is a bit hairy. Specifically:

* It was running out of ath_start(), which could occur from multiple
  concurrent sending processes (as if_start() can be started from multiple
  sending threads nowdays.. sigh)

* during RX if fast frames are enabled (so not really at the moment, not
  until I fix this particular feature again..)

* during ath_reset() - so anything which calls that

* during ath_tx_proc*() in the ath taskqueue - ie, TX is attempted again
  after TX completion, as there's now hopefully some ath_bufs available.

* Then, the ic_raw_xmit() method can queue raw frames for transmission
  at any time, from any net80211 TX context. Ew.

This has caused packet ordering issues in the past - specifically,
there's absolutely no guarantee that preemption won't occuring _during_
ath_start() by the TX completion processing, which will call ath_start()
again. It's a mess - 802.11 really, really wants things to be in
sequence or things go all kinds of loopy.

So:

* create a new task struct for TX'ing;
* make the if_start method simply queue the task on the ath taskqueue;
* make ath_start() just be called by the new TX task;
* make ath_tx_kick() just schedule the ath TX task, rather than directly
  calling ath_start().

Now yes, this means that I've taken a step backwards in terms of
concurrency - TX -and- RX now occur in the same single-task taskqueue.
But there's nothing stopping me from separating out the TX / TX completion
code into a separate taskqueue which runs in parallel with the RX path,
if that ends up being appropriate for some platforms.

This fixes the CCMP/seqno concurrency issues that creep up when you
transmit large amounts of uni-directional UDP traffic (>200MBit) on a
FreeBSD STA -> AP, as now there's only one TX context no matter what's
going on (TX completion->retry/software queue,
userland->net80211->ath_start(), TX completion -> ath_start());
but it won't fix any concurrency issues between raw transmitted frames
and non-raw transmitted frames (eg EAPOL frames on TID 16 and any other
TID 16 multicast traffic that gets put on the CABQ.)  That is going to
require a bunch more re-architecture before it's feasible to fix.

In any case, this is a big step towards making the majority of the TX
path locking irrelevant, as now almost all TX activity occurs in the
taskqueue.

Phew.
2012-10-14 20:44:08 +00:00
adrian
5beeacda7a Initialise an uninitialised variable. 2012-10-05 16:44:00 +00:00
adrian
78ead1fc05 Pause and unpause the software queues for a given node based on the
net80211 node power save state.

* Add an ATH_NODE_UNLOCK_ASSERT() check
* Add a new node field - an_is_powersave
* Pause/unpause the queue based on the node state
* Attempt to handle net80211 concurrency issues so the queue
  doesn't get paused/unpaused more than once at a time from
  the net80211 power save code.

Whilst here (and breaking my usual rule), set CLRDMASK when a queue
is unpaused, regardless of whether the queue has some pending traffic.
This means the first frame from that TID (now or later) will hvae
CLRDMASK set.

Also whilst here, bump the swretrymax counters whenever the
filtered frames code expires a frame.  Again, breaking my rule, but
this is just a statistics thing rather than a functional change.

This doesn't fix ps-poll (but it doesn't break it too much worse
than it is at the present) or correcting the TID updates.
That's next on the list.

Tested:
	* AR9220 AP (Atheros AP96 reference design)
	* Macbook Pro and LG Optimus 1 Android phone, both setting
	  and clearing power save state (but not using PS-POLL.)
2012-10-03 23:23:45 +00:00
adrian
d870dd946f Migrate the ath(4) KTR logging to use an ATH_KTR() macro.
This should eventually be unified with ATH_DEBUG() so I can get both
from one macro; that may take some time.

Add some new probes for TX and TX completion.
2012-09-24 20:35:56 +00:00
adrian
368dcf1975 Remove TDMA #define entries from if_ath.c; they now exist in if_ath_tdma.h. 2012-09-09 04:53:10 +00:00
adrian
13dd70acbf There's no nede to allocate a DMA map just before calling bus_dmamem_alloc().
In fact, bus_dmamem_alloc() happily NULLs the dmat pointer passed in,
before replacing it with its own.

This fixes a MIPS crash when kldload'ing if_ath/if_ath_pci -
bus_dmamap_destroy() was passed in a NULL dmat pointer and was doing
all kinds of very bad things.

Reviewed by:	scottl
2012-08-29 16:58:51 +00:00
adrian
2e17c6efe7 Implement a sequential descriptor ID value and stuff it in the ath_buf.
This will be used by the EDMA TX code to assign descriptor IDs in order
to provide some debugging.
2012-08-15 06:48:34 +00:00
adrian
efa454776b Break out the TX completion code into a separate function, so it can be
re-used by the upcoming EDMA TX completion code.

Make ath_stoptxdma() public, again so the EDMA TX code can use it.

Don't check for the TXQ bitmap in the ISR when doing EDMA work as it
doesn't apply for EDMA.
2012-08-14 22:32:20 +00:00
adrian
f315189440 Revert the ath_tx_draintxq() method, and instead teach it the minimum
necessary to "do" EDMA.

It was just using the TX completion status for logging information about
the descriptor completion.  Since with EDMA we don't know this without
checking the TX completion FIFO, we can't provide this information.
So don't.
2012-08-12 00:46:15 +00:00
adrian
815b21bf9d Break out ath_draintxq() into a method and un-methodize ath_tx_processq().
Now that I understand what's going on with this, I've realised that
it's going to be quite difficult to implement a processq method in
the EDMA case.  Because there's a separate TX status FIFO, I can't
just run processq() on each EDMA TXQ to see what's finished.
i have to actually run the TX status queue and handle individual
TXQs.

So:

* unmethodize ath_tx_processq();
* leave ath_tx_draintxq() as a method, as it only uses the completion status
  for debugging rather than actively completing the frames (ie, all frames
  here are failed);
* Methodize ath_draintxq().

The EDMA ath_draintxq() will have to take care of running the TX
completion FIFO before (potentially) freeing frames in the queue.

The only two places where ath_tx_draintxq() (on a single TXQ) are used:

* ath_draintxq(); and
* the CABQ handling in the beacon setup code - it drains the CABQ before
  populating the CABQ with frames for a new beacon (when doing multi-VAP
  operation.)

So it's quite possible that once I methodize the CABQ and beacon handling,
I can just drop ath_tx_draintxq() in its entirety.

Finally, it's also quite possible that I can remove ath_tx_draintxq()
in the future and just "teach" it to not check the status when doing
EDMA.
2012-08-12 00:37:29 +00:00
adrian
ce09176548 Extend the beacon code slightly to support AP mode beaconing for the
EDMA HAL hardware.

* The EDMA HAL code assumes the nexttbtt and intval values are in TU/8
  units, rather than TU.  For now, just "hack" around that here, at least
  until I code up something to translate it in the HAL.
* Setup some different TXQ flags for EDMA hardware.
* The EDMA HAL doesn't support setting the first rate series via
  ath_hal_setuptxdesc() - instead, a call to ath_hal_set11nratescenario()
  is always required.  So for now, just do an 11n rate series setup
  for EDMA beacon frames.

This allows my AR9380 to successfully transmit beacon frames.

However, CABQ TX and all normal data frame TX and TX completion is
still not functional and will require some more significant code churn
to make work.
2012-08-11 23:26:19 +00:00
adrian
01f36653db Allow 802.11n hardware to support multi-rate retry when RTS/CTS is
enabled.

The legacy (pre-802.11n) hardware doesn't support this - although
the AR5212 era hardware supports MRR, it doesn't have all the bits
needed to support MRR + RTS/CTS.  The AR5416 and later support
a packet duration and RTS/CTS flags per rate scenario, so we should
support it.

Tested:

* AR9280, STA

PR:		kern/170302
2012-07-31 23:54:15 +00:00
adrian
c9862ba17f Migrate some more TX side setup routines to be methods. 2012-07-31 03:09:48 +00:00
adrian
42e4d59244 Fix breakage introduced in r238824 - correctly calculate the descriptor
wrapping.

The previous code was only wrapping descriptor "block" boundaries rather
than individual descriptors.  It sounds equivalent but it isn't.

r238824 changed the descriptor allocation to enforce that an individual
descriptor doesn't wrap a 4KiB boundary rather than the whole block
of descriptors.  Eg, for TX descriptors, they're allocated in blocks
of 10 descriptors for each ath_buf (for scatter/gather DMA.)
2012-07-29 08:52:32 +00:00
adrian
f918a23c85 Add a missing call to ath_txdma_teardown(). 2012-07-28 04:40:52 +00:00
adrian
c4ee11cf19 Modify ath_descdma_cleanup() to handle ath_descdma instances with no
buffers.

ath_descdma is now being used for things other than the classical
combination of ath_buf + ath_desc allocations.  In this particular case,
don't try to free and blank out the ath_buf list if it's not passed in.
2012-07-27 10:38:17 +00:00
adrian
8c0c6fa06f Migrate the descriptor allocation function to not care about the number
of buffers, only the number of descriptors.

This involves:

* Change the allocation function to not use nbuf at all;
* When calling it, pass in "nbuf * ndesc" to correctly update how many
  descriptors are being allocated.

Whilst here, fix the descriptor allocation code to correctly allocate
a larger buffer size if the Merlin 4KB WAR is required.  It overallocates
descriptors when allocating a block that doesn't ever have a 4KB boundary
being crossed, but that can be fixed at a later stage.
2012-07-27 05:48:42 +00:00
adrian
e8e08eee9e Refactor out the descriptor allocation code from the buffer allocation
code.

The TX EDMA completion path is going to need descriptors allocated but
not any buffers.  This code will form the basis for that.
2012-07-27 05:34:45 +00:00
adrian
142df5fc8e Modify ath_descdma_setup() to take a descriptor size parameter.
The AR9300 and later descriptors are 128 bytes, however I'd like to make
sure that isn't used for earlier chips.

* Populate the TX descriptor length field in the softc with
  sizeof(ath_desc)

* Use this field when allocating the TX descriptors

* Pre-AR93xx TX/RX descriptors will use the ath_desc size; newer ones will
  query the HAL for these sizes.
2012-07-23 23:40:13 +00:00
adrian
c89f08ceb9 Begin separating out the TX DMA setup in preparation for TX EDMA support.
* Introduce TX DMA setup/teardown methods, mirroring what's done in
  the RX path.

  Although the TX DMA descriptor is setup via ath_desc_alloc() /
  ath_desc_free(), there TX status descriptor ring will be allocated
  in this path.

* Remove some of the TX EDMA capability probing from the RX path and
  push it into the new TX EDMA path.
2012-07-23 03:52:18 +00:00
adrian
f20813da65 Begin modifying the descriptor allocation functions to support a variable
sized TX descriptor.

This is required for the AR93xx EDMA support which requires 128 byte
TX descriptors (which is significantly larger than the earlier
hardware.)
2012-07-23 02:26:33 +00:00
adrian
ff4507cc5c Enable the basic node-based rate control statistics via an ioctl(). 2012-07-20 01:36:46 +00:00
adrian
852b8a4cb1 Ensure that error is set.
Noticed by:	rui
2012-07-14 05:51:54 +00:00
adrian
79bcabb67e Don't free the descriptor allocation/map if it doesn't exist.
I missed this in my previous commit.
2012-07-14 02:47:16 +00:00
adrian
00e578733c Fix EDMA RX to actually work without panicing the machine.
I was setting up the RX EDMA buffer to be 4096 bytes rather than the
RX data buffer portion.  The hardware was likely getting very confused
and DMAing descriptor portions into places it shouldn't, leading to
memory corruption and occasional panics.

Whilst here, don't bother allocating descriptors for the RX EDMA case.
We don't use those descriptors. Instead, just allocate ath_buf entries.
2012-07-14 02:07:51 +00:00
adrian
f0a77e77df Flip on EDMA RX of both HP and LP queue frames.
Yes, this is in the legacy interrupt path.  The NIC does support
MSI but I haven't yet sat down and written that code.
2012-07-10 07:43:31 +00:00
adrian
b89b88c83b Migrate the ATH_KTR_* fields out to if_ath_debug.h . 2012-07-10 06:11:39 +00:00
adrian
2977d109d8 Further preparations for the RX EDMA support.
Break out the DMA descriptor setup/teardown code into a method.
The EDMA RX code doesn't allocate descriptors, just ath_buf entries.
2012-07-09 08:37:59 +00:00
adrian
8af1316f3a Begin abstracting out the RX path in preparation for RX EDMA support.
The RX EDMA support requires a modified approach to the RX descriptor
handling.

Specifically:

* There's now two RX queues - high and low priority;
* The RX queues are implemented as FIFOs; they're now an array of pointers
  to buffers;
* .. and the RX buffer and descriptor are in the same "buffer", rather than
  being separate.

So to that end, this commit abstracts out most of the RX related functions
from the bulk of the driver.  Notably, the RX DMA/buffer allocation isn't
updated, primarily because I haven't yet fleshed out what it should look
like.

Whilst I'm here, create a set of matching but mostly unimplemented EDMA
stubs.

Tested:

  * AR9280, station mode

TODO:

  * Thorough AP and other mode testing for non-EDMA chips;
  * Figure out how to allocate RX buffers suitable for RX EDMA, including
    correctly setting the mbuf length to compensate for the RX descriptor
    and completion status area.
2012-07-03 06:59:12 +00:00
adrian
b5c3afbe7d Shuffle these initialisations to where they should be. 2012-06-24 08:28:06 +00:00
adrian
a5794ac4d0 Introduce an optional ath(4) radiotap vendor extension.
This includes a few new fields in each RXed frame:

* per chain RX RSSI (ctl and ext);
* current RX chainmask;
* EVM information;
* PHY error code;
* basic RX status bits (CRC error, PHY error, etc).

This is primarily to allow me to do some userland PHY error processing
for radar and spectral scan data.  However since EVM and per-chain RSSI
is provided, others may find it useful for a variety of tasks.

The default is to not compile in the radiotap vendor extensions, primarily
because tcpdump doesn't seem to handle the particular vendor extension
layout I'm using, and I'd rather not break existing code out there that
may be (badly) parsing the radiotap data.

Instead, add the option 'ATH_ENABLE_RADIOTAP_VENDOR_EXT' to your kernel
configuration file to enable these options.
2012-06-24 07:01:49 +00:00
adrian
656256d25b After some discussion with bschmidt@, it's likely better to just go
through ieee80211_suspend_all() and ieee80211_resume_all().
All the other wireless drivers are doing that particular dance.

PR:		kern/169084
2012-06-17 03:08:33 +00:00
adrian
e781c0c0fc oops, remove this, it wasn't supposed to be committed. 2012-06-16 22:26:45 +00:00
kib
df96c54547 Fix build. 2012-06-16 20:49:08 +00:00
adrian
1e66a452fb Shuffle some more fields in ath_buf so it's not too big.
This shaves off 20 bytes - from 288 bytes to 268 bytes.

However, it's still too big.
2012-06-16 04:41:35 +00:00
adrian
610e0d131b Convert ath(4) to just use ieee80211_suspend_all() and ieee80211_resume_all().
The existing code tries to use the beacon miss timer to signal that the AP
has gone away.  Unfortunately this doesn't seem to be behaving itself.
I'll try to investigate why this is for the sake of completeness.

The result is the STA will stay "associated" to the AP it was associated
with when it suspended.  It never receives a bmiss notification so it
never tries reassociating.

PR:		kern/169084
2012-06-15 01:15:59 +00:00
adrian
27ea453afe Disable BGSCAN for 802.11n for now. Until scanning during traffic is
fixed for 802.11n TX, this needs to be disabled or users wlil see randomly
hanging aggregation sessions.

Whilst I'm here, remove the warning about 802.11n being full of dragons.
It's nowhere near that scary now.
2012-06-14 04:14:06 +00:00
adrian
a65b3dd589 Implement a global (all non-mgmt traffic) TX ath_buf limitation when
ath_start() is called.

This (defaults to 10 frames) gives for a little headway in the TX ath_buf
allocation, so buffer cloning is still possible.

This requires a lot omre experimenting and tuning.

It also doesn't stop a node/TID from consuming all of the available
ath_buf's, especially when the node is going through high packet loss
or only talking at a low TX rate.  It also doesn't stop a paused TID
from taking all of the ath_bufs.  I'll look at fixing that up in subsequent
commits.

PR:	kern/168170
2012-06-14 00:51:53 +00:00
adrian
528dfae9f3 Implement a separate, smaller pool of ath_buf entries for use by management
traffic.

* Create sc_mgmt_txbuf and sc_mgmt_txdesc, initialise/free them appropriately.
* Create an enum to represent buffer types in the API.
* Extend ath_getbuf() and _ath_getbuf_locked() to take the above enum.
* Right now anything sent via ic_raw_xmit() allocates via ATH_BUFTYPE_MGMT.
  This may not be very useful.
* Add ATH_BUF_MGMT flag (ath_buf.bf_flags) which indicates the current buffer
  is a mgmt buffer and should go back onto the mgmt free list.
* Extend 'txagg' to include debugging output for both normal and mgmt txbufs.
* When checking/clearing ATH_BUF_BUSY, do it on both TX pools.

Tested:

* STA mode, with heavy UDP injection via iperf.  This filled the TX queue
  however BARs were still going out successfully.

TODO:

* Initialise the mgmt buffers with ATH_BUF_MGMT and then ensure the right
  type is being allocated and freed on the appropriate list.  That'd save
  a write operation (to bf->bf_flags) on each buffer alloc/free.

* Test on AP mode, ensure that BAR TX and probe responses go out nicely
  when the main TX queue is filled (eg with paused traffic to a TID,
  awaiting a BAR to complete.)

PR:		kern/168170
2012-06-13 06:57:55 +00:00
adrian
0e5e2a4303 Replace the direct sc_txbuf manipulation with a pair of functions.
This is preparation work for having a separate ath_buf queue for
management traffic.

PR:		kern/168170
2012-06-13 05:39:16 +00:00
adrian
d958c69a5f Add a new ioctl for ath(4) which returns the aggregate statistics. 2012-06-10 06:42:18 +00:00
adrian
b1336b8d21 Mostly revert previous commit(s). After doing a bunch of local testing,
it turns out that it negatively affects performance.  I'm stil investigating
exactly why deferring the IO causes such negative TCP performance but
doesn't affect UDP preformance.

Leave the ath_tx_kick() change in there however; it's going to be useful
to have that there for if_transmit() work.

PR:		kern/168649
2012-06-05 06:03:55 +00:00
adrian
6dd8c2eb6c Create a function - ath_tx_kick() - which is called where ath_start() is
called to "kick" along TX.

For now, schedule a taskqueue call.

Later on I may go back to the direct call of ath_rx_tasklet() - but for
now, this will do.

I've tested UDP and TCP TX. UDP TX still achieves 240MBit, but TCP
TX gets stuck at around 100MBit or so, instead of the 150MBit it should
be at.  I'll re-test with no ACPI/power/sleep states enabled at startup
and see what effect it has.

This is in preparation for supporting an if_transmit() path, which will
turn ath_tx_kick() into a NUL operation (as there won't be an ifnet
queue to service.)

Tested:
	* AR9280 STA

TODO:
	* test on AR5416, AR9160, AR928x STA/AP modes

PR:		kern/168649
2012-06-05 03:14:49 +00:00
adrian
673d798918 Migrate the TX path to a taskqueue for now, until a better way of
implementing parallel TX and TX/RX completion can be done without
simply abusing long-held locks.

Right now, multiple concurrent ath_start() entries can result in
frames being dequeued out of order.  Well, they're dequeued in order
fine, but if there's any preemption or race between CPUs between:

* removing the frame from the ifnet, and
* calling and runningath_tx_start(), until the frame is placed on a
  software or hardware TXQ

Then although dequeueing the frame is in-order, queueing it to the hardware
may be out of order.

This is solved in a lot of other drivers by just holding a TX lock over
a rather long period of time.  This lets them continue to direct dispatch
without races between dequeue and hardware queue.

Note to observers: if_transmit() doesn't necessarily solve this.
It removes the ifnet from the main path, but the same issue exists if
there's some intermediary queue (eg a bufring, which as an aside also
may pull in ifnet when you're using ALTQ.)

So, until I can sit down and code up a much better way of doing parallel
TX, I'm going to leave the TX path using a deferred taskqueue task.
What I will likely head towards is doing a direct dispatch to hardware
or software via if_transmit(), but it'll require some driver changes to
allow queues to be made without using the really large ath_buf / ath_desc
entries.

TODO:

* Look at how feasible it'll be to just do direct dispatch to
  ath_tx_start() from if_transmit(), avoiding doing _any_ intermediary
  serialisation into a global queue.  This may break ALTQ for example,
  so I have to be delicate.

* It's quite likely that I should break up ath_tx_start() so it
  deposits frames onto the software queues first, and then only fill
  in the 802.11 fields when it's being queued to the hardware.
  That will make the if_transmit() -> software queue path very
  quick and lightweight.

* This has some very bad behaviour when using ACPI and Cx states.
  I'll do some subsequent analysis using KTR and schedgraph and file
  a follow-up PR or two.

PR:		kern/168649
2012-06-04 22:01:12 +00:00
adrian
d6bb741dfb oops - ath_hal_disablepcie is actually destined for another purpose,
not to disable the PCIe PHY in prepration for reset.

Extend the enablepci method to have a "poweroff" flag, which if equal
to true means the hardware is about to go to sleep.
2012-05-25 05:01:27 +00:00