freebsd-skq

Author	SHA1	Message	Date
Adrian Chadd	72910f03e5	Implement a separate hardware queue threshold for aggregate and non-aggr traffic. When transmitting non-aggregate traffic, we need to keep the hardware busy whilst transmitting or small bursts in txdone/tx latency will kill us. This restores non-aggregate iperf performance, especially when doing TDMA. Tested: * AR5416<->AR5416, TDMA * AR5416 STA <-> AR9280 AP	2013-05-21 18:13:57 +00:00
Adrian Chadd	22a3aee637	Implement my first cut at "correct" node power-save and PS-POLL support. This implements PS-POLL awareness i nthe * Implement frame "leaking", which allows for a software queue to be scheduled even though it's asleep * Track whether a frame has been leaked or not * Leak out a single non-AMPDU frame when transmitting aggregates * Queue BAR frames if the node is asleep * Direct-dispatch the rest of control and management frames. This allows for things like re-association to occur (which involves sending probe req/resp as well as assoc request/response) when the node is asleep and then tries reassociating. * Limit how many frames can set in the software node queue whilst the node is asleep. net80211 is already buffering frames for us so this is mostly just paranoia. * Add a PS-POLL method which leaks out a frame if there's something in the software queue, else it calls net80211's ps-poll routine. Since the ath PS-POLL routine marks the node as having a single frame to leak, either a software queued frame would leak, OR the next queued frame would leak. The next queued frame could be something from the net80211 power save queue, OR it could be a NULL frame from net80211. TODO: * Don't transmit further BAR frames (eg via a timeout) if the node is currently asleep. Otherwise we may end up exhausting management frames due to the lots of queued BAR frames. I may just undo this bit later on and direct-dispatch BAR frames even if the node is asleep. * It would be nice to burst out a single A-MPDU frame if both ends support this. I may end adding a FreeBSD IE soon to negotiate this power save behaviour. * I should make STAs timeout of power save mode if they've been in power save for more than a handful of seconds. This way cards that get "stuck" in power save mode don't stay there for the "inactivity" timeout in net80211. * Move the queue depth check into the driver layer (ath_start / ath_transmit) rather than doing it in the TX path. * There could be some naughty corner cases with ps-poll leaking. Specifically, if net80211 generates a NULL data frame whilst another transmitter sends a normal data frame out net80211 output / transmit, we need to ensure that the NULL data frame goes out first. This is one of those things that should occur inside the VAP/ic TX lock. Grr, more investigations to do.. Tested: * STA: AR5416, AR9280 * AP: AR5416, AR9280, AR9160	2013-05-15 18:33:05 +00:00
Adrian Chadd	22780332ae	Oops, commit the other half of r250606.	2013-05-13 19:02:22 +00:00
Adrian Chadd	4a502c332a	Add a new option to limit the maximum size of aggregates. The default is to limit them to what the hardware is capable of. Add sysctl twiddles for both the non-RTS and RTS protected aggregate generation. Whilst here, add some comments about stuff that I've discovered during my exploration of the TX aggregate / delimiter setup path from the reference driver.	2013-02-21 06:18:40 +00:00
Adrian Chadd	bb327d284b	ALQ logging enhancements: * upon setup, tell the alq code what the chip information is. * add TX/RX path logging for legacy chips. * populate the tx/rx descriptor length fields with a best-estimate. It's overly big (96 bytes when AH_SUPPORT_AR5416 is enabled) but it'll do for now. Whilst I'm here, add CURVNET_RESTORE() here during probe/attach as a partial solution to fixing crashes during attach when the attach fails. There are other attach failures that I have to deal with; those'll come later.	2012-11-16 19:57:16 +00:00
Adrian Chadd	548a605d0d	Begin fleshing out some software queue awareness for TIM handling with the power save queue. * introduce some new ATH_NODE lock protected fields, tracking the net80211 psq and TIM state; * when doing buffer transitions - ie, when sending and completing buffers - check the state of the SWQ and update the TIM appropriately. * when clearing the TIM bit, if the SWQ is not empty then delay clearing it. This is racy, but it's no less racy than the current net80211 power save queue management code. Specifically, with multiple TX threads, it's quite plausible that parallel state updates will race and the TIM will be left in an inconsistent state. I'll address that in a follow-up commit.	2012-10-28 21:13:12 +00:00
Adrian Chadd	0eb8162623	Pause and unpause the software queues for a given node based on the net80211 node power save state. * Add an ATH_NODE_UNLOCK_ASSERT() check * Add a new node field - an_is_powersave * Pause/unpause the queue based on the node state * Attempt to handle net80211 concurrency issues so the queue doesn't get paused/unpaused more than once at a time from the net80211 power save code. Whilst here (and breaking my usual rule), set CLRDMASK when a queue is unpaused, regardless of whether the queue has some pending traffic. This means the first frame from that TID (now or later) will hvae CLRDMASK set. Also whilst here, bump the swretrymax counters whenever the filtered frames code expires a frame. Again, breaking my rule, but this is just a statistics thing rather than a functional change. This doesn't fix ps-poll (but it doesn't break it too much worse than it is at the present) or correcting the TID updates. That's next on the list. Tested: * AR9220 AP (Atheros AP96 reference design) * Macbook Pro and LG Optimus 1 Android phone, both setting and clearing power save state (but not using PS-POLL.)	2012-10-03 23:23:45 +00:00
Adrian Chadd	1762ec944a	Revert the ath_tx_draintxq() method, and instead teach it the minimum necessary to "do" EDMA. It was just using the TX completion status for logging information about the descriptor completion. Since with EDMA we don't know this without checking the TX completion FIFO, we can't provide this information. So don't.	2012-08-12 00:46:15 +00:00
Adrian Chadd	788e6aa99c	Break out ath_draintxq() into a method and un-methodize ath_tx_processq(). Now that I understand what's going on with this, I've realised that it's going to be quite difficult to implement a processq method in the EDMA case. Because there's a separate TX status FIFO, I can't just run processq() on each EDMA TXQ to see what's finished. i have to actually run the TX status queue and handle individual TXQs. So: * unmethodize ath_tx_processq(); * leave ath_tx_draintxq() as a method, as it only uses the completion status for debugging rather than actively completing the frames (ie, all frames here are failed); * Methodize ath_draintxq(). The EDMA ath_draintxq() will have to take care of running the TX completion FIFO before (potentially) freeing frames in the queue. The only two places where ath_tx_draintxq() (on a single TXQ) are used: * ath_draintxq(); and * the CABQ handling in the beacon setup code - it drains the CABQ before populating the CABQ with frames for a new beacon (when doing multi-VAP operation.) So it's quite possible that once I methodize the CABQ and beacon handling, I can just drop ath_tx_draintxq() in its entirety. Finally, it's also quite possible that I can remove ath_tx_draintxq() in the future and just "teach" it to not check the status when doing EDMA.	2012-08-12 00:37:29 +00:00
Adrian Chadd	f8418db57e	Migrate some more TX side setup routines to be methods.	2012-07-31 03:09:48 +00:00
Adrian Chadd	746bab5b7f	Break out the hardware handoff and TX DMA restart code into methods. These (and a few others) will differ based on the underlying DMA implementation. For the EDMA NICs, simply stub them out in a fashion which will let me focus on implementing the necessary descriptor API changes.	2012-07-31 02:28:32 +00:00
Adrian Chadd	3fdfc33024	Begin separating out the TX DMA setup in preparation for TX EDMA support. * Introduce TX DMA setup/teardown methods, mirroring what's done in the RX path. Although the TX DMA descriptor is setup via ath_desc_alloc() / ath_desc_free(), there TX status descriptor ring will be allocated in this path. * Remove some of the TX EDMA capability probing from the RX path and push it into the new TX EDMA path.	2012-07-23 03:52:18 +00:00
Adrian Chadd	a108d2d6c6	Revert r233227 and followup commits as it breaks CCMP PN replay detection. This showed up when doing heavy UDP throughput on SMP machines. The problem with this is because the 802.11 sequence number is being allocated separately to the CCMP PN replay number (which is assigned during ieee80211_crypto_encap()). Under significant throughput (200+ MBps) the TX path would be stressed enough that frame TX/retry would force sequence number and PN allocation to be out of order. So once the frames were reordered via 802.11 seqnos, the CCMP PN would be far out of order, causing most frames to be discarded by the receiver. I've fixed this in some local work by being forced to: (a) deal with the issues that lead to the parallel TX causing out of order sequence numbers in the first place; (b) fix all the packet queuing issues which lead to strange (but mostly valid) TX. I'll begin fixing these in a subsequent commit or five. PR: kern/166190	2012-06-11 06:59:28 +00:00
Adrian Chadd	0b96ef630b	Delay sequence number allocation for A-MPDU until just before the frame is queued to the hardware. Because multiple concurrent paths can execute ath_start(), multiple concurrent paths can push frames into the software/hardware TX queue and since preemption/interrupting can occur, there's the possibility that a gap in time will occur between allocating the sequence number and queuing it to the hardware. Because of this, it's possible that a thread will have allocated a sequence number and then be preempted by another thread doing the same. If the second thread sneaks the frame into the BAW, the (earlier) sequence number of the first frame will be now outside the BAW and will result in the frame being constantly re-added to the tail of the queue. There it will live until the sequence numbers cycle around again. This also creates a hole in the RX BAW tracking which can also cause issues. This patch delays the sequence number allocation to occur only just before the frame is going to be added to the BAW. I've been wanting to do this anyway as part of a general code tidyup but I've not gotten around to it. This fixes the PR. However, it still makes it quite difficult to try and ensure in-order queuing and dequeuing of frames. Since multiple copies of ath_start() can be run at the same time (eg one TXing process thread, one TX completion task/one RX task) the driver may end up having frames dequeued and pushed into the hardware slightly/occasionally out of order. And, to make matters more annoying, net80211 may have the same behaviour - in the non-aggregation case, the TX code allocates sequence numbers before it's thrown to the driver. I'll open another PR to investigate this and potentially introduce some kind of final-pass TX serialisation before frames are thrown to the hardware. It's also very likely worthwhile adding some debugging code into ath(4) and net80211 to catch when/if this does occur. PR: kern/166190	2012-03-20 04:50:25 +00:00
Adrian Chadd	eb6f0de09d	Introduce TX aggregation and software TX queue management for Atheros AR5416 and later wireless devices. This is a very large commit - the complete history can be found in the user/adrian/if_ath_tx branch. Legacy (ie, pre-AR5416) devices also use the per-software TXQ support and (in theory) can support non-aggregation ADDBA sessions. However, the net80211 stack doesn't currently support this. In summary: TX path: * queued frames normally go onto a per-TID, per-node queue * some special frames (eg ADDBA control frames) are thrown directly onto the relevant hardware queue so they can go out before any software queued frames are queued. * Add methods to create, suspend, resume and tear down an aggregation session. * Add in software retransmission of both normal and aggregate frames. * Add in completion handling of aggregate frames, including parsing the block ack bitmap provided by the hardware. * Write an aggregation function which can assemble frames into an aggregate based on the selected rate control and channel configuration. * The per-TID queues are locked based on their target hardware TX queue. This matches what ath9k/atheros does, and thus simplified porting over some of the aggregation logic. * When doing TX aggregation, stick the sequence number allocation in the TX path rather than net80211 TX path, and protect it by the TXQ lock. Rate control: * Delay rate control selection until the frame is about to be queued to the hardware, so retried frames can have their rate control choices changed. Frames with a static rate control selection have that applied before each TX, just to simplify the TX path (ie, not have "static" and "dynamic" rate control special cased.) * Teach ath_rate_sample about aggregates - both completion and errors. * Add an EWMA for tracking what the current "good" MCS rate is based on failure rates. Misc: * Introduce a bunch of dirty hacks and workarounds so TID mapping and net80211 frame inspection can be kept out of the net80211 layer. Because of the way this code works (and it's from Atheros and Linux ath9k), there is a consistent, 1:1 mapping between TID and AC. So we need to ensure that frames going to a specific TID will _always_ end up on the right AC, and vice versa, or the completion/locking will simply get very confused. I plan on addressing this mess in the future. Known issues: * There is no BAR frame transmission just yet. A whole lot of tidying up needs to occur before BAR frame TX can occur in the "correct" place - ie, once the TID TX queue has been drained. * Interface reset/purge/etc results in frames in the TX and RX queues being removed. This creates holes in the sequence numbers being assigned and the TX/RX AMPDU code (on either side) just hangs. * There's no filtered frame support at the present moment, so stations going into power saving mode will simply have a number of frames dropped - likely resulting in a traffic "hang". * Raw frame TX is going to just not function with 11n aggregation. Likely this needs to be modified to always override the sequence number if the frame is going into an aggregation session. However, general raw frame injection currently doesn't work in general in net80211, so let's just ignore this for now until this is sorted out. * HT protection is just not implemented and won't be until the above is sorted out. In addition, the AR5416 has issues RTS protecting large aggregates (anything >8k), so the work around needs to be ported and tested. Thus, this will be put on hold until the above work is complete. * The rate control module 'sample' is the only currently supported module; onoe/amrr haven't been tested and have likely bit rotted a little. I'll follow up with some commits to make them work again for non-11n rates, but they won't be updated to handle 11n and aggregation. If someone wishes to do so then they're welcome to send along patches. * .. and "sample" doesn't really do a good job of 11n TX. Specifically, the metrics used (packet TX time and failure/success rates) isn't as useful for 11n. It's likely that it should be extended to take into account the aggregate throughput possible and then choose a rate which maximises that. Ie, it may be acceptable for a higher MCS rate with a higher failure to be used if it gives a more acceptable throughput/latency then a lower MCS rate @ a lower error rate. Again, patches will be gratefully accepted. Because of this, ATH_ENABLE_11N is still not enabled by default. Sponsored by: Hobnob, Inc. Obtained from: Linux, Atheros	2011-11-08 22:43:13 +00:00
Adrian Chadd	8f939e7967	Merge in some fixes from the if_ath_tx branch. * Close down some of the kickpcu races, where the interrupt handler can and will run concurrently with the taskqueue. * Close down the TXQ active/completed race between the interrupt handler and the concurrently running tx completion taskqueue function. * Add some tx and rx interrupt count tracking, for debugging. * Fix the kickpcu logic in ath_rx_proc() to not simply drain and restart the TX queue - instead, assume the hardware isn't (too) confused and just restart RX DMA. This may break on previous chipsets, so if it does I'll add a HAL flag and conditionally handle this (ie, for broken chipsets, I'll just restore the "stop PCU / flush things / restart PCU" logic.) * Misc stuff Sponsored by: Hobnob, Inc.	2011-11-08 18:10:04 +00:00
Adrian Chadd	b8e788a53a	Migrate the TX path code out of if_ath and into a separate source file. There's two reasons for this: * the raw and non-raw TX path shares a lot of duplicate code which should be refactored; * the 11n-ready chip TX path needs a little reworking.	2011-01-29 11:35:23 +00:00

17 Commits