freebsd-skq

Author	SHA1	Message	Date
Adrian Chadd	bad98824c0	Break out the TX completion code into a separate function, so it can be re-used by the upcoming EDMA TX completion code. Make ath_stoptxdma() public, again so the EDMA TX code can use it. Don't check for the TXQ bitmap in the ISR when doing EDMA work as it doesn't apply for EDMA.	2012-08-14 22:32:20 +00:00
Adrian Chadd	1762ec944a	Revert the ath_tx_draintxq() method, and instead teach it the minimum necessary to "do" EDMA. It was just using the TX completion status for logging information about the descriptor completion. Since with EDMA we don't know this without checking the TX completion FIFO, we can't provide this information. So don't.	2012-08-12 00:46:15 +00:00
Adrian Chadd	788e6aa99c	Break out ath_draintxq() into a method and un-methodize ath_tx_processq(). Now that I understand what's going on with this, I've realised that it's going to be quite difficult to implement a processq method in the EDMA case. Because there's a separate TX status FIFO, I can't just run processq() on each EDMA TXQ to see what's finished. i have to actually run the TX status queue and handle individual TXQs. So: * unmethodize ath_tx_processq(); * leave ath_tx_draintxq() as a method, as it only uses the completion status for debugging rather than actively completing the frames (ie, all frames here are failed); * Methodize ath_draintxq(). The EDMA ath_draintxq() will have to take care of running the TX completion FIFO before (potentially) freeing frames in the queue. The only two places where ath_tx_draintxq() (on a single TXQ) are used: * ath_draintxq(); and * the CABQ handling in the beacon setup code - it drains the CABQ before populating the CABQ with frames for a new beacon (when doing multi-VAP operation.) So it's quite possible that once I methodize the CABQ and beacon handling, I can just drop ath_tx_draintxq() in its entirety. Finally, it's also quite possible that I can remove ath_tx_draintxq() in the future and just "teach" it to not check the status when doing EDMA.	2012-08-12 00:37:29 +00:00
Adrian Chadd	f8418db57e	Migrate some more TX side setup routines to be methods.	2012-07-31 03:09:48 +00:00
Adrian Chadd	b39722d6dd	Migrate the descriptor allocation function to not care about the number of buffers, only the number of descriptors. This involves: * Change the allocation function to not use nbuf at all; * When calling it, pass in "nbuf * ndesc" to correctly update how many descriptors are being allocated. Whilst here, fix the descriptor allocation code to correctly allocate a larger buffer size if the Merlin 4KB WAR is required. It overallocates descriptors when allocating a block that doesn't ever have a 4KB boundary being crossed, but that can be fixed at a later stage.	2012-07-27 05:48:42 +00:00
Adrian Chadd	c9f78537bc	Refactor out the descriptor allocation code from the buffer allocation code. The TX EDMA completion path is going to need descriptors allocated but not any buffers. This code will form the basis for that.	2012-07-27 05:34:45 +00:00
Adrian Chadd	1006fc0c3b	Modify ath_descdma_setup() to take a descriptor size parameter. The AR9300 and later descriptors are 128 bytes, however I'd like to make sure that isn't used for earlier chips. * Populate the TX descriptor length field in the softc with sizeof(ath_desc) * Use this field when allocating the TX descriptors * Pre-AR93xx TX/RX descriptors will use the ath_desc size; newer ones will query the HAL for these sizes.	2012-07-23 23:40:13 +00:00
Adrian Chadd	39abbd9bd2	Fix EDMA RX to actually work without panicing the machine. I was setting up the RX EDMA buffer to be 4096 bytes rather than the RX data buffer portion. The hardware was likely getting very confused and DMAing descriptor portions into places it shouldn't, leading to memory corruption and occasional panics. Whilst here, don't bother allocating descriptors for the RX EDMA case. We don't use those descriptors. Instead, just allocate ath_buf entries.	2012-07-14 02:07:51 +00:00
Adrian Chadd	3d184db2f8	Further preparations for the RX EDMA support. Break out the DMA descriptor setup/teardown code into a method. The EDMA RX code doesn't allocate descriptors, just ath_buf entries.	2012-07-09 08:37:59 +00:00
Adrian Chadd	af33d486ab	Implement a separate, smaller pool of ath_buf entries for use by management traffic. * Create sc_mgmt_txbuf and sc_mgmt_txdesc, initialise/free them appropriately. * Create an enum to represent buffer types in the API. * Extend ath_getbuf() and _ath_getbuf_locked() to take the above enum. * Right now anything sent via ic_raw_xmit() allocates via ATH_BUFTYPE_MGMT. This may not be very useful. * Add ATH_BUF_MGMT flag (ath_buf.bf_flags) which indicates the current buffer is a mgmt buffer and should go back onto the mgmt free list. * Extend 'txagg' to include debugging output for both normal and mgmt txbufs. * When checking/clearing ATH_BUF_BUSY, do it on both TX pools. Tested: * STA mode, with heavy UDP injection via iperf. This filled the TX queue however BARs were still going out successfully. TODO: * Initialise the mgmt buffers with ATH_BUF_MGMT and then ensure the right type is being allocated and freed on the appropriate list. That'd save a write operation (to bf->bf_flags) on each buffer alloc/free. * Test on AP mode, ensure that BAR TX and probe responses go out nicely when the main TX queue is filled (eg with paused traffic to a TID, awaiting a BAR to complete.) PR: kern/168170	2012-06-13 06:57:55 +00:00
Adrian Chadd	e1a50456b6	Replace the direct sc_txbuf manipulation with a pair of functions. This is preparation work for having a separate ath_buf queue for management traffic. PR: kern/168170	2012-06-13 05:39:16 +00:00
Adrian Chadd	9f95609828	Mostly revert previous commit(s). After doing a bunch of local testing, it turns out that it negatively affects performance. I'm stil investigating exactly why deferring the IO causes such negative TCP performance but doesn't affect UDP preformance. Leave the ath_tx_kick() change in there however; it's going to be useful to have that there for if_transmit() work. PR: kern/168649	2012-06-05 06:03:55 +00:00
Adrian Chadd	14d33c7e35	Create a function - ath_tx_kick() - which is called where ath_start() is called to "kick" along TX. For now, schedule a taskqueue call. Later on I may go back to the direct call of ath_rx_tasklet() - but for now, this will do. I've tested UDP and TCP TX. UDP TX still achieves 240MBit, but TCP TX gets stuck at around 100MBit or so, instead of the 150MBit it should be at. I'll re-test with no ACPI/power/sleep states enabled at startup and see what effect it has. This is in preparation for supporting an if_transmit() path, which will turn ath_tx_kick() into a NUL operation (as there won't be an ifnet queue to service.) Tested: * AR9280 STA TODO: * test on AR5416, AR9160, AR928x STA/AP modes PR: kern/168649	2012-06-05 03:14:49 +00:00
Adrian Chadd	470a7f4191	Migrate the TX path to a taskqueue for now, until a better way of implementing parallel TX and TX/RX completion can be done without simply abusing long-held locks. Right now, multiple concurrent ath_start() entries can result in frames being dequeued out of order. Well, they're dequeued in order fine, but if there's any preemption or race between CPUs between: * removing the frame from the ifnet, and * calling and runningath_tx_start(), until the frame is placed on a software or hardware TXQ Then although dequeueing the frame is in-order, queueing it to the hardware may be out of order. This is solved in a lot of other drivers by just holding a TX lock over a rather long period of time. This lets them continue to direct dispatch without races between dequeue and hardware queue. Note to observers: if_transmit() doesn't necessarily solve this. It removes the ifnet from the main path, but the same issue exists if there's some intermediary queue (eg a bufring, which as an aside also may pull in ifnet when you're using ALTQ.) So, until I can sit down and code up a much better way of doing parallel TX, I'm going to leave the TX path using a deferred taskqueue task. What I will likely head towards is doing a direct dispatch to hardware or software via if_transmit(), but it'll require some driver changes to allow queues to be made without using the really large ath_buf / ath_desc entries. TODO: * Look at how feasible it'll be to just do direct dispatch to ath_tx_start() from if_transmit(), avoiding doing _any_ intermediary serialisation into a global queue. This may break ALTQ for example, so I have to be delicate. * It's quite likely that I should break up ath_tx_start() so it deposits frames onto the software queues first, and then only fill in the 802.11 fields when it's being queued to the hardware. That will make the if_transmit() -> software queue path very quick and lightweight. * This has some very bad behaviour when using ACPI and Cx states. I'll do some subsequent analysis using KTR and schedgraph and file a follow-up PR or two. PR: kern/168649	2012-06-04 22:01:12 +00:00
Adrian Chadd	ba5c15d9ba	Migrate most of the beacon handling functions out to if_ath_beacon.c. This is also in preparation for supporting AR9300 and later NICs.	2012-05-20 04:14:29 +00:00
Adrian Chadd	a35dae8d87	Migrate the TDMA management functions out of if_ath.c into if_ath_tdma.c. There's some TX path TDMA code in if_ath_tx.c which should be migrated out, but first I should likely try and verify/fix/repair the TDMA support in 9.x and -HEAD.	2012-05-20 02:49:42 +00:00
Adrian Chadd	e60c4fc2c9	Migrate the bulk of the RX routines out from if_ath.c to if_ath_rx.[ch]. * migrate the rx processing out into if_ath_rx.c * migrate the TSF functions into if_ath_tsf.h, as inlines This is in prepration for supporting the EDMA RX routines, required to support the AR93xx series NICs. TODO: * ath_start() shouldn't be private, but it's called as part of the RX path. I should likely migrate ath_rx_tasklet() back into if_ath.c and then return this to be 'static'. The RX code really shouldn't need to see TX routines (and vice versa.) * ath_beacon_* should be in if_ath_beacon.[ch]. * ath_tdma_* should be in if_ath_tdma.[ch] ...	2012-05-20 02:05:10 +00:00
Adrian Chadd	eb6f0de09d	Introduce TX aggregation and software TX queue management for Atheros AR5416 and later wireless devices. This is a very large commit - the complete history can be found in the user/adrian/if_ath_tx branch. Legacy (ie, pre-AR5416) devices also use the per-software TXQ support and (in theory) can support non-aggregation ADDBA sessions. However, the net80211 stack doesn't currently support this. In summary: TX path: * queued frames normally go onto a per-TID, per-node queue * some special frames (eg ADDBA control frames) are thrown directly onto the relevant hardware queue so they can go out before any software queued frames are queued. * Add methods to create, suspend, resume and tear down an aggregation session. * Add in software retransmission of both normal and aggregate frames. * Add in completion handling of aggregate frames, including parsing the block ack bitmap provided by the hardware. * Write an aggregation function which can assemble frames into an aggregate based on the selected rate control and channel configuration. * The per-TID queues are locked based on their target hardware TX queue. This matches what ath9k/atheros does, and thus simplified porting over some of the aggregation logic. * When doing TX aggregation, stick the sequence number allocation in the TX path rather than net80211 TX path, and protect it by the TXQ lock. Rate control: * Delay rate control selection until the frame is about to be queued to the hardware, so retried frames can have their rate control choices changed. Frames with a static rate control selection have that applied before each TX, just to simplify the TX path (ie, not have "static" and "dynamic" rate control special cased.) * Teach ath_rate_sample about aggregates - both completion and errors. * Add an EWMA for tracking what the current "good" MCS rate is based on failure rates. Misc: * Introduce a bunch of dirty hacks and workarounds so TID mapping and net80211 frame inspection can be kept out of the net80211 layer. Because of the way this code works (and it's from Atheros and Linux ath9k), there is a consistent, 1:1 mapping between TID and AC. So we need to ensure that frames going to a specific TID will _always_ end up on the right AC, and vice versa, or the completion/locking will simply get very confused. I plan on addressing this mess in the future. Known issues: * There is no BAR frame transmission just yet. A whole lot of tidying up needs to occur before BAR frame TX can occur in the "correct" place - ie, once the TID TX queue has been drained. * Interface reset/purge/etc results in frames in the TX and RX queues being removed. This creates holes in the sequence numbers being assigned and the TX/RX AMPDU code (on either side) just hangs. * There's no filtered frame support at the present moment, so stations going into power saving mode will simply have a number of frames dropped - likely resulting in a traffic "hang". * Raw frame TX is going to just not function with 11n aggregation. Likely this needs to be modified to always override the sequence number if the frame is going into an aggregation session. However, general raw frame injection currently doesn't work in general in net80211, so let's just ignore this for now until this is sorted out. * HT protection is just not implemented and won't be until the above is sorted out. In addition, the AR5416 has issues RTS protecting large aggregates (anything >8k), so the work around needs to be ported and tested. Thus, this will be put on hold until the above work is complete. * The rate control module 'sample' is the only currently supported module; onoe/amrr haven't been tested and have likely bit rotted a little. I'll follow up with some commits to make them work again for non-11n rates, but they won't be updated to handle 11n and aggregation. If someone wishes to do so then they're welcome to send along patches. * .. and "sample" doesn't really do a good job of 11n TX. Specifically, the metrics used (packet TX time and failure/success rates) isn't as useful for 11n. It's likely that it should be extended to take into account the aggregate throughput possible and then choose a rate which maximises that. Ie, it may be acceptable for a higher MCS rate with a higher failure to be used if it gives a more acceptable throughput/latency then a lower MCS rate @ a lower error rate. Again, patches will be gratefully accepted. Because of this, ATH_ENABLE_11N is still not enabled by default. Sponsored by: Hobnob, Inc. Obtained from: Linux, Atheros	2011-11-08 22:43:13 +00:00
Adrian Chadd	9352fb7ab0	Refactor out the TX buffer management and completion code in preparation for TX aggregation. * Add in logic which calls ath_buf bf->bf_comp if it's set. This allows for AMPDU (and RIFS, and FF, if someone desires) code to handle completion - which includes freeing subframes, retransmitting subframes, etc. * Break out the buffer free, buffer busy/unbusy default completion handler code into separate functions. This allows bf_comp methods to free and unbusy each subframe ath_buf as required. * Break out the statistics update code into a separate function, just to clean up the TX completion path a little. Sponsored by: Hobnob, Inc.	2011-11-08 21:49:33 +00:00
Adrian Chadd	e346b0732c	Change ath_buf allocation to: * Immediately return NULL if a buffer isn't available; * Track the "buffers not available" count; * Clear some fields used for tx aggregation; * Add ath_buf_clone() which clones the majority of buffer state. This is needed when retransmission of a "busy" buffer is required. Sponsored by: Hobnob, Inc.	2011-11-08 21:13:05 +00:00
Adrian Chadd	517526efe8	In preparation for supporting 11n TX/RX properly, allow for TX queue draining and interface resets to be marked as ATH_RESET_DEFAULT, ATH_RESET_FULL, ATH_RESET_NOLOSS. Currently a reset is still a reset - ie, all tx/rx frames in the hardware queues are purged. This means that those frames will be lost to the 11n TX and RX aggregation state tracking, breaking AMPDU sessions. The (eventual) new semantics: * ATH_RESET_DEFAULT: full reset, this is the default for reset situations which I haven't yet figured out what they should be. * ATH_RESET_FULL: A full reset - for things such as channel changes. * ATH_RESET_NOLOSS: Don't flush TX/RX queues - handle pending RX frames and leave TX frames where they are; restart TX DMA from where it was.	2011-11-08 18:56:52 +00:00
Adrian Chadd	6079fdbede	Migrate the sysctl related routines (statistics, debugging, etc) out of if_ath.c and into if_ath_sysctl.c .	2011-03-02 16:03:19 +00:00
Adrian Chadd	b8e788a53a	Migrate the TX path code out of if_ath and into a separate source file. There's two reasons for this: * the raw and non-raw TX path shares a lot of duplicate code which should be refactored; * the 11n-ready chip TX path needs a little reworking.	2011-01-29 11:35:23 +00:00

23 Commits