freebsd-skq

Author	SHA1	Message	Date
Adrian Chadd	59fbb5304d	Fix a cleanup hang if cleanup gets called _during_ an active cleanup. During power save testing I noticed that the cleanup code is being called during a RUN->RUN state transition. It's because the net80211 stack is treating that (for reasons I don't quite know yet) as a reassociation and this calls the node cleanup code. The reason it's seeing a RUN->RUN transition is because during active power save stuff it's possible that the RUN->SLEEP and SLEEP->RUN transitions happen so quickly that the deferred net80211 vap state code "loses" a transition, namely the intermediary SLEEP transition. So, this was causing the node reassociation code to sometimes be called twice in quick succession and this would result in ath_tx_tid_cleanup() being called again. The code calling it would always call pause, and then only call resume if the TID didn't have "cleanup_inprogress" set. Unfortunately it didn't check if it was already set on entry, so it would pause but not call resume. Thus, pause would be called more than once (once before each entry into ath_tx_tid_cleanup()) but resume would only be called once when the cleanup state was finished. This doesn't entirely fix all of the issues seen in the cleanup path but it's a necessary first step. Since this is a stability fix, it should be merged to stable/10 at some point. Tested: * AR5416, STA mode MFC after: 7 days	2014-04-21 01:02:49 +00:00
Adrian Chadd	42fdd8e726	Add some debugging and forcing of the BAW to match what the current tracked BAW actually is. The net80211 code that completes a BAR will set tid->txa_start (the BAW start) to whatever value was called when sending the BAR. Now, in case there's bugs in my driver code that cause the BAW to slip along, we should make sure that the new BAW we start at is actually what we currently have it at, not what we've sent. This totally breaks the specification and so this stays a printf(). If it happens then I need to know and fix it. Whilst here, add some debugging updates: * add TID logging to places where it's useful; * use SEQNO().	2014-04-08 07:14:14 +00:00
Adrian Chadd	8ec9220e81	Don't do continue inside the scheduler loop; we really need to check if we've hit the end of the list and cycled around to the first node again. Obtained from: DragonflyBSD	2014-04-08 07:10:52 +00:00
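The scheduler-loop fix in 8ec9220e81 can be illustrated with a minimal sketch. This is not the driver's code: `tid_sketch`, `q_sketch`, `q_push()`, `q_pop()` and `sched_pass()` are hypothetical names, and the queue is a hand-rolled FIFO rather than the driver's real list. The only point shown is that a skipped (paused) entry must still fall through to the "was this the last entry of the pass?" check instead of using `continue`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-TID scheduler entry. */
struct tid_sketch {
	struct tid_sketch *next;
	int pending;	/* frames waiting in the software queue */
	bool paused;	/* paused entries are skipped, not serviced */
	int serviced;	/* frames handed to the "hardware" so far */
};

/* Hypothetical FIFO of entries that want servicing. */
struct q_sketch {
	struct tid_sketch *head, *tail;
};

static void
q_push(struct q_sketch *q, struct tid_sketch *t)
{
	t->next = NULL;
	if (q->tail != NULL)
		q->tail->next = t;
	else
		q->head = t;
	q->tail = t;
}

static struct tid_sketch *
q_pop(struct q_sketch *q)
{
	struct tid_sketch *t = q->head;

	if (t == NULL)
		return (NULL);
	q->head = t->next;
	if (q->head == NULL)
		q->tail = NULL;
	t->next = NULL;
	return (t);
}

/* Run one pass: visit each entry that was queued at entry exactly once. */
static int
sched_pass(struct q_sketch *q)
{
	struct tid_sketch *last = q->tail;	/* end marker for this pass */
	struct tid_sketch *t;
	int visited = 0;

	while ((t = q_pop(q)) != NULL) {
		visited++;
		if (!t->paused && t->pending > 0) {
			t->pending--;
			t->serviced++;
		}
		/* Entries with work left go back on the tail. */
		if (t->pending > 0)
			q_push(q, t);
		/*
		 * No "continue" anywhere above: every iteration, including
		 * the paused case, must reach this end-of-pass check, or a
		 * re-queued entry could be visited again and again.
		 */
		if (t == last)
			break;
	}
	return (visited);
}
```

With a paused entry in the middle of the queue, a pass still ends after visiting each entry once; the paused entry stays queued for a later pass.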
Adrian Chadd	1f7373066f	Correct the actual definition of ath_tx_tid_filt_comp_single() to match how it's used. This is another bug that led to aggregate traffic hanging because the BAW tracking stopped being accurate. In this instance, a filtered frame that exceeded retries would return a non-error, which would mean the caller would never remove it from the BAW. But it wouldn't be added to the filtered list, so it would be lost forever. There'd thus be a hole in the BAW that would never get transmitted and this leads to a traffic hang. Tested: * Routerstation Pro, AR9220 AP	2014-04-08 07:08:59 +00:00
Adrian Chadd	c5d230ab42	Add a comment explaining the obvious.	2014-04-08 07:01:27 +00:00
Adrian Chadd	a3fd3b1429	Don't resume a TID on each filtered frame completion - only do it if we did suspend it. The whole suspend/resume TID queue thing is supposed to be a matched reference count - a subsystem (eg addba negotiation, BAR transmission, filtered frames, etc) is supposed to call pause() once and then resume() once. ath_tx_tid_filt_comp_complete() is called upon the completion of any filtered frame, regardless of whether the driver had already seen a filtered frame and called pause(). So only call resume() if tid->isfiltered = 1, which indicates that we had called pause() once. This fixes a seemingly whacked and different problem - traffic hangs. What was actually going on: * There'd be some marginal link with crappy behaviour, causing filtered frames and BAR TXing to occur; * A BAR TX would occur, setting the new BAW (block-ack window) to seqno n; * .. and pause() would be called, blocking further transmission; * A filtered frame completion would occur from the hardware, but with tid->isfiltered = 0 which indicates we haven't actually marked the queue yet as filtered; * ath_tx_tid_filt_comp_complete() would call resume(), continuing transmission; * Some frames would be queued to the hardware, since the TID is now no longer paused; * .. and if some make it out and ACked successfully, the new BAW may be seqno n+1 or more; * .. then the BAR TX completes and sets the new seqno back to n. At this point the BAW tracking would be loopy because the BAW start was modified but the BAW ring buffer wasn't updated in lock step. Tested: * Routerstation Pro + AR9220 AP	2014-04-08 07:00:43 +00:00
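The matched pause()/resume() reference count described in a3fd3b1429 can be sketched as follows. This is an illustration only, not the driver's code: `tid_sketch`, `tid_pause()`, `tid_resume()`, `filt_frame_seen()` and `filt_comp_complete()` are hypothetical names, and only the `isfiltered` idea comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-TID state. */
struct tid_sketch {
	int paused;		/* pause reference count; TX only when 0 */
	bool isfiltered;	/* true once the filtered path has paused */
};

static void
tid_pause(struct tid_sketch *t)
{
	t->paused++;
}

static void
tid_resume(struct tid_sketch *t)
{
	assert(t->paused > 0);	/* an unmatched resume is a bug */
	t->paused--;
}

static bool
tid_can_tx(const struct tid_sketch *t)
{
	return (t->paused == 0);
}

/* A filtered frame arrives: pause once, on the first one only. */
static void
filt_frame_seen(struct tid_sketch *t)
{
	if (!t->isfiltered) {
		t->isfiltered = true;
		tid_pause(t);
	}
}

/* Filtered-frame completion: resume only if this path did the pause. */
static void
filt_comp_complete(struct tid_sketch *t)
{
	if (!t->isfiltered)
		return;		/* never paused here, so nothing to undo */
	t->isfiltered = false;
	tid_resume(t);
}
```

In this model, a completion that arrives while only the BAR path holds a pause leaves the count untouched, so transmission stays blocked until the BAR path issues its own resume.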
Adrian Chadd	6fc621c22c	Throw the flush messages behind ATH_DEBUG_RESET as well. These are needed to diagnose TX hangs that I and hiren are seeing. Without it, the only way we'll see debugging is by having ATH_DEBUG_SW_TX enabled and that is going to be very, very spammy. ATH_DEBUG_RESET is fine; it's only going to be done during stuck beacon situations in AP mode. Whilst I'm here, and now that it's behind debugging, let's just disable the "print only one" conditional. I'll eventually make it more tunable. Tested: * AR9220, hostap mode.	2014-03-20 23:16:58 +00:00
Rui Paulo	a2be2710b4	Call ieee80211_dump_pkt() based on IFF_DUMPPKTS(). MFC after: 3 days	2014-03-08 19:35:31 +00:00
Kevin Lo	5945b5f5ab	Rename definition of IEEE80211_FC1_WEP to IEEE80211_FC1_PROTECTED. The origin of WEP comes from IEEE Std 802.11-1997 where it defines whether the frame body of MAC frame has been encrypted using WEP algorithm or not. IEEE Std. 802.11-2007 changes WEP to Protected Frame, indicates whether the frame is protected by a cryptographic encapsulation algorithm. Reviewed by: adrian, rpaulo	2014-01-08 08:06:56 +00:00
Olivier Houchard	f431664c05	Include <sys/ktr.h>, since we need it if ATH_DEBUG is defined.	2013-10-28 20:26:34 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Rui Paulo	b372f122ab	Add a missing comma.	2013-10-17 05:51:54 +00:00
Rui Paulo	83bbd5ebf9	Move a lot of debugging printf's to DPRINTF. Approved by: adrian MFC after: 2 weeks	2013-10-17 01:53:07 +00:00
Adrian Chadd	272a8ab68a	Log the MAC address of the node in question rather than the pointer.	2013-08-17 01:14:28 +00:00
Adrian Chadd	5da3fc1048	Shuffle around the cleanup unpause calls a bit.	2013-05-29 01:40:13 +00:00
Adrian Chadd	cd7dffd058	Migrate ath(4) to now use if_transmit instead of the legacy if_start and if queue mechanism; also fix up (non-11n) TX fragment handling. This may result in a bit of a performance drop for now but I plan on debugging and resolving this at a later stage. Whilst here, fix the transmit path so fragment transmission works. The TX fragmentation handling is a bit more special. In order to correctly transmit TX fragments, there's a bunch of corner cases that need to be handled: * They must be transmitted back to back, in the same order.. * .. ie, you need to hold the TX lock whilst transmitting this set of fragments rather than interleaving it with other MSDUs destined to other nodes; * The length of the next fragment is required when transmitting, in order to correctly set the NAV field in the current frame to the length of the next frame; which requires .. * .. that we know the transmit duration of the next frame, which .. * .. requires us to set the rate of all fragments to the same length, or make the decision up-front, etc. To facilitate this, I've added a new ath_buf field to describe the length of the next fragment. This avoids having to keep the mbuf chain together. This used to work before my 11n TX path work because the ath_tx_start() routine would be handed a single mbuf with m_nextpkt pointing to the next frame, and that would be maintained all the way up to when the duration calculation was done. This doesn't hold true any longer - the actual queuing may occur at any point in the future (think ath_node TID software queuing) so this information needs to be maintained. Right now this does work for non-11n frames but it doesn't at all enforce the same rate control decision for all frames in the fragment. I plan on fixing this in a followup commit. RTS/CTS has the same issue, I'll look at fixing this in a subsequent commit. 
Finally, 11n fragment support requires the driver to have fully decided what the rate scenario setup is - including 20/40MHz, short/long GI, STBC, LDPC, number of streams, etc. Right now that decision is (currently) made _after_ the NAV field value is updated. I'll fix all of this in subsequent commits. Tested: * AR5416, STA, transmitting 11abg fragments * AR5416, STA, 11n fragments work but the NAV field is incorrect for the reasons above. TODO: * It would be nice to be able to queue mbufs per-node and per-TID so we can only queue ath_buf entries when it's time to assemble frames to send to the hardware. But honestly, we should just do that level of software queue management in net80211 rather than ath(4), so I'm going to leave this alone for now. * More thorough AP, mesh and adhoc testing. * Ensure that net80211 doesn't hand us fragmented frames when A-MPDU has been negotiated, as we can't do software retransmission of fragments. * .. set CLRDMASK when transmitting fragments, just to ensure.	2013-05-26 22:23:39 +00:00
Adrian Chadd	72910f03e5	Implement a separate hardware queue threshold for aggregate and non-aggr traffic. When transmitting non-aggregate traffic, we need to keep the hardware busy whilst transmitting or small bursts in txdone/tx latency will kill us. This restores non-aggregate iperf performance, especially when doing TDMA. Tested: * AR5416<->AR5416, TDMA * AR5416 STA <-> AR9280 AP	2013-05-21 18:13:57 +00:00
Adrian Chadd	6112d22c3f	More non-ATH_DEBUG build fixes.	2013-05-19 01:33:17 +00:00
Adrian Chadd	9be82a4209	Be (very) careful about how to add more TX DMA work. The list-based DMA engine has the following behaviour: * When the DMA engine is in the init state, you can write the first descriptor address to the QCU TxDP register and it will work. * Then when it hits the end of the list (ie, it either hits a NULL link pointer, OR it hits a descriptor with VEOL set) the QCU stops, and the TxDP points to the last descriptor that was transmitted. * Then when you want to transmit a new frame, you can then either: + write the head of the new list into TxDP, or + you write the head of the new list into the link pointer of the last completed descriptor (ie, where TxDP points), then kick TxE to restart transmission on that QCU. * The hardware then will re-read the descriptor to pick up the link pointer and then jump to that. Now, the quirks: * If you write a TxDP when there's been no previous TxDP (ie, it's 0), it works. * If you write a TxDP in any other instance, the TxDP write may actually fail. Thus, when you start transmission, it will re-read the last transmitted descriptor to get the link pointer, NOT just start a new transmission. So the correct thing to do here is: * ALWAYS use the holding descriptor (ie, the last transmitted descriptor that we've kept safe) and use the link pointer in _THAT_ to transmit the next frame. * NEVER write to the TxDP after you've done the initial write. * .. also, don't do this whilst you're also resetting the NIC. With this in mind, the following patch does basically the above. * Since this encapsulates Sam's issues with the QCU behaviour w/ TDMA, kill the TDMA special case and replace it with the above. * Add a new TXQ flag - PUTRUNNING - which indicates that we've started DMA. * Clear that flag when DMA has been shutdown. * Ensure that we're not restarting DMA with PUTRUNNING enabled. 
* Fix the link pointer logic during TXQ drain - we should always ensure the link pointer does point to something if there's a list of frames. Having it be NULL as an indication that DMA has finished or during a reset causes trouble. Now, given all of this, i want to nuke axq_link from orbit. There's now HAL methods to get and set the link pointer of a descriptor, so what we should do instead is to update the right link pointer. * If there's a holding descriptor and an empty TXQ list, set the link pointer of said holding descriptor to the new frame. * If there's a non-empty TXQ list, set the link pointer of the last descriptor in the list to the new frame. * Nuke axq_link from orbit. Note: * The AR9380 doesn't need this. FIFO TX writes are atomic. As long as we don't append to a list of frames that we've already passed to the hardware, all of the above doesn't apply. The holding descriptor stuff is still needed to ensure the hardware can re-read a completed descriptor to move onto the next one, but we restart DMA by pushing in a new FIFO entry into the TX QCU. That doesn't require any real gymnastics. Tested: * AR5210, AR5211, AR5212, AR5416, AR9380 - STA mode.	2013-05-18 18:27:53 +00:00
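The link-pointer rules laid out in 9be82a4209 can be modelled in a few lines. This is a sketch under stated assumptions, not the ath(4) code: `desc_sketch`, `txq_sketch`, `txq_append()` and `txq_complete_all()` are hypothetical names, the "TxDP write" is only a counter, and kicking TxE is left out. It shows the two append cases (chain onto the last queued descriptor, or onto the holding descriptor when the list is empty) and that TxDP is written only once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TX descriptor: only the link pointer is modelled. */
struct desc_sketch {
	struct desc_sketch *link;
};

/* Hypothetical software view of one legacy (non-EDMA) TX queue. */
struct txq_sketch {
	struct desc_sketch *holding;	/* last completed descriptor, kept safe */
	struct desc_sketch *first;	/* queued, not yet completed */
	struct desc_sketch *last;
	bool putrunning;		/* DMA has been started on this queue */
	struct desc_sketch *txdp;	/* models the TxDP register */
	int txdp_writes;		/* how often "TxDP" was written */
};

static void
txq_append(struct txq_sketch *q, struct desc_sketch *d)
{
	d->link = NULL;
	if (q->last != NULL)
		q->last->link = d;	/* non-empty list: chain onto its tail */
	else if (q->holding != NULL)
		q->holding->link = d;	/* empty list: chain onto holding desc */
	if (q->first == NULL)
		q->first = d;
	q->last = d;

	/* Only the very first start writes TxDP; never again afterwards. */
	if (!q->putrunning) {
		q->txdp = d;
		q->txdp_writes++;
		q->putrunning = true;
	}
}

/* Model the hardware finishing everything that was queued. */
static void
txq_complete_all(struct txq_sketch *q)
{
	if (q->last != NULL) {
		q->holding = q->last;	/* keep it: hardware re-reads its link */
		q->first = q->last = NULL;
	}
}
```

The model keeps the holding descriptor alive after completion because, per the commit message, the hardware re-reads the last transmitted descriptor to find the next link pointer.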
Adrian Chadd	97c9a8e806	Add some more debugging printf()s to complain if the ath_buf tx queue doesn't match the actual hardware queue this frame is queued to. I'm trying to ensure that the holding buffers are actually being queued to the same TX queue as the holding buffer that they end up on. I'm pretty sure this is all correct so if this complains, it'll be due to some kind of subtle broken-ness that needs fixing. This is only done for legacy hardware, not EDMA hardware. Tested: * AR5416 STA mode, very lightly	2013-05-17 05:16:30 +00:00
Adrian Chadd	6d07d3e014	Tidy up the debugging - don't bother printing out TID pointers; now that we are printing out the MAC address in these fields, just printing out the TID is enough.	2013-05-16 17:53:12 +00:00
Adrian Chadd	b45a991e92	Limit the number of software queued frames when doing non-aggregation. This should prevent the TX queue being filled with non-aggregate frames, causing starvation and non-fair queue behaviour.	2013-05-16 17:46:32 +00:00
Adrian Chadd	22a3aee637	Implement my first cut at "correct" node power-save and PS-POLL support. This implements PS-POLL awareness in the * Implement frame "leaking", which allows for a software queue to be scheduled even though it's asleep * Track whether a frame has been leaked or not * Leak out a single non-AMPDU frame when transmitting aggregates * Queue BAR frames if the node is asleep * Direct-dispatch the rest of control and management frames. This allows for things like re-association to occur (which involves sending probe req/resp as well as assoc request/response) when the node is asleep and then tries reassociating. * Limit how many frames can sit in the software node queue whilst the node is asleep. net80211 is already buffering frames for us so this is mostly just paranoia. * Add a PS-POLL method which leaks out a frame if there's something in the software queue, else it calls net80211's ps-poll routine. Since the ath PS-POLL routine marks the node as having a single frame to leak, either a software queued frame would leak, OR the next queued frame would leak. The next queued frame could be something from the net80211 power save queue, OR it could be a NULL frame from net80211. TODO: * Don't transmit further BAR frames (eg via a timeout) if the node is currently asleep. Otherwise we may end up exhausting management frames due to the lots of queued BAR frames. I may just undo this bit later on and direct-dispatch BAR frames even if the node is asleep. * It would be nice to burst out a single A-MPDU frame if both ends support this. I may end up adding a FreeBSD IE soon to negotiate this power save behaviour. * I should make STAs time out of power save mode if they've been in power save for more than a handful of seconds. This way cards that get "stuck" in power save mode don't stay there for the "inactivity" timeout in net80211. * Move the queue depth check into the driver layer (ath_start / ath_transmit) rather than doing it in the TX path. 
* There could be some naughty corner cases with ps-poll leaking. Specifically, if net80211 generates a NULL data frame whilst another transmitter sends a normal data frame out net80211 output / transmit, we need to ensure that the NULL data frame goes out first. This is one of those things that should occur inside the VAP/ic TX lock. Grr, more investigations to do.. Tested: * STA: AR5416, AR9280 * AP: AR5416, AR9280, AR9160	2013-05-15 18:33:05 +00:00
Adrian Chadd	9b48fb4b32	Improve the debugging output - use the MAC address rather than various pointer values everywhere.	2013-05-13 19:52:35 +00:00
Adrian Chadd	22780332ae	Oops, commit the other half of r250606.	2013-05-13 19:02:22 +00:00
Adrian Chadd	55cf0326a1	Simplify this bit of code!	2013-05-07 07:44:07 +00:00
Adrian Chadd	3527f6a9b1	When doing BAW tracking, don't dereference a NULL pointer if the BAW slot is actually NULL.	2013-04-21 00:41:15 +00:00
Adrian Chadd	dff5bdf48c	There's some races (likely in the BAR handling, sigh) which is causing the pause/resume code to not be called completely symmetrically. I'll chase down the root cause of that soon; this at least works around the bug and tells me when it happens.	2013-04-20 22:46:31 +00:00
Adrian Chadd	12087a0769	Use the new net80211 method to fetch the node TX power, rather than directly referencing ni->ni_txpower. This provides the hardware with a slightly more accurate idea of the maximum TX power to be using. This is part of a series to get per-packet TPC to work (better). Tested: * AR5416, hostap mode	2013-04-16 21:26:44 +00:00
Adrian Chadd	c23a9d98bf	Mark a couple of places where I think the dmamap isn't being unmapped before the TX path is being aborted. Right now it's in the TDMA code and I can live with that; but it really should get fixed. I'll do a more thorough audit of this code soon.	2013-04-02 06:25:10 +00:00
Adrian Chadd	3f3a5dbd2c	Ensure that we only call the busdma unmap/flush routines once, when the buffer is being freed. * When buffers are cloned, the original mapping isn't copied but it wasn't freeing the mapping until later. To be safe, free the mapping when the buffer is cloned. * ath_freebuf() now no longer calls the busdma sync/unmap routines. * ath_tx_freebuf() now calls sync/unmap. * Call sync first, before calling unmap. Tested: * AR5416, STA mode	2013-04-01 20:57:13 +00:00
Adrian Chadd	09067b6e9a	Use ATH_MAX_SCATTER rather than ATH_TXDESC. ATH_MAX_SCATTER is used to size the ath_buf DMA segment array. We thus should use it when checking sizes of things.	2013-04-01 20:12:21 +00:00
Adrian Chadd	92e84e43a6	Implement the replacement EDMA FIFO code. (Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was cleared!) This code implements a whole bunch of sorely needed EDMA TX improvements along with CABQ TX support. The specifics: * When filling/refilling the FIFO, use the new TXQ staging queue for FIFO frames * Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly. For now the non-CABQ transmit path pushes one frame into the TXQ staging queue without setting up the intermediary link pointers to chain them together, so draining frames from the txq staging queue to the FIFO queue occurs AMPDU / MPDU at a time. * In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO at once. * Now that frames are in a FIFO pending queue, we can top up the FIFO after completing a single frame. This means we can keep it filled rather than waiting for it drain and _then_ adding more frames. * The EDMA restart routine now walks the FIFO queue in the TXQ rather than the pending queue and re-initialises the FIFO with that. * When restarting EDMA, we may have partially completed sending a list. So stamp the first frame that we see in a list with ATH_BUF_FIFOPTR and push _that_ into the hardware. * When completing frames, only check those on the FIFO queue. We should never ever queue frames from the pending queue direct to the hardware, so there's no point in checking. * Until I figure out what's going on, make sure if the TXSTATUS for an empty queue pops up, complain loudly and continue. This will stop the panics that people are seeing. I'll add some code later which will assist in ensuring I'm populating each descriptor with the correct queue ID. * When considering whether to queue frames to the hardware queue directly or software queue frames, make sure the depth of the FIFO is taken into account now. 
* When completing frames, tag them with ATH_BUF_BUSY if they're not the final frame in a FIFO list. The same holding descriptor behaviour is required when handling descriptors linked together with a link pointer as the hardware will re-read the previous descriptor to refresh the link pointer before continuing. * .. and if we complete the FIFO list (ie, the buffer has ATH_BUF_FIFOEND set), then we don't need the holding buffer any longer. Thus, free it. Tested: * AR9380/AR9580, STA and hostap * AR9280, STA/hostap TODO: * I don't yet trust that the EDMA restart routine is totally correct in all circumstances. I'll continue to thrash this out under heavy multiple-TXQ traffic load and fix whatever pops up.	2013-03-26 20:04:45 +00:00
Adrian Chadd	35bec3655e	Remove the mcast path calls to ath_hal_gettxdesclinkptr() for axq_link - they're no longer needed for the legacy path and they're not wanted for the EDMA path. Tested: * AR9280, hostap + CABQ * AR9380/AR9580, hostap + CABQ	2013-03-26 04:56:54 +00:00
Adrian Chadd	0891354cd2	Migrate the multicast queue assembly code to not use the axq_link pointer and instead use the HAL method to set the link pointer. Tested: * AR9280, hostap mode, CABQ frames being queued and transmitted	2013-03-26 04:47:40 +00:00
Adrian Chadd	56a859789f	Move the TXQ lock earlier in this routine - so to correctly protect the link pointer check.	2013-03-24 04:09:54 +00:00
Adrian Chadd	b837332d0a	Overhaul the TXQ locking (again!) as part of some beacon/cabq timing related issues. Moving the TX locking under one lock made things easier to progress on but it had one important side-effect - it increased the latency when handling CABQ setup when sending beacons. This commit introduces a bunch of new changes and a few unrelated changes that are just easier to lump in here. The aim is to have the CABQ locking separate from other locking. The CABQ transmit path in the beacon process thus doesn't have to grab the general TX lock, reducing lock contention/latency and making it more likely that we'll make the beacon TX timing. The second half of this commit is the CABQ related setup changes needed for sane looking EDMA CABQ support. Right now the EDMA TX code naively assumes that only one frame (MPDU or A-MPDU) is being pushed into each FIFO slot. For the CABQ this isn't true - a whole list of frames is being pushed in - and thus CABQ handling breaks very quickly. The aim here is to setup the CABQ list and then push _that list_ to the hardware for transmission. I can then extend the EDMA TX code to stamp that list as being "one" FIFO entry (likely by tagging the last buffer in that list as "FIFO END") so the EDMA TX completion code correctly tracks things. Major: * Migrate the per-TXQ add/removal locking back to per-TXQ, rather than a single lock. * Leave the software queue side of things under the ATH_TX_LOCK lock, (continuing) to serialise things as they are. * Add a new function which is called whenever there's a beacon miss, to print out some debugging. This is primarily designed to help me figure out if the beacon miss events are due to a noisy environment, issues with the PHY/MAC, or other. * Move the CABQ setup/enable to occur _after_ all the VAPs have been looked at. 
This means that for multiple VAPs in bursted mode, the CABQ gets primed once all VAPs are checked, rather than being primed on the first VAP and then having frames appended after this. Minor: * Add a (disabled) twiddle to let me enable/disable cabq traffic. It's primarily there to let me easily debug what's going on with beacon and CABQ setup/traffic; there's some DMA engine hangs which I'm finally trying to trace down. * Clear bf_next when flushing frames; it should quieten some warnings that show up when a node goes away. Tested: * AR9280, STA/hostap, up to 4 vaps (staggered) * AR5416, STA/hostap, up to 4 vaps (staggered) TODO: * (Lots) more AR9380 and later testing, as I may have missed something here. * Leverage this to fix CABQ handling for AR9380 and later chips. * Force bursted beaconing on the chips that default to staggered beacons and ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being correctly set when chaining descriptors.)	2013-03-24 00:03:12 +00:00
Adrian Chadd	378a752f59	Now that the tx map field is correctly populated for both edma and legacy chips, just use that.	2013-03-19 17:54:37 +00:00
Adrian Chadd	cd4f1ba89f	Why'd I keep this here? remove it entirely now.	2013-03-15 20:22:20 +00:00
Adrian Chadd	302868d914	Fix two bugs: * when pulling frames off of the TID queue, the ATH_TID_REMOVE() macro decrements the axq_depth field. So don't do it twice. * in ath_tx_comp_cleanup_aggr(), bf wasn't being reset to bf_first before walking the buffer list to complete buffers; so those buffers will leak.	2013-03-15 20:00:08 +00:00
Adrian Chadd	8454d32107	Remove a now incorrect comment. This comment dates back to my initial stab at TX aggregation completion, where I didn't even bother trying to do software retries.	2013-03-15 04:43:27 +00:00
Adrian Chadd	b3420862a7	Disable the hw TID != buffer TID check. I can 100% reliably trigger this on TID 1 traffic by using iperf -S 32 <client fields> to create traffic that maps to TID 1. The reference driver doesn't do this check.	2013-03-09 08:50:17 +00:00
Adrian Chadd	ce597531f2	Disable debugging entries about BAW issues. I haven't seen any issues to do with BAW tracking in the last 9 months or so.	2013-02-21 21:47:35 +00:00
Adrian Chadd	f274e91f67	A couple of quick tidyups: * Delete this debugging print - I used it when debugging the initial TX descriptor chaining code. It now works, so let's toss it. It just confuses people if they enable TX descriptor debugging as they get two slightly different versions of the same descriptor. * Indenting.	2013-02-20 11:22:44 +00:00
Adrian Chadd	1a85141ad4	Pull out the if_transmit() work and revert back to ath_start(). My changes had some rather significant behavioural changes to throughput. The two issues I noticed: * With if_start and the ifnet mbuf queue, any temporary latency would get eaten up by some mbufs being queued. With ath_transmit() queuing things to ath_buf's, I'd only get 512 TX buffers before I couldn't queue any further frames. * There's also some non-zero latency involved with TX being pushed into a taskqueue via direct dispatch. Any time the scheduler didn't immediately schedule the ath TX task would cause extra latency. Various 1ge/10ge drivers implement both direct dispatch (if the TX lock can be acquired) and deferred task transmission (if the TX lock can't be acquired), with frames being pushed into a drbr queue. I'll have to do this at some point, but until I figure out how to deal with 802.11 fragments, I'll have to wait a while longer. So what I saw: * lots of extra latency, especially under load - if the taskqueue wasn't immediately scheduled, things went pear shaped; * any extra latency would result in TX ath_buf's taking their sweet time being replenished, so any further calls to ath_transmit() would drop mbufs. * .. yes, there's no explicit backpressure here - things are just dropped. Eek. With this, the general performance has gone up, but those subtle if_start() related race conditions are back. For some reason, this is doubly-obvious with the AR5416 NIC and I don't quite understand why yet. There's an unrelated issue with AR5416 performance in STA mode (it's fine in AP mode when bridging frames, weirdly..) that requires a little further investigation. Specifically - it works fine on a Lenovo T40 (single core CPU) running a March 2012 9-STABLE kernel, but a Lenovo T60 (dual core) running an early November 2012 kernel behaves very poorly. The same hardware with an AR9160 or AR9280 behaves perfectly.	2013-02-13 05:32:19 +00:00
Adrian Chadd	21bca442b9	Methodize the process of adding the software TX queue to the taskqueue. Move it (for now) to the TX taskqueue.	2013-02-07 02:15:25 +00:00
Adrian Chadd	f28a552089	Migrate the TX sending code out from under the ath0 taskq and into the separate ath0 TX taskq. Whilst here, make sure that the TX software scheduler is also running out of the TX task, rather than the ath0 taskqueue. Make sure that the tx taskqueue is blocked/unblocked as necessary. This allows for a little more parallelism on multi-core machines, as well as (eventually) supporting a higher task priority for TX tasks, allowing said TX task to preempt an already running RX or TX completion task. Tested: * AR5416, AR9280 hostap and STA modes	2013-01-26 00:14:34 +00:00
Adrian Chadd	f74d878fda	Fix this routine to actually break out and not set clrdmask if any of the TIDs are currently marked as "filtered."	2013-01-21 07:50:38 +00:00
Adrian Chadd	4f25ddbbe6	Migrate CLRDMASK to be a per-node flag, rather than a per-TID flag. This is easily possible now that the TX is protected by a single lock, rather than a per-TXQ (and thus per-TID) lock. Only set CLRDMASK if none of the destinations are filtered. This likely will need some tuning when it comes time to do U-APSD/PS-POLL TX, however at that point it should be manually set anyway. Tested: * AR9280, STA mode TODO: * More thorough testing in AP mode * test other chipsets, just to be safe/sure.	2013-01-21 04:06:04 +00:00
Adrian Chadd	c5239edb98	Implement frame (data) transmission using if_transmit(), rather than if_start(). This removes the overlapping data path TX from occurring, which solves quite a number of the potential TX queue races in ath(4). It doesn't fix the net80211 layer TX queue races and it doesn't fix the raw TX path yet, but it's an important step towards this. This hasn't dropped the TX performance in my testing; primarily because now the TX path can quickly queue frames and continue along processing. This involves a few rather deep changes: * Use the ath_buf as a queue placeholder for now, as we need to be able to support queuing a list of mbufs (ie, when transmitting fragments) and m_nextpkt can't be used here (because it's what is joining the fragments together) * if_transmit() now simply allocates the ath_buf and queues it to a driver TX staging queue. * TX is now moved into a taskqueue function. * The TX taskqueue function now dequeues and transmits frames. * Fragments are handled correctly here - as the current API passes the fragment list as one mbuf list (joined with m_nextpkt) through to the driver if_transmit(). * For the couple of places where ath_start() may be called (mostly from net80211 when starting the VAP up again), just reimplement it using the new enqueue and taskqueue methods. What I don't like (about this work and the TX code in general): * I'm using the same lock for the staging TX queue management and the actual TX. This isn't required; I'm just being slack. * I haven't yet moved TX to a separate taskqueue (but the taskqueue is created); it's easy enough to do this later if necessary. I just need to make sure it's a higher priority queue, so TX has the same behaviour as it used to (where it would preempt existing RX..) * I need to re-review the TX path a little more and make sure that ieee80211_node_() functions aren't called within the TX lock. 
When queueing, I should just push failed frames into a queue and when I'm wrapping up the TX code, unlock the TX lock and call ieee80211_node_free() on each. It would be nice if I could hold the TX lock for the entire TX and TX completion, rather than this release/re-acquire behaviour. But that requires that I shuffle around the TX completion code to handle actual ath_buf free and net80211 callback/free outside of the TX lock. That's one of my next projects. * the ic_raw_xmit() path doesn't use this yet - so it still has sequencing problems with parallel, overlapping calls to the data path. I'll fix this later. Tested: * Hostap - AR9280, AR9220 * STA - AR5212, AR9280, AR5416	2013-01-15 18:01:23 +00:00


163 Commits