freebsd-nq

Author	SHA1	Message	Date
Adrian Chadd	0eb8162623	Pause and unpause the software queues for a given node based on the net80211 node power save state. * Add an ATH_NODE_UNLOCK_ASSERT() check * Add a new node field - an_is_powersave * Pause/unpause the queue based on the node state * Attempt to handle net80211 concurrency issues so the queue doesn't get paused/unpaused more than once at a time from the net80211 power save code. Whilst here (and breaking my usual rule), set CLRDMASK when a queue is unpaused, regardless of whether the queue has some pending traffic. This means the first frame from that TID (now or later) will have CLRDMASK set. Also whilst here, bump the swretrymax counters whenever the filtered frames code expires a frame. Again, breaking my rule, but this is just a statistics thing rather than a functional change. This doesn't fix ps-poll (but it doesn't make it much worse than it is at present) or correct the TID updates. That's next on the list. Tested: * AR9220 AP (Atheros AP96 reference design) * Macbook Pro and LG Optimus 1 Android phone, both setting and clearing power save state (but not using PS-POLL.)	2012-10-03 23:23:45 +00:00
Adrian Chadd	0368251456	Migrate the ath(4) KTR logging to use an ATH_KTR() macro. This should eventually be unified with ATH_DEBUG() so I can get both from one macro; that may take some time. Add some new probes for TX and TX completion.	2012-09-24 20:35:56 +00:00
Adrian Chadd	de8e4d6436	Add a per-TID filter queue and filter state bits. These are intended for software TX filtering support, where the NIC decides there have been too many successive failures to a destination and will filter it. Although the filtering is done per-destination (via the keycache), the state and queue are kept per-TID for now. It simplifies the overall architecture design and locking. Whilst here, add ATH_TID_UNLOCK_ASSERT().	2012-09-17 01:21:55 +00:00
Adrian Chadd	7d6b932c44	Add an accessor macro for getting access to the default DFS parameters. PR: kern/170904	2012-08-24 17:37:12 +00:00
Adrian Chadd	85bf9bc3d5	Implement a sequential descriptor ID value and stuff it in the ath_buf. This will be used by the EDMA TX code to assign descriptor IDs in order to provide some debugging.	2012-08-15 06:48:34 +00:00
Adrian Chadd	e5661062ee	Add an assertion to check that the given TXQ is _not_ locked.	2012-08-14 22:30:17 +00:00
Adrian Chadd	1762ec944a	Revert the ath_tx_draintxq() method, and instead teach it the minimum necessary to "do" EDMA. It was just using the TX completion status for logging information about the descriptor completion. Since with EDMA we don't know this without checking the TX completion FIFO, we can't provide this information. So don't.	2012-08-12 00:46:15 +00:00
Adrian Chadd	788e6aa99c	Break out ath_draintxq() into a method and un-methodize ath_tx_processq(). Now that I understand what's going on with this, I've realised that it's going to be quite difficult to implement a processq method in the EDMA case. Because there's a separate TX status FIFO, I can't just run processq() on each EDMA TXQ to see what's finished. I have to actually run the TX status queue and handle individual TXQs. So: * unmethodize ath_tx_processq(); * leave ath_tx_draintxq() as a method, as it only uses the completion status for debugging rather than actively completing the frames (ie, all frames here are failed); * Methodize ath_draintxq(). The EDMA ath_draintxq() will have to take care of running the TX completion FIFO before (potentially) freeing frames in the queue. The only two places where ath_tx_draintxq() (on a single TXQ) is used: * ath_draintxq(); and * the CABQ handling in the beacon setup code - it drains the CABQ before populating the CABQ with frames for a new beacon (when doing multi-VAP operation.) So it's quite possible that once I methodize the CABQ and beacon handling, I can just drop ath_tx_draintxq() in its entirety. Finally, it's also quite possible that I can remove ath_tx_draintxq() in the future and just "teach" it to not check the status when doing EDMA.	2012-08-12 00:37:29 +00:00
Adrian Chadd	3ae723d459	Begin fleshing out the TX FIFO support. * Add ATH_TXQ_FIRST() for easy tasting of what's on the list; * Add an "axq_fifo_depth" for easy tracking of how deep the current FIFO is; * Flesh out the handoff (mcast, hw) functions; * Begin fleshing out a TX ISR proc, which tastes the TX status FIFO. The legacy hardware stuffs the TX completion at the end of the final frame descriptor (or final sub-frame when doing aggregate.) So it's feasible to do a per-TXQ drain and process, as the needed info is right there. For EDMA hardware, there's a separate TX completion FIFO. So the TX process routine needs to read the single FIFO and then process the frames in each hardware queue. This makes it difficult to do a per-queue process, as you'll end up with frames in the TX completion FIFO for a different TXQ to the one you've passed to ath_tx_draintxq() or ath_tx_processq(). Testing: I've tested the TX queue and TX completion code in hostap mode on an AR9380. Beacon frames successfully transmit and the completion routine is called. Occasional data frames end up in TXQ 1 and are also successfully completed. However, this requires some changes to the beacon code path as: * The AR9380 beacon configuration API is now in TU/8, rather than TU; * The AR9380 TX API requires that the rate control be set up using a call to setup11nratescenario, rather than having the try0 series setup (rate/tries for the first series); so the beacon won't go out. I'll follow this up with commits to the beacon code.	2012-08-11 22:20:28 +00:00
Adrian Chadd	fffbec8618	Migrate the 802.11n ath_hal_chaintxdesc() API to use a buffer/segment array, similar to what filltxdesc() uses. This removes the last reference to ds_data in the TX path outside of debugging statements. These need to be adjusted/fixed. Tested: * AR9280 STA/AP with iperf TCP traffic	2012-08-05 11:24:21 +00:00
Adrian Chadd	46634305f4	Migrate the ath_hal_filltxdesc() API to take a list of buffer/seglen values. The existing API only exposes 'seglen' (the current buffer (segment) length) with the data buffer pointer set in 'ds_data'. This is fine for the legacy DMA engine but it won't work for the EDMA engines. The EDMA engine has a significantly different TX descriptor layout. * The legacy DMA engine had a ds_data pointer at the same offset in the descriptor for both TX and RX buffers; * The EDMA engine has no ds_data for RX - the data is DMAed after the descriptor; * The EDMA engine has support for 4 TX buffer/segment pairs in the TX DMA descriptor; * The EDMA TX completion is in a different FIFO, and the driver will 'link' the status completion entry to a QCU by a "QCU ID". I don't know why it's just not filled in by the hardware, alas. So given that, here are the changes: * Instead of directly fondling 'ds_data' in ath_desc, change the ath_hal_filltxdesc() to take an array of buffer pointers as well as segment len pointers; * The EDMA TX completion status wants a descriptor and queue id. This (for now) uses bf_state.bfs_txq and will extract the hardware QCU ID from that. * .. and this is ugly and wasteful; it should change to just store the QCU in the bf_state and save 3/7 bytes in the process. Now, the weird crap: * The aggregate TX path was using bf_state->bfs_txq for the TXQ, rather than taking a function argument. I've tidied that up. * The multicast queue frames get put on a software TXQ and then that is appended to the hardware CABQ when appropriate. So for now, make sure that bf_state->bfs_txq points at the CABQ when adding frames to the multicast queue. * .. but the multicast queue TX path for now doesn't use the software queue and instead (a) directly sets up the descriptor contents at that point; (b) the frames on the vap->avp_mcastq are then just appended wholesale to the CABQ. So for now, I don't have to worry about making the multicast path work with aggregation or the per-TID software queue. Phew. What's left to do: * I need to modify the 11n ath_hal_chaintxdesc() API to do the same. I'll do that in a subsequent commit. * Remove bf_state.bfs_txq entirely and store the QCU as appropriate. * .. then do the runtime "is this going on the right HWQ?" checks using that, rather than comparing pointer values. Tested on: * AR9280 STA/AP * AR5416 STA/AP	2012-08-05 10:12:27 +00:00
Adrian Chadd	af01710118	Allow 802.11n hardware to support multi-rate retry when RTS/CTS is enabled. The legacy (pre-802.11n) hardware doesn't support this - although the AR5212 era hardware supports MRR, it doesn't have all the bits needed to support MRR + RTS/CTS. The AR5416 and later support a packet duration and RTS/CTS flags per rate scenario, so we should support it. Tested: * AR9280, STA PR: kern/170302	2012-07-31 23:54:15 +00:00
Adrian Chadd	f8418db57e	Migrate some more TX side setup routines to be methods.	2012-07-31 03:09:48 +00:00
Adrian Chadd	79607afe3e	Flesh out the initial TX FIFO storage for each hardware TX queue.	2012-07-28 04:42:05 +00:00
Adrian Chadd	26463136ac	Bring this API in line with what the reference driver and Linux ath9k was doing. Obtained from: Qualcomm Atheros, Linux ath9k	2012-07-27 11:23:24 +00:00
Adrian Chadd	ba3fd9d86a	Allocate a descriptor ring for EDMA TX completion status. Configure the hardware with said ring physical address and size.	2012-07-27 10:41:54 +00:00
Adrian Chadd	59a7572437	Add a new HAL method - the AR93xx and later NICs have a separate TX descriptor ring for TX status completion. This API call will pass the allocated buffer details to the HAL.	2012-07-24 01:18:19 +00:00
Adrian Chadd	3fdfc33024	Begin separating out the TX DMA setup in preparation for TX EDMA support. * Introduce TX DMA setup/teardown methods, mirroring what's done in the RX path. Although the TX DMA descriptor is set up via ath_desc_alloc() / ath_desc_free(), the TX status descriptor ring will be allocated in this path. * Remove some of the TX EDMA capability probing from the RX path and push it into the new TX EDMA path.	2012-07-23 03:52:18 +00:00
Adrian Chadd	54c9979539	Flesh out a new DMA map for the EDMA TX completion status, as well as a lock to go with that whole code path.	2012-07-23 02:49:25 +00:00
Adrian Chadd	3d9b15965e	Begin modifying the descriptor allocation functions to support a variable sized TX descriptor. This is required for the AR93xx EDMA support which requires 128 byte TX descriptors (which is significantly larger than the earlier hardware.)	2012-07-23 02:26:33 +00:00
Adrian Chadd	661deb68d5	Use HAL_NUM_RX_QUEUES rather than a magic constant.	2012-07-19 03:18:15 +00:00
Adrian Chadd	ad3e6dcd37	Break out the TX descriptor link field into HAL methods. The DMA FIFO chips (AR93xx and later) differ slightly from the legacy chips: * The RX DMA descriptors don't have a ds_link field; * The TX DMA descriptors have a ds_link field, however at a different offset. This is a reimplementation based on what the reference driver and ath9k do. A subsequent commit will enable it in the TX and beacon paths. Obtained from: Linux ath9k, Qualcomm Atheros	2012-07-19 02:25:14 +00:00
Adrian Chadd	0b59717b4b	Change the RX EDMA path to first complete the FIFO, then re-populate it with fresh descriptors, before handling the frames. Wrap it all in the RX locks. Since the FIFO is very shallow (16 for HP, 128 for LP) it needs to be drained and replenished very quickly. Ideally, I'll eventually move this RX FIFO drain/fill into the interrupt handler, only deferring the actual frame completion.	2012-07-14 02:52:48 +00:00
Adrian Chadd	2fe91baa92	Create an RX queue lock. Ideally these locks would go away and there'd be a single driver lock, like what iwn(4) does. I'll worry about that later.	2012-07-14 02:22:17 +00:00
Adrian Chadd	d434a377d9	Convert sc_rxpending to a per-EDMA queue, and use that for the legacy code. Prepare ath_rx_pkt() to handle multiple RX queues, and default the legacy RX queue to use the HP queue.	2012-07-10 00:02:19 +00:00
Adrian Chadd	3d184db2f8	Further preparations for the RX EDMA support. Break out the DMA descriptor setup/teardown code into a method. The EDMA RX code doesn't allocate descriptors, just ath_buf entries.	2012-07-09 08:37:59 +00:00
Adrian Chadd	0a6b6951b2	Introduce the EDMA related HAL capabilities. Whilst here, fix a typo in a previous commit. Obtained from: Qualcomm Atheros	2012-07-09 07:31:26 +00:00
Adrian Chadd	d60a0680ba	Extend the RX HAL API to include the RX queue identifier. The AR93xx and later chips support two RX FIFO queues - a high and low priority queue. For legacy chips, just assume the queues are high priority. This is inspired by the reference driver but is a reimplementation of the API and code.	2012-07-09 07:19:11 +00:00
Adrian Chadd	f8cc9b09b0	Begin abstracting out the RX path in preparation for RX EDMA support. The RX EDMA support requires a modified approach to the RX descriptor handling. Specifically: * There are now two RX queues - high and low priority; * The RX queues are implemented as FIFOs; they're now an array of pointers to buffers; * .. and the RX buffer and descriptor are in the same "buffer", rather than being separate. So to that end, this commit abstracts out most of the RX related functions from the bulk of the driver. Notably, the RX DMA/buffer allocation isn't updated, primarily because I haven't yet fleshed out what it should look like. Whilst I'm here, create a set of matching but mostly unimplemented EDMA stubs. Tested: * AR9280, station mode TODO: * Thorough AP and other mode testing for non-EDMA chips; * Figure out how to allocate RX buffers suitable for RX EDMA, including correctly setting the mbuf length to compensate for the RX descriptor and completion status area.	2012-07-03 06:59:12 +00:00
Adrian Chadd	577cd9a9b2	Bring over some further HAL capabilities from the Atheros HAL, as well as an EDMA check function. For the AR9003 and later NICs, different TX/RX DMA and descriptor handling code will be conditional on the EDMA check. Obtained from: Qualcomm Atheros	2012-07-02 06:02:12 +00:00
Adrian Chadd	375d4f068a	Shuffle some more fields in ath_buf so it's not too big. This shaves off 20 bytes - from 288 bytes to 268 bytes. However, it's still too big.	2012-06-16 04:41:35 +00:00
Adrian Chadd	3dd2db6646	Shave four (or eight) bytes off of ath_buf - this field isn't used.	2012-06-16 04:36:08 +00:00
Adrian Chadd	956ac958bf	Shrink ath_buf a little more: * Resize some types. In particular, bfs_seqno can be uint16_t for now. Previous work would assign the unassigned seqno a value of -1, which I obviously can't do here. * Remove bfs_pktdur. It was in the original code but nothing so far uses it. This gets ath_buf down (on my i386 system) to 292 bytes from 300 bytes. I'd rather it be much, much smaller.	2012-06-14 04:24:13 +00:00
Adrian Chadd	23ced6c117	Implement a global (all non-mgmt traffic) TX ath_buf limitation when ath_start() is called. This (defaults to 10 frames) gives a little headroom in the TX ath_buf allocation, so buffer cloning is still possible. This requires a lot more experimenting and tuning. It also doesn't stop a node/TID from consuming all of the available ath_buf's, especially when the node is going through high packet loss or only talking at a low TX rate. It also doesn't stop a paused TID from taking all of the ath_bufs. I'll look at fixing that up in subsequent commits. PR: kern/168170	2012-06-14 00:51:53 +00:00
Adrian Chadd	af33d486ab	Implement a separate, smaller pool of ath_buf entries for use by management traffic. * Create sc_mgmt_txbuf and sc_mgmt_txdesc, initialise/free them appropriately. * Create an enum to represent buffer types in the API. * Extend ath_getbuf() and _ath_getbuf_locked() to take the above enum. * Right now anything sent via ic_raw_xmit() allocates via ATH_BUFTYPE_MGMT. This may not be very useful. * Add ATH_BUF_MGMT flag (ath_buf.bf_flags) which indicates the current buffer is a mgmt buffer and should go back onto the mgmt free list. * Extend 'txagg' to include debugging output for both normal and mgmt txbufs. * When checking/clearing ATH_BUF_BUSY, do it on both TX pools. Tested: * STA mode, with heavy UDP injection via iperf. This filled the TX queue however BARs were still going out successfully. TODO: * Initialise the mgmt buffers with ATH_BUF_MGMT and then ensure the right type is being allocated and freed on the appropriate list. That'd save a write operation (to bf->bf_flags) on each buffer alloc/free. * Test on AP mode, ensure that BAR TX and probe responses go out nicely when the main TX queue is filled (eg with paused traffic to a TID, awaiting a BAR to complete.) PR: kern/168170	2012-06-13 06:57:55 +00:00
Adrian Chadd	c2ac9655c3	Introduce a new lock debug which is specifically for making sure the _TID_ lock is held. For now the TID lock is also the TXQ lock. This is just to make sure that the right TXQ lock is held for the given TID.	2012-06-11 07:06:49 +00:00
Adrian Chadd	a108d2d6c6	Revert r233227 and followup commits as it breaks CCMP PN replay detection. This showed up when doing heavy UDP throughput on SMP machines. The problem is that the 802.11 sequence number is being allocated separately to the CCMP PN replay number (which is assigned during ieee80211_crypto_encap()). Under significant throughput (200+ MBps) the TX path would be stressed enough that frame TX/retry would force sequence number and PN allocation to be out of order. So once the frames were reordered via 802.11 seqnos, the CCMP PN would be far out of order, causing most frames to be discarded by the receiver. I've fixed this in some local work by being forced to: (a) deal with the issues that led to the parallel TX causing out of order sequence numbers in the first place; (b) fix all the packet queuing issues which led to strange (but mostly valid) TX. I'll begin fixing these in a subsequent commit or five. PR: kern/166190	2012-06-11 06:59:28 +00:00
Adrian Chadd	9f95609828	Mostly revert previous commit(s). After doing a bunch of local testing, it turns out that it negatively affects performance. I'm still investigating exactly why deferring the IO causes such negative TCP performance but doesn't affect UDP performance. Leave the ath_tx_kick() change in there however; it's going to be useful to have that there for if_transmit() work. PR: kern/168649	2012-06-05 06:03:55 +00:00
Adrian Chadd	470a7f4191	Migrate the TX path to a taskqueue for now, until a better way of implementing parallel TX and TX/RX completion can be done without simply abusing long-held locks. Right now, multiple concurrent ath_start() entries can result in frames being dequeued out of order. Well, they're dequeued in order fine, but if there's any preemption or race between CPUs between: * removing the frame from the ifnet, and * calling and running ath_tx_start(), until the frame is placed on a software or hardware TXQ Then although dequeueing the frame is in-order, queueing it to the hardware may be out of order. This is solved in a lot of other drivers by just holding a TX lock over a rather long period of time. This lets them continue to direct dispatch without races between dequeue and hardware queue. Note to observers: if_transmit() doesn't necessarily solve this. It removes the ifnet from the main path, but the same issue exists if there's some intermediary queue (eg a bufring, which as an aside also may pull in ifnet when you're using ALTQ.) So, until I can sit down and code up a much better way of doing parallel TX, I'm going to leave the TX path using a deferred taskqueue task. What I will likely head towards is doing a direct dispatch to hardware or software via if_transmit(), but it'll require some driver changes to allow queues to be made without using the really large ath_buf / ath_desc entries. TODO: * Look at how feasible it'll be to just do direct dispatch to ath_tx_start() from if_transmit(), avoiding doing _any_ intermediary serialisation into a global queue. This may break ALTQ for example, so I have to be delicate. * It's quite likely that I should break up ath_tx_start() so it deposits frames onto the software queues first, and then only fill in the 802.11 fields when it's being queued to the hardware. That will make the if_transmit() -> software queue path very quick and lightweight. * This has some very bad behaviour when using ACPI and Cx states. I'll do some subsequent analysis using KTR and schedgraph and file a follow-up PR or two. PR: kern/168649	2012-06-04 22:01:12 +00:00
Adrian Chadd	a35baf81c9	Remove an unneeded field from ath_buf.	2012-05-26 01:34:36 +00:00
Adrian Chadd	ae2a0aa428	oops - ath_hal_disablepcie is actually destined for another purpose, not to disable the PCIe PHY in preparation for reset. Extend the enablepci method to have a "poweroff" flag, which if equal to true means the hardware is about to go to sleep.	2012-05-25 05:01:27 +00:00
Adrian Chadd	d73df6d52c	Prepare for improved (read: pcie) suspend/resume support. * Flesh out the pcie disable method for 11n chips, as they were defaulting to the AR5212 (empty) PCIe disable method. * Add accessor macros for the HAL PCIe enable/disable calls. * Call disable on ath_suspend() * Call enable on ath_resume() NOTE: * This has nothing to do with the NIC sleep/run state - the NIC still will stay in network-run state rather than supporting network-sleep state. This is preparation work for supporting correct suspend/resume WARs for the 11n PCIe NICs. TODO: * It may be feasible at this point to keep the chip powered down during initial probe/attach and only power it up upon the first configure/reset pass. This however would require correct (for values of "correct") tracking of the NIC power configuration state from the driver and that just isn't attempted at the moment. Tested: * AR9280 on my Lenovo T60, but with no suspend/resume pass (yet).	2012-05-25 02:07:59 +00:00
Adrian Chadd	e4f6061912	Re-up the TX ath_buf limit from 128 to 512. I'll have to leave this high for now, until I've done some significant surgery with how ath_bufs (and descriptors) are handled. This should significantly cut down on the opportunities for a full TX queue hanging traffic. I'll continue making things work though; I'm mostly doing this for users. :)	2012-05-22 19:50:21 +00:00
Adrian Chadd	d3a6425b7c	Fix up some corner cases with aggregation handling. I've come across a weird scenario in net80211 where two TX streams will happily attempt to setup an aggregation session together. If we're very lucky, it happens concurrently on separate CPUs and the total lack of locking in the net80211 aggregation code causes this stuff to race. Badly. So >1 call would occur to the ath(4) addba start, but only one call would complete to addba complete or timeout. The TID would thus stay paused. The real fix is to implement some proper per-node (or maybe per-TID) locking in net80211, which then could be leveraged by the ath(4) TX aggregation code. Whilst I'm at it, shuffle around the debugging messages a bit. I like to keep people on their toes.	2012-05-22 06:31:03 +00:00
Adrian Chadd	0e22ed0eb2	Migrate ath_debug and sc_debug from an int to a uint64_t / QUAD; add some more BAR debugging logic. * Change the definition of ath_debug and ath_softc.sc_debug from int to uint64_t; * Change the relevant sysctls; * Add a new BAR TX debugging field; * Use this in if_ath_tx. This has been tested by using the sysctl program, which happily allows for fields > 32 bits to be configured.	2012-05-15 23:39:37 +00:00
Adrian Chadd	352f07f66d	Change the MIB cycle count API to return HAL_BOOL, rather than uint32_t, to return whether it was successful. Add placeholder (blank) methods for previous chips, for both it and the 11n extension channel busy call.	2012-05-01 14:48:51 +00:00
Adrian Chadd	f846cf42ab	Run the fatal proc as a proc, rather than where it currently is. Otherwise the reset path will sleep, which it can't do in this context.	2012-04-17 06:02:41 +00:00
Adrian Chadd	82d05362e6	Drop this down from 512 to 128 for now. This may result in a bit of a throughput drop. However, any throughput drop at this point should be investigated and root caused, as it's likely because TX scheduling (all the way down to how preemption, scheduler work, etc) is happening in a sub-optimal fashion. This also makes it much more likely to be reloadable on a live machine. Allocating 5120 TX ath_buf entries via contigmalloc is very unlikely after a few hours of using X/Chromium.	2012-04-15 19:54:22 +00:00
Adrian Chadd	f8ab7a9fc9	Convert the flags over to a set of bit flags.	2012-04-10 19:25:43 +00:00
Adrian Chadd	9467e3f3fc	Squirrel away SYNC interrupt debugging if it's enabled in the HAL. Bus errors will show up as various SYNC interrupts which will be passed back up to ath_intr().	2012-04-10 07:23:37 +00:00
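The power-save handling in 0eb8162623 pauses a node's TIDs when the node enters power save and unpauses them when it leaves. It also guards against net80211 reporting the same transition more than once. A minimal sketch of that guard follows. All names here (sketch_node, sketch_tid, SKETCH_NTIDS, sketch_node_powersave()) are hypothetical stand-ins, not the ath(4) structures. The node/TID locking the driver would hold is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NTIDS 16  /* hypothetical TID count for this sketch */

/* Hypothetical, simplified per-TID state. */
struct sketch_tid {
	int  paused;    /* pause reference count; >0 means not scheduled */
	bool clrdmask;  /* set CLRDMASK on the next frame from this TID */
};

/* Hypothetical, simplified per-node state. */
struct sketch_node {
	bool an_is_powersave;  /* last power-save state acted upon */
	struct sketch_tid tids[SKETCH_NTIDS];
};

/*
 * Handle a power-save state change for a node. Only a real transition
 * pauses or unpauses the TIDs, so a repeated notification cannot pause
 * (or unpause) them twice. Returns true if the state actually changed.
 */
static bool
sketch_node_powersave(struct sketch_node *an, bool enable)
{
	int i;

	if (an->an_is_powersave == enable)
		return false;  /* duplicate notification; ignore it */
	an->an_is_powersave = enable;

	for (i = 0; i < SKETCH_NTIDS; i++) {
		struct sketch_tid *tid = &an->tids[i];

		if (enable) {
			tid->paused++;
		} else {
			assert(tid->paused > 0);
			tid->paused--;
			/* Set CLRDMASK on unpause, even if nothing is queued yet. */
			tid->clrdmask = true;
		}
	}
	return true;
}
```

Calling sketch_node_powersave(&n, true) twice leaves each TID's pause count at 1. Only the first call acts on the transition, which models the "paused/unpaused more than once at a time" problem the commit describes.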
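The headroom described in 23ced6c117 can be modelled as a threshold check. The ath_start()-style path stops dequeuing new frames once the free list drops to the reserve, while other allocations (such as cloning a buffer for a retry) may still dip into it. This is an illustrative model under assumed names (sketch_txpool, sketch_start(), sketch_getbuf()). It ignores locking and the separate management pool added in af33d486ab.

```c
#include <stdbool.h>

/* Simplified model of the normal (non-management) TX buffer pool. */
struct sketch_txpool {
	int nfree;    /* buffers currently on the free list */
	int reserve;  /* headroom kept back from ath_start(); commit default is 10 */
};

/* ath_start()-style gate: only dequeue new frames while above the reserve. */
static bool
sketch_start_can_dequeue(const struct sketch_txpool *p)
{
	return p->nfree > p->reserve;
}

/* Plain allocation (e.g. cloning a buffer for a retry); may use the reserve. */
static bool
sketch_getbuf(struct sketch_txpool *p)
{
	if (p->nfree == 0)
		return false;
	p->nfree--;
	return true;
}

/* Dequeue up to 'pending' frames, stopping at the reserve. Returns frames queued. */
static int
sketch_start(struct sketch_txpool *p, int pending)
{
	int queued = 0;

	while (pending > 0 && sketch_start_can_dequeue(p) && sketch_getbuf(p)) {
		pending--;
		queued++;
	}
	return queued;
}
```

With 15 free buffers and a reserve of 10, sketch_start() queues only 5 frames. The remaining 10 stay available to sketch_getbuf() for cloning. As the commit notes, a single node or paused TID can still consume everything above the reserve.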
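The EDMA completion model described in 3ae723d459 and 46634305f4 can be sketched as follows. A single TX status FIFO is shared by all hardware queues, and each entry names its QCU. The ring sizes, structure layouts and function names are assumptions made for illustration. They are not the HAL's descriptor formats or the driver's actual routines.

```c
#include <stdbool.h>

#define SKETCH_NUM_TXQ    10  /* assumed number of hardware queues (QCUs) */
#define SKETCH_STATUS_LEN 8   /* assumed TX status ring size */
#define SKETCH_FIFO_LEN   8   /* assumed per-queue FIFO size */

/* One entry in the shared TX completion status ring. */
struct sketch_txstatus {
	bool     valid;   /* "hardware" has written this entry */
	unsigned qcu_id;  /* hardware queue this completion belongs to */
};

/* Per-queue FIFO of frames awaiting completion, oldest first. */
struct sketch_txq {
	int      frames[SKETCH_FIFO_LEN];
	unsigned head;            /* index of the oldest pending frame */
	unsigned depth;           /* pending frames, akin to axq_fifo_depth */
	int      last_completed;  /* most recently completed frame */
};

struct sketch_sc {
	struct sketch_txstatus ring[SKETCH_STATUS_LEN];
	unsigned               ring_head;
	struct sketch_txq      txq[SKETCH_NUM_TXQ];
};

/* Queue a frame on a hardware queue's FIFO; fails if the FIFO is full. */
static bool
sketch_txq_push(struct sketch_txq *txq, int frame)
{
	if (txq->depth == SKETCH_FIFO_LEN)
		return false;
	txq->frames[(txq->head + txq->depth) % SKETCH_FIFO_LEN] = frame;
	txq->depth++;
	return true;
}

/*
 * Drain the single status ring. Each valid entry completes the oldest
 * pending frame on the queue it names, so completions for different
 * queues may be interleaved. Returns the number of entries consumed.
 */
static int
sketch_tx_edma_proc(struct sketch_sc *sc)
{
	int n = 0;

	while (sc->ring[sc->ring_head].valid) {
		struct sketch_txstatus *ts = &sc->ring[sc->ring_head];

		if (ts->qcu_id < SKETCH_NUM_TXQ) {
			struct sketch_txq *txq = &sc->txq[ts->qcu_id];

			if (txq->depth > 0) {
				txq->last_completed = txq->frames[txq->head];
				txq->head = (txq->head + 1) % SKETCH_FIFO_LEN;
				txq->depth--;
			}
		}
		/* Hand the entry back and advance, even if it was bogus. */
		ts->valid = false;
		sc->ring_head = (sc->ring_head + 1) % SKETCH_STATUS_LEN;
		n++;
	}
	return n;
}
```

Completions for different queues interleave in the one ring, so the drain loop has to dispatch entry by entry. This is why, as 788e6aa99c explains, a per-TXQ processq() method does not map cleanly onto EDMA hardware.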


181 Commits