freebsd-skq

Author	SHA1	Message	Date
adrian	eaac94fbfe	Disable my software queue TIM and PS handling for now. ps-poll is totally broken in its current form. This should unbreak things enough to let people use PS-POLL devices, but leave it in place for me to finish PS-POLL handling.	2012-11-07 06:29:45 +00:00
adrian	7a72be7da2	Add a new HAL call to extract out the HAL enterprise bits from the AR9300 HAL.	2012-11-03 22:12:35 +00:00
adrian	2ef1c4bedf	I give up - introduce a TX lock to serialise TX operations. I've tried serialising TX using queues and such but unfortunately due to how this interacts with the locking going on elsewhere in the networking stack, the TX task gets delayed, resulting in quite a noticable throughput loss: * baseline TCP for 2x2 11n HT40 is ~ 170mbit/sec; * TCP for TX task in the ath taskq, with the RX also going on - 80mbit/sec; * TCP for TX task in a separate, second taskq - 100mbit/sec. So for now I'm going with the Linux wireless stack approach - lock tx early. The linux code does in the wireless stack, before the 802.11 state stuff happens and before it's punted to the driver. But TX locking needs to also occur at the driver layer as the TX completion code _also_ begins to drain the ifnet TX queue. Whilst I'm here, add some KTR traces for the TX path. Note: * This really should be done at the net80211 layer (as well, at least.) But that'll have to wait for a little more thought to happen.	2012-10-31 06:27:58 +00:00
adrian	25baa69114	Begin fleshing out some software queue awareness for TIM handling with the power save queue. * introduce some new ATH_NODE lock protected fields, tracking the net80211 psq and TIM state; * when doing buffer transitions - ie, when sending and completing buffers - check the state of the SWQ and update the TIM appropriately. * when clearing the TIM bit, if the SWQ is not empty then delay clearing it. This is racy, but it's no less racy than the current net80211 power save queue management code. Specifically, with multiple TX threads, it's quite plausible that parallel state updates will race and the TIM will be left in an inconsistent state. I'll address that in a follow-up commit.	2012-10-28 21:13:12 +00:00
adrian	94c194db6e	Add a temporary (for values of "temporary") work around for hotplug support with ath(4) and VIMAGE. Right now the VIMAGE code doesn't supply a default vnet context during: * hotplug attach; * any device detach. It special cases kldload/boot time probing (by setting the context to vnet0) but that doesn't occur when probing devices during a bus rescan - eg, adding a cardbus card. These will eventually go away when the VIMAGE support extends to providing default contexts to hotplug attach/detach.	2012-10-28 18:46:06 +00:00
adrian	b75b22519f	Push the actual TX processing into the ath taskqueue, rather than having it run out of multiple concurrent contexts. Right now the ath(4) TX processing is a bit hairy. Specifically: * It was running out of ath_start(), which could occur from multiple concurrent sending processes (as if_start() can be started from multiple sending threads nowdays.. sigh) * during RX if fast frames are enabled (so not really at the moment, not until I fix this particular feature again..) * during ath_reset() - so anything which calls that * during ath_tx_proc() in the ath taskqueue - ie, TX is attempted again after TX completion, as there's now hopefully some ath_bufs available. Then, the ic_raw_xmit() method can queue raw frames for transmission at any time, from any net80211 TX context. Ew. This has caused packet ordering issues in the past - specifically, there's absolutely no guarantee that preemption won't occuring _during_ ath_start() by the TX completion processing, which will call ath_start() again. It's a mess - 802.11 really, really wants things to be in sequence or things go all kinds of loopy. So: * create a new task struct for TX'ing; * make the if_start method simply queue the task on the ath taskqueue; * make ath_start() just be called by the new TX task; * make ath_tx_kick() just schedule the ath TX task, rather than directly calling ath_start(). Now yes, this means that I've taken a step backwards in terms of concurrency - TX -and- RX now occur in the same single-task taskqueue. But there's nothing stopping me from separating out the TX / TX completion code into a separate taskqueue which runs in parallel with the RX path, if that ends up being appropriate for some platforms. This fixes the CCMP/seqno concurrency issues that creep up when you transmit large amounts of uni-directional UDP traffic (>200MBit) on a FreeBSD STA -> AP, as now there's only one TX context no matter what's going on (TX completion->retry/software queue, userland->net80211->ath_start(), TX completion -> ath_start()); but it won't fix any concurrency issues between raw transmitted frames and non-raw transmitted frames (eg EAPOL frames on TID 16 and any other TID 16 multicast traffic that gets put on the CABQ.) That is going to require a bunch more re-architecture before it's feasible to fix. In any case, this is a big step towards making the majority of the TX path locking irrelevant, as now almost all TX activity occurs in the taskqueue. Phew.	2012-10-14 20:44:08 +00:00
adrian	5beeacda7a	Initialise an uninitialised variable.	2012-10-05 16:44:00 +00:00
adrian	78ead1fc05	Pause and unpause the software queues for a given node based on the net80211 node power save state. * Add an ATH_NODE_UNLOCK_ASSERT() check * Add a new node field - an_is_powersave * Pause/unpause the queue based on the node state * Attempt to handle net80211 concurrency issues so the queue doesn't get paused/unpaused more than once at a time from the net80211 power save code. Whilst here (and breaking my usual rule), set CLRDMASK when a queue is unpaused, regardless of whether the queue has some pending traffic. This means the first frame from that TID (now or later) will hvae CLRDMASK set. Also whilst here, bump the swretrymax counters whenever the filtered frames code expires a frame. Again, breaking my rule, but this is just a statistics thing rather than a functional change. This doesn't fix ps-poll (but it doesn't break it too much worse than it is at the present) or correcting the TID updates. That's next on the list. Tested: * AR9220 AP (Atheros AP96 reference design) * Macbook Pro and LG Optimus 1 Android phone, both setting and clearing power save state (but not using PS-POLL.)	2012-10-03 23:23:45 +00:00
adrian	d870dd946f	Migrate the ath(4) KTR logging to use an ATH_KTR() macro. This should eventually be unified with ATH_DEBUG() so I can get both from one macro; that may take some time. Add some new probes for TX and TX completion.	2012-09-24 20:35:56 +00:00
adrian	368dcf1975	Remove TDMA #define entries from if_ath.c; they now exist in if_ath_tdma.h.	2012-09-09 04:53:10 +00:00
adrian	13dd70acbf	There's no nede to allocate a DMA map just before calling bus_dmamem_alloc(). In fact, bus_dmamem_alloc() happily NULLs the dmat pointer passed in, before replacing it with its own. This fixes a MIPS crash when kldload'ing if_ath/if_ath_pci - bus_dmamap_destroy() was passed in a NULL dmat pointer and was doing all kinds of very bad things. Reviewed by: scottl	2012-08-29 16:58:51 +00:00
adrian	2e17c6efe7	Implement a sequential descriptor ID value and stuff it in the ath_buf. This will be used by the EDMA TX code to assign descriptor IDs in order to provide some debugging.	2012-08-15 06:48:34 +00:00
adrian	efa454776b	Break out the TX completion code into a separate function, so it can be re-used by the upcoming EDMA TX completion code. Make ath_stoptxdma() public, again so the EDMA TX code can use it. Don't check for the TXQ bitmap in the ISR when doing EDMA work as it doesn't apply for EDMA.	2012-08-14 22:32:20 +00:00
adrian	f315189440	Revert the ath_tx_draintxq() method, and instead teach it the minimum necessary to "do" EDMA. It was just using the TX completion status for logging information about the descriptor completion. Since with EDMA we don't know this without checking the TX completion FIFO, we can't provide this information. So don't.	2012-08-12 00:46:15 +00:00
adrian	815b21bf9d	Break out ath_draintxq() into a method and un-methodize ath_tx_processq(). Now that I understand what's going on with this, I've realised that it's going to be quite difficult to implement a processq method in the EDMA case. Because there's a separate TX status FIFO, I can't just run processq() on each EDMA TXQ to see what's finished. i have to actually run the TX status queue and handle individual TXQs. So: * unmethodize ath_tx_processq(); * leave ath_tx_draintxq() as a method, as it only uses the completion status for debugging rather than actively completing the frames (ie, all frames here are failed); * Methodize ath_draintxq(). The EDMA ath_draintxq() will have to take care of running the TX completion FIFO before (potentially) freeing frames in the queue. The only two places where ath_tx_draintxq() (on a single TXQ) are used: * ath_draintxq(); and * the CABQ handling in the beacon setup code - it drains the CABQ before populating the CABQ with frames for a new beacon (when doing multi-VAP operation.) So it's quite possible that once I methodize the CABQ and beacon handling, I can just drop ath_tx_draintxq() in its entirety. Finally, it's also quite possible that I can remove ath_tx_draintxq() in the future and just "teach" it to not check the status when doing EDMA.	2012-08-12 00:37:29 +00:00
adrian	ce09176548	Extend the beacon code slightly to support AP mode beaconing for the EDMA HAL hardware. * The EDMA HAL code assumes the nexttbtt and intval values are in TU/8 units, rather than TU. For now, just "hack" around that here, at least until I code up something to translate it in the HAL. * Setup some different TXQ flags for EDMA hardware. * The EDMA HAL doesn't support setting the first rate series via ath_hal_setuptxdesc() - instead, a call to ath_hal_set11nratescenario() is always required. So for now, just do an 11n rate series setup for EDMA beacon frames. This allows my AR9380 to successfully transmit beacon frames. However, CABQ TX and all normal data frame TX and TX completion is still not functional and will require some more significant code churn to make work.	2012-08-11 23:26:19 +00:00
adrian	01f36653db	Allow 802.11n hardware to support multi-rate retry when RTS/CTS is enabled. The legacy (pre-802.11n) hardware doesn't support this - although the AR5212 era hardware supports MRR, it doesn't have all the bits needed to support MRR + RTS/CTS. The AR5416 and later support a packet duration and RTS/CTS flags per rate scenario, so we should support it. Tested: * AR9280, STA PR: kern/170302	2012-07-31 23:54:15 +00:00
adrian	c9862ba17f	Migrate some more TX side setup routines to be methods.	2012-07-31 03:09:48 +00:00
adrian	42e4d59244	Fix breakage introduced in r238824 - correctly calculate the descriptor wrapping. The previous code was only wrapping descriptor "block" boundaries rather than individual descriptors. It sounds equivalent but it isn't. r238824 changed the descriptor allocation to enforce that an individual descriptor doesn't wrap a 4KiB boundary rather than the whole block of descriptors. Eg, for TX descriptors, they're allocated in blocks of 10 descriptors for each ath_buf (for scatter/gather DMA.)	2012-07-29 08:52:32 +00:00
adrian	f918a23c85	Add a missing call to ath_txdma_teardown().	2012-07-28 04:40:52 +00:00
adrian	c4ee11cf19	Modify ath_descdma_cleanup() to handle ath_descdma instances with no buffers. ath_descdma is now being used for things other than the classical combination of ath_buf + ath_desc allocations. In this particular case, don't try to free and blank out the ath_buf list if it's not passed in.	2012-07-27 10:38:17 +00:00
adrian	8c0c6fa06f	Migrate the descriptor allocation function to not care about the number of buffers, only the number of descriptors. This involves: * Change the allocation function to not use nbuf at all; * When calling it, pass in "nbuf * ndesc" to correctly update how many descriptors are being allocated. Whilst here, fix the descriptor allocation code to correctly allocate a larger buffer size if the Merlin 4KB WAR is required. It overallocates descriptors when allocating a block that doesn't ever have a 4KB boundary being crossed, but that can be fixed at a later stage.	2012-07-27 05:48:42 +00:00
adrian	e8e08eee9e	Refactor out the descriptor allocation code from the buffer allocation code. The TX EDMA completion path is going to need descriptors allocated but not any buffers. This code will form the basis for that.	2012-07-27 05:34:45 +00:00
adrian	142df5fc8e	Modify ath_descdma_setup() to take a descriptor size parameter. The AR9300 and later descriptors are 128 bytes, however I'd like to make sure that isn't used for earlier chips. * Populate the TX descriptor length field in the softc with sizeof(ath_desc) * Use this field when allocating the TX descriptors * Pre-AR93xx TX/RX descriptors will use the ath_desc size; newer ones will query the HAL for these sizes.	2012-07-23 23:40:13 +00:00
adrian	c89f08ceb9	Begin separating out the TX DMA setup in preparation for TX EDMA support. * Introduce TX DMA setup/teardown methods, mirroring what's done in the RX path. Although the TX DMA descriptor is setup via ath_desc_alloc() / ath_desc_free(), there TX status descriptor ring will be allocated in this path. * Remove some of the TX EDMA capability probing from the RX path and push it into the new TX EDMA path.	2012-07-23 03:52:18 +00:00
adrian	f20813da65	Begin modifying the descriptor allocation functions to support a variable sized TX descriptor. This is required for the AR93xx EDMA support which requires 128 byte TX descriptors (which is significantly larger than the earlier hardware.)	2012-07-23 02:26:33 +00:00
adrian	ff4507cc5c	Enable the basic node-based rate control statistics via an ioctl().	2012-07-20 01:36:46 +00:00
adrian	852b8a4cb1	Ensure that error is set. Noticed by: rui	2012-07-14 05:51:54 +00:00
adrian	79bcabb67e	Don't free the descriptor allocation/map if it doesn't exist. I missed this in my previous commit.	2012-07-14 02:47:16 +00:00
adrian	00e578733c	Fix EDMA RX to actually work without panicing the machine. I was setting up the RX EDMA buffer to be 4096 bytes rather than the RX data buffer portion. The hardware was likely getting very confused and DMAing descriptor portions into places it shouldn't, leading to memory corruption and occasional panics. Whilst here, don't bother allocating descriptors for the RX EDMA case. We don't use those descriptors. Instead, just allocate ath_buf entries.	2012-07-14 02:07:51 +00:00
adrian	f0a77e77df	Flip on EDMA RX of both HP and LP queue frames. Yes, this is in the legacy interrupt path. The NIC does support MSI but I haven't yet sat down and written that code.	2012-07-10 07:43:31 +00:00
adrian	b89b88c83b	Migrate the ATH_KTR_* fields out to if_ath_debug.h .	2012-07-10 06:11:39 +00:00
adrian	2977d109d8	Further preparations for the RX EDMA support. Break out the DMA descriptor setup/teardown code into a method. The EDMA RX code doesn't allocate descriptors, just ath_buf entries.	2012-07-09 08:37:59 +00:00
adrian	8af1316f3a	Begin abstracting out the RX path in preparation for RX EDMA support. The RX EDMA support requires a modified approach to the RX descriptor handling. Specifically: * There's now two RX queues - high and low priority; * The RX queues are implemented as FIFOs; they're now an array of pointers to buffers; * .. and the RX buffer and descriptor are in the same "buffer", rather than being separate. So to that end, this commit abstracts out most of the RX related functions from the bulk of the driver. Notably, the RX DMA/buffer allocation isn't updated, primarily because I haven't yet fleshed out what it should look like. Whilst I'm here, create a set of matching but mostly unimplemented EDMA stubs. Tested: * AR9280, station mode TODO: * Thorough AP and other mode testing for non-EDMA chips; * Figure out how to allocate RX buffers suitable for RX EDMA, including correctly setting the mbuf length to compensate for the RX descriptor and completion status area.	2012-07-03 06:59:12 +00:00
adrian	b5c3afbe7d	Shuffle these initialisations to where they should be.	2012-06-24 08:28:06 +00:00
adrian	a5794ac4d0	Introduce an optional ath(4) radiotap vendor extension. This includes a few new fields in each RXed frame: * per chain RX RSSI (ctl and ext); * current RX chainmask; * EVM information; * PHY error code; * basic RX status bits (CRC error, PHY error, etc). This is primarily to allow me to do some userland PHY error processing for radar and spectral scan data. However since EVM and per-chain RSSI is provided, others may find it useful for a variety of tasks. The default is to not compile in the radiotap vendor extensions, primarily because tcpdump doesn't seem to handle the particular vendor extension layout I'm using, and I'd rather not break existing code out there that may be (badly) parsing the radiotap data. Instead, add the option 'ATH_ENABLE_RADIOTAP_VENDOR_EXT' to your kernel configuration file to enable these options.	2012-06-24 07:01:49 +00:00
adrian	656256d25b	After some discussion with bschmidt@, it's likely better to just go through ieee80211_suspend_all() and ieee80211_resume_all(). All the other wireless drivers are doing that particular dance. PR: kern/169084	2012-06-17 03:08:33 +00:00
adrian	e781c0c0fc	oops, remove this, it wasn't supposed to be committed.	2012-06-16 22:26:45 +00:00
kib	df96c54547	Fix build.	2012-06-16 20:49:08 +00:00
adrian	1e66a452fb	Shuffle some more fields in ath_buf so it's not too big. This shaves off 20 bytes - from 288 bytes to 268 bytes. However, it's still too big.	2012-06-16 04:41:35 +00:00
adrian	610e0d131b	Convert ath(4) to just use ieee80211_suspend_all() and ieee80211_resume_all(). The existing code tries to use the beacon miss timer to signal that the AP has gone away. Unfortunately this doesn't seem to be behaving itself. I'll try to investigate why this is for the sake of completeness. The result is the STA will stay "associated" to the AP it was associated with when it suspended. It never receives a bmiss notification so it never tries reassociating. PR: kern/169084	2012-06-15 01:15:59 +00:00
adrian	27ea453afe	Disable BGSCAN for 802.11n for now. Until scanning during traffic is fixed for 802.11n TX, this needs to be disabled or users wlil see randomly hanging aggregation sessions. Whilst I'm here, remove the warning about 802.11n being full of dragons. It's nowhere near that scary now.	2012-06-14 04:14:06 +00:00
adrian	a65b3dd589	Implement a global (all non-mgmt traffic) TX ath_buf limitation when ath_start() is called. This (defaults to 10 frames) gives for a little headway in the TX ath_buf allocation, so buffer cloning is still possible. This requires a lot omre experimenting and tuning. It also doesn't stop a node/TID from consuming all of the available ath_buf's, especially when the node is going through high packet loss or only talking at a low TX rate. It also doesn't stop a paused TID from taking all of the ath_bufs. I'll look at fixing that up in subsequent commits. PR: kern/168170	2012-06-14 00:51:53 +00:00
adrian	528dfae9f3	Implement a separate, smaller pool of ath_buf entries for use by management traffic. * Create sc_mgmt_txbuf and sc_mgmt_txdesc, initialise/free them appropriately. * Create an enum to represent buffer types in the API. * Extend ath_getbuf() and _ath_getbuf_locked() to take the above enum. * Right now anything sent via ic_raw_xmit() allocates via ATH_BUFTYPE_MGMT. This may not be very useful. * Add ATH_BUF_MGMT flag (ath_buf.bf_flags) which indicates the current buffer is a mgmt buffer and should go back onto the mgmt free list. * Extend 'txagg' to include debugging output for both normal and mgmt txbufs. * When checking/clearing ATH_BUF_BUSY, do it on both TX pools. Tested: * STA mode, with heavy UDP injection via iperf. This filled the TX queue however BARs were still going out successfully. TODO: * Initialise the mgmt buffers with ATH_BUF_MGMT and then ensure the right type is being allocated and freed on the appropriate list. That'd save a write operation (to bf->bf_flags) on each buffer alloc/free. * Test on AP mode, ensure that BAR TX and probe responses go out nicely when the main TX queue is filled (eg with paused traffic to a TID, awaiting a BAR to complete.) PR: kern/168170	2012-06-13 06:57:55 +00:00
adrian	0e5e2a4303	Replace the direct sc_txbuf manipulation with a pair of functions. This is preparation work for having a separate ath_buf queue for management traffic. PR: kern/168170	2012-06-13 05:39:16 +00:00
adrian	d958c69a5f	Add a new ioctl for ath(4) which returns the aggregate statistics.	2012-06-10 06:42:18 +00:00
adrian	b1336b8d21	Mostly revert previous commit(s). After doing a bunch of local testing, it turns out that it negatively affects performance. I'm stil investigating exactly why deferring the IO causes such negative TCP performance but doesn't affect UDP preformance. Leave the ath_tx_kick() change in there however; it's going to be useful to have that there for if_transmit() work. PR: kern/168649	2012-06-05 06:03:55 +00:00
adrian	6dd8c2eb6c	Create a function - ath_tx_kick() - which is called where ath_start() is called to "kick" along TX. For now, schedule a taskqueue call. Later on I may go back to the direct call of ath_rx_tasklet() - but for now, this will do. I've tested UDP and TCP TX. UDP TX still achieves 240MBit, but TCP TX gets stuck at around 100MBit or so, instead of the 150MBit it should be at. I'll re-test with no ACPI/power/sleep states enabled at startup and see what effect it has. This is in preparation for supporting an if_transmit() path, which will turn ath_tx_kick() into a NUL operation (as there won't be an ifnet queue to service.) Tested: * AR9280 STA TODO: * test on AR5416, AR9160, AR928x STA/AP modes PR: kern/168649	2012-06-05 03:14:49 +00:00
adrian	673d798918	Migrate the TX path to a taskqueue for now, until a better way of implementing parallel TX and TX/RX completion can be done without simply abusing long-held locks. Right now, multiple concurrent ath_start() entries can result in frames being dequeued out of order. Well, they're dequeued in order fine, but if there's any preemption or race between CPUs between: * removing the frame from the ifnet, and * calling and runningath_tx_start(), until the frame is placed on a software or hardware TXQ Then although dequeueing the frame is in-order, queueing it to the hardware may be out of order. This is solved in a lot of other drivers by just holding a TX lock over a rather long period of time. This lets them continue to direct dispatch without races between dequeue and hardware queue. Note to observers: if_transmit() doesn't necessarily solve this. It removes the ifnet from the main path, but the same issue exists if there's some intermediary queue (eg a bufring, which as an aside also may pull in ifnet when you're using ALTQ.) So, until I can sit down and code up a much better way of doing parallel TX, I'm going to leave the TX path using a deferred taskqueue task. What I will likely head towards is doing a direct dispatch to hardware or software via if_transmit(), but it'll require some driver changes to allow queues to be made without using the really large ath_buf / ath_desc entries. TODO: * Look at how feasible it'll be to just do direct dispatch to ath_tx_start() from if_transmit(), avoiding doing _any_ intermediary serialisation into a global queue. This may break ALTQ for example, so I have to be delicate. * It's quite likely that I should break up ath_tx_start() so it deposits frames onto the software queues first, and then only fill in the 802.11 fields when it's being queued to the hardware. That will make the if_transmit() -> software queue path very quick and lightweight. * This has some very bad behaviour when using ACPI and Cx states. I'll do some subsequent analysis using KTR and schedgraph and file a follow-up PR or two. PR: kern/168649	2012-06-04 22:01:12 +00:00
adrian	d6bb741dfb	oops - ath_hal_disablepcie is actually destined for another purpose, not to disable the PCIe PHY in prepration for reset. Extend the enablepci method to have a "poweroff" flag, which if equal to true means the hardware is about to go to sleep.	2012-05-25 05:01:27 +00:00

1 2 3 4 5 ...

465 Commits