freebsd-skq

Author	SHA1	Message	Date
Adrian Chadd	f8af1be8f8	Remove the ah_desc.h reference; it's not needed. I'm using these descriptor header files in userland and I'm trying to avoid populating a compatibility ah_desc.h file.	2012-11-17 02:00:33 +00:00
Adrian Chadd	69f33b13d1	I'm not sure why ah_desc.h was required here, but it doesn't _need_ to be. So, just toss it. There are no options or ah_desc fields in here. Whilst I'm here, fix up the #ifdef and #define to match.	2012-11-16 20:04:45 +00:00
Adrian Chadd	e3f0668803	* Remove a duplicate TX ALQ post routine! * For CABQ traffic, I -can- chain them together using the next pointer and just push that particular chain head to the CABQ. However, this doesn't magically make EDMA TX CABQ work - I have to do some further hoop jumping.	2012-11-16 19:58:15 +00:00
Adrian Chadd	bb327d284b	ALQ logging enhancements: * upon setup, tell the alq code what the chip information is. * add TX/RX path logging for legacy chips. * populate the tx/rx descriptor length fields with a best-estimate. It's overly big (96 bytes when AH_SUPPORT_AR5416 is enabled) but it'll do for now. Whilst I'm here, add CURVNET_RESTORE() here during probe/attach as a partial solution to fixing crashes during attach when the attach fails. There are other attach failures that I have to deal with; those'll come later.	2012-11-16 19:57:16 +00:00
Adrian Chadd	956d4fb965	ath(4) ALQ logging improvements. * Add a new method which allows the driver to push the MAC/phy/hal info into the logging stream. * Add a new ALQ logging entry which logs the mac/phy/hal information. * Modify the ALQ startup path to log the MAC/phy/hal information so the decoder knows which HAL/chip is generating this information. * Convert the header and mac/phy/hal information to use be32, rather than host order. I'd like to make this stuff endian-agnostic so I can decode MIPS generated logs on a PC. This requires some further driver modifications to correctly log the right initial chip information. Also - although no one bar me is currently using this, I've shifted the debug bitmask around a bit. Consider yourself warned!	2012-11-16 19:39:29 +00:00
Adrian Chadd	bbdf3df1c4	Make sure the final descriptor in an aggregate has rate control information. This was broken by me when merging the 802.11n aggregate descriptor chain setup with the default descriptor chain setup, in preparation for supporting AR9380 NICs. The corner case here is quite specific - if you queue an aggregate frame with >1 frames in it, and the last subframe has only one descriptor making it up, then that descriptor won't have the rate control information copied into it. Look at what happens inside ar5416FillTxDesc() if both firstSeg and lastSeg are set to 1. Then when ar5416ProcTxDesc() goes to fill out ts_rate based on the transmit index, it looks at the rate control fields in that descriptor and dutifully sets it to be 0. It doesn't happen for non-aggregate frames - if they have one descriptor, the first descriptor already has rate control info. I removed the call to ath_hal_setuplasttxdesc() when I migrated the code to use the "new" style aggregate chain routines from the HAL. But I missed this particular corner case. This is a bit inefficient with MIPS boards as it involves a few redundant writes into non-cachable memory. I'll chase that up when it matters. Tested: * AR9280 STA mode, TCP iperf traffic * Rui Paulo <rpaulo@> first reported this and has verified it on his AR9160 based AP. PR: kern/173636	2012-11-15 03:00:49 +00:00
Adrian Chadd	5f9fe65d64	Place 'dev.ath.X.debug' back under ATH_DEBUG, rather than ATH_DEBUG_ALQ.	2012-11-13 19:45:13 +00:00
Adrian Chadd	7d9dd2ac96	Add some debugging to try and catch an invalid TX rate (0x0) that is being reported.	2012-11-13 06:28:57 +00:00
Adrian Chadd	603280386b	Correctly fix the 'scan during STA mode' crash.	2012-11-11 21:58:18 +00:00
Adrian Chadd	58c82ec453	Remove this; I incorrectly committed the wrong (debug) changes in my previous commit.	2012-11-11 21:57:18 +00:00
Adrian Chadd	04cdca73d9	Don't call av_set_tim() if it's NULL. This happens during a scan in STA mode; any queued data frames will be power save queued but as there's no TIM in STA mode, it panics. This was introduced by me when I disabled my driver-aware power save handling support.	2012-11-11 00:34:10 +00:00
Adrian Chadd	3345c65be0	Correct some rather weird and broken behaviour observed when doing actual traffic with an AR9380/AR9382/AR9485. The sample rate control stats would show impossibly large numbers for "successful packets transmitted." The number was a tad under 2^64-1. So after a bit of digging, I found that the sample rate control code was making 'tries' turn into a negative number, and this was because ts_longretry was too small. The hardware returns "ts_longretry" at the current rate selection, not overall for that TX descriptor. So if you set up four TX rate scenarios and the second one works, ts_longretry is only set for the number of attempts at that second rate scenario. The FreeBSD HAL code does the correction in ath_hal_proctxdesc() - however, this isn't possible with EDMA. EDMA TX completion is done separately from the original TX descriptor. So the real solution is to split out "find ts_rate and ts_longretry" from "complete TX descriptor". Until that's done, put a hack in the EDMA TX path that uses the rate scenario information in the ath_buf. Tested: AR9380, AR9382, AR9485 STA mode	2012-11-10 22:37:06 +00:00
Kevin Lo	f78d5b7e8a	s/ATH_DEBUG/ATH_DEBUG_ALQ	2012-11-10 15:21:39 +00:00
Kevin Lo	9fc1923565	Fix the build.	2012-11-10 08:34:40 +00:00
Adrian Chadd	a64438faed	Fix a very incorrect description.	2012-11-09 01:28:11 +00:00
Adrian Chadd	bbee93a84e	Fix the build - fix up the ath_alq code to not compile by default.	2012-11-08 23:11:59 +00:00
Adrian Chadd	b69b0dcc24	Add some hooks into the driver to attach, detach and record EDMA descriptor events. This is primarily for the TX EDMA and TX EDMA completion. I haven't yet tied it into the EDMA RX path or the legacy TX/RX path. Things that I don't quite like: * Make the pointer type 'void' in ath_softc and have if_ath_alq() return a malloc'ed buffer. That would remove the need to include if_ath_alq.h in if_athvar.h. The sysctl setup needs to be cleaned up.	2012-11-08 18:11:31 +00:00
Adrian Chadd	2a2441c9fa	Add my initial cut at driver-layer ALQ support. I'm using this to debug EDMA TX and RX descriptors and it's really helpful to have a non-printf() way to decode frames. I won't link this into the build until I've tidied it up a little more. This will eventually be behind ATH_DEBUG_ALQ.	2012-11-08 18:07:29 +00:00
Adrian Chadd	174484b17a	Oops, fix bogus spacing.	2012-11-08 17:46:27 +00:00
Adrian Chadd	ae3815fd18	Implement the ATH_RESET_NOLOSS path for TX stop and start; this is needed for 802.11n TX device restarting. Remove the debug printf()s; they're no longer needed here.	2012-11-08 17:43:58 +00:00
Adrian Chadd	d4c0d5d0d9	Convert this to a debug printf; it's working fine now.	2012-11-08 17:32:55 +00:00
Adrian Chadd	89d2e576a4	Don't compile in my (not yet committed) ath_alq code unless ATH_DEBUG_ALQ is defined. This will unbreak ATH_DEBUG builds.	2012-11-07 16:34:09 +00:00
Adrian Chadd	bdbb6e5b8c	Disable my software queue TIM and PS handling for now. ps-poll is totally broken in its current form. This should unbreak things enough to let people use PS-POLL devices, but leave it in place for me to finish PS-POLL handling.	2012-11-07 06:29:45 +00:00
Adrian Chadd	7877ac644e	Add new HAL configuration features for the updated AR9300 HAL.	2012-11-07 06:23:23 +00:00
Adrian Chadd	6e84772f4d	Convert the aggregate descriptor path over to use the same API as the non-aggregate path. I "cheated" by using some TX setup code in our HAL that isn't present in the Atheros HAL (or Linux ath9k.) The old path for forming aggregates was: * setup the rate control in the first descriptor; * call chaintxdesc() on all the frames; * call setupfirsttxdesc() on the first descriptor in the first frame; * call setuplasttxdesc() on the last descriptor in the last frame. The new path for forming aggregates looks like the non-aggregate path: * call setuptxdesc() on the first descriptor in the first frame; * setup the rate control in the first descriptor; * call filltxdesc() on each descriptor in the frame; * if it's an aggregate - call set11n_aggr_{first, middle, last} as appropriate (see the code for a description of what is "appropriate".) Now, this is done primarily for the AR9300 HAL - it doesn't implement the first set of aggregate functions. It just has the older methods and the "first/middle/last" aggregate methods. So, let's convert the code to use these. Note: the AR5416 HAL in FreeBSD had that code (from me, a while ago) and a previous commit brought it up to behave the same as the AR9300 HAL routines. There are some further tidy-ups to be done - specifically, avoid doing multiple calls to the 11n descriptor functions. I shouldn't call clr11n_aggr(), then set11n_aggr_middle(), then also set11n_aggr_first(). On (at least MIPS) the TX descriptors are in non-cachable memory and this will cause multiple slow writes. I'll debug/tidy that up in a future commit. Tested: * AR9280, STA * AR9280/AR9160, AP * AR9380, STA (using a local, closed source HAL, sorry!)	2012-11-06 06:19:11 +00:00
Dimitry Andric	29658c96ce	Remove duplicate const specifiers in many drivers (I hope I got all of them, please let me know if not). Most of these are of the form: static const struct bzzt_type { [...list of members...] } const bzzt_devs[] = { [...list of initializers...] }; The second const is unnecessary, as arrays cannot be modified anyway, and if the elements are const, the whole thing is const automatically (e.g. it is placed in .rodata). I have verified this does not change the binary output of a full kernel build (except for build timestamps embedded in the object files). Reviewed by: yongari, marius MFC after: 1 week	2012-11-05 19:16:27 +00:00
Adrian Chadd	c19a2a1a9f	Clear IFF_DRV_OACTIVE if any slots were completed. This unblocks TX EDMA under high load.	2012-11-05 09:27:47 +00:00
Adrian Chadd	bc919a54b2	TX EDMA debugging fixes: * Do the calculation for each ath_buf, rather than just the first * Correct the calculation in the first place.	2012-11-05 07:08:45 +00:00
Adrian Chadd	4c5038c7b5	Oops - conditionalise that.	2012-11-04 00:46:01 +00:00
Adrian Chadd	d40c846abf	EDMA TX tweaks: * don't poke ath_hal_txstart() if nothing was pushed into the FIFO during the refill process; * shuffle around the TX debugging output a little so it's logged at TX hardware enqueue; * Add logging of the TX status processing.	2012-11-03 22:54:42 +00:00
Adrian Chadd	64dbfc6d92	For AR9380 NICs - the non-enterprise versions don't support RTS protection of small (< 256 byte) aggregate frames. This needs to be done or 11n aggregation TX just simply doesn't work on these NICs. Whilst here, extend some debug printing; I was using this whilst debugging the TX power setup in the TX descriptor(s) on the AR9380.	2012-11-03 22:13:42 +00:00
Adrian Chadd	5540369b93	Add a new HAL call to extract out the HAL enterprise bits from the AR9300 HAL.	2012-11-03 22:12:35 +00:00
Adrian Chadd	b90559c429	HAL API updates, from the previous couple of HAL commits.	2012-11-03 04:56:08 +00:00
Adrian Chadd	f74b406ddd	HAL API changes! * introduce a new HAL API method to pull out the TX status descriptor contents. * Add num_delims to the 11n first aggr method. This isn't used by the driver at the moment so it won't affect anything.	2012-11-03 04:55:43 +00:00
Adrian Chadd	70ee90299b	Add a debug method to dump the EDMA TX status descriptor contents out. This requires some HAL API changes to be useful, as there's no way right now to pull out the TX status descriptor contents.	2012-11-03 04:53:44 +00:00
Adrian Chadd	aff98f17c6	Since the PLL changes aren't in here yet for the AR9130 half/quarter rate support, disable it.	2012-10-31 21:14:25 +00:00
Adrian Chadd	b0245b90ba	Oops - this was incorrectly removed in a previous commit.	2012-10-31 21:06:55 +00:00
Adrian Chadd	9bb63aa8ff	Oops - missing from the last commit - add ANI immunity levels for AR9160. Obtained from: Qualcomm Atheros	2012-10-31 21:04:23 +00:00
Adrian Chadd	adadb6074d	HAL updates! * Add some more ANI spur immunity levels. * For AR5111 radios attached to an AR5212, limit the 5GHz channels that are available. A later revision of the AR5111 supports the 4.9GHz PSB channels but right now there's no check in place for the radio revision. If someone wants PSB support on AR5212+AR5111 radios then please let me know and I'll add the relevant version check. Obtained from: Qualcomm Atheros	2012-10-31 21:03:55 +00:00
Adrian Chadd	3631c3f200	Add in the last random assortment of missing bits for the AR9380 HAL. Obtained from: Qualcomm Atheros	2012-10-31 21:00:01 +00:00
Adrian Chadd	321e63ddee	Add the emulation PCI device id - these days, 0xabcd shows up all over the internet as "AR9380 and later which didn't get its PCI ID written in at power-on", so it's hardly an unknown constant. Obtained from: Qualcomm Atheros	2012-10-31 20:58:24 +00:00
Adrian Chadd	bf57b7b2ce	I've had some feedback that CCK rates are more reliable than MCS 0 in some very degenerate conditions. However, until ath_rate_form_aggr() is taught to not form aggregates if ANY selected rate is non-MCS, this can't yet be enabled. So, just add a comment.	2012-10-31 06:35:50 +00:00
Adrian Chadd	1b5c5f5ad0	I give up - introduce a TX lock to serialise TX operations. I've tried serialising TX using queues and such but unfortunately due to how this interacts with the locking going on elsewhere in the networking stack, the TX task gets delayed, resulting in quite a noticeable throughput loss: * baseline TCP for 2x2 11n HT40 is ~ 170mbit/sec; * TCP for TX task in the ath taskq, with the RX also going on - 80mbit/sec; * TCP for TX task in a separate, second taskq - 100mbit/sec. So for now I'm going with the Linux wireless stack approach - lock tx early. The Linux code does this in the wireless stack, before the 802.11 state stuff happens and before it's punted to the driver. But TX locking needs to also occur at the driver layer as the TX completion code _also_ begins to drain the ifnet TX queue. Whilst I'm here, add some KTR traces for the TX path. Note: * This really should be done at the net80211 layer (as well, at least.) But that'll have to wait for a little more thought to happen.	2012-10-31 06:27:58 +00:00
Adrian Chadd	548a605d0d	Begin fleshing out some software queue awareness for TIM handling with the power save queue. * introduce some new ATH_NODE lock protected fields, tracking the net80211 psq and TIM state; * when doing buffer transitions - ie, when sending and completing buffers - check the state of the SWQ and update the TIM appropriately. * when clearing the TIM bit, if the SWQ is not empty then delay clearing it. This is racy, but it's no less racy than the current net80211 power save queue management code. Specifically, with multiple TX threads, it's quite plausible that parallel state updates will race and the TIM will be left in an inconsistent state. I'll address that in a follow-up commit.	2012-10-28 21:13:12 +00:00
Adrian Chadd	a93c5097c9	Add a temporary (for values of "temporary") work around for hotplug support with ath(4) and VIMAGE. Right now the VIMAGE code doesn't supply a default vnet context during: * hotplug attach; * any device detach. It special cases kldload/boot time probing (by setting the context to vnet0) but that doesn't occur when probing devices during a bus rescan - eg, adding a cardbus card. These will eventually go away when the VIMAGE support extends to providing default contexts to hotplug attach/detach.	2012-10-28 18:46:06 +00:00
Adrian Chadd	9572684af7	Since it's not immediately obvious whether the current TX path handles fragment rate lookups correctly, add a comment describing exactly that. The assumption in the fragment duration code is that the duration of the next fragment will match the rate used by the current fragment. But I think a rate lookup is being done for _each_ fragment. For older pre-sample rate control this would almost always be the case, but for sample it may be incorrect more often than correct.	2012-10-26 16:31:12 +00:00
Adrian Chadd	cf0c92d600	Track the total number of software queued frames in an atomic variable stashed away in ath_node. As much as I tried to stuff that behind the ATH_NODE lock, unfortunately the locking is just too plain hairy (for me! And I wrote it!) to do cleanly. Hence using atomics here instead of a lock. The ATH_NODE lock just isn't currently used anywhere besides the rate control updates. If in the future everything gets migrated back to using a single ATH_NODE lock or a single global ATH_TX lock (ie, a single TX lock for all TX and TX completion) then fine, I'll remove the atomics.	2012-10-15 00:07:18 +00:00
Adrian Chadd	13aa9ee5c2	Stop abusing the ATH_TID_*() queue macros for filtered frames and give them their own macro set.	2012-10-14 23:52:30 +00:00
Adrian Chadd	8e7393944d	Push the actual TX processing into the ath taskqueue, rather than having it run out of multiple concurrent contexts. Right now the ath(4) TX processing is a bit hairy. Specifically: * It was running out of ath_start(), which could occur from multiple concurrent sending processes (as if_start() can be started from multiple sending threads nowadays.. sigh) * during RX if fast frames are enabled (so not really at the moment, not until I fix this particular feature again..) * during ath_reset() - so anything which calls that * during ath_tx_proc() in the ath taskqueue - ie, TX is attempted again after TX completion, as there's now hopefully some ath_bufs available. Then, the ic_raw_xmit() method can queue raw frames for transmission at any time, from any net80211 TX context. Ew. This has caused packet ordering issues in the past - specifically, there's absolutely no guarantee that preemption won't occur _during_ ath_start() by the TX completion processing, which will call ath_start() again. It's a mess - 802.11 really, really wants things to be in sequence or things go all kinds of loopy. So: * create a new task struct for TX'ing; * make the if_start method simply queue the task on the ath taskqueue; * make ath_start() just be called by the new TX task; * make ath_tx_kick() just schedule the ath TX task, rather than directly calling ath_start(). Now yes, this means that I've taken a step backwards in terms of concurrency - TX -and- RX now occur in the same single-task taskqueue. But there's nothing stopping me from separating out the TX / TX completion code into a separate taskqueue which runs in parallel with the RX path, if that ends up being appropriate for some platforms. This fixes the CCMP/seqno concurrency issues that creep up when you transmit large amounts of uni-directional UDP traffic (>200MBit) on a FreeBSD STA -> AP, as now there's only one TX context no matter what's going on (TX completion->retry/software queue, userland->net80211->ath_start(), TX completion -> ath_start()); but it won't fix any concurrency issues between raw transmitted frames and non-raw transmitted frames (eg EAPOL frames on TID 16 and any other TID 16 multicast traffic that gets put on the CABQ.) That is going to require a bunch more re-architecture before it's feasible to fix. In any case, this is a big step towards making the majority of the TX path locking irrelevant, as now almost all TX activity occurs in the taskqueue. Phew.	2012-10-14 20:44:08 +00:00
Adrian Chadd	516f67965a	Break the RX processing up into smaller chunks of 128 frames each. Right now processing a full 512 frame queue takes quite a while (measured on the order of milliseconds.) Because of this, the TX processing ends up sometimes preempting the taskqueue: * userland sends a frame * it goes in through net80211 and out to ath_start() * ath_start() will end up either direct dispatching or software queuing a frame. If TX had to wait for RX to finish, it would add quite a few ms of additional latency to the packet transmission. This in the past has caused issues with TCP throughput. Now, as part of my attempt to bring sanity to the TX/RX paths, the first step is to make the RX processing happen in smaller 'parts'. That way when TX is pushed into the ath taskqueue, there won't be so much latency in the way of things. The bigger scale change (which will come much later) is to actually process the frames in the ath_intr taskqueue but process _frames_ in the ath driver taskqueue. That would reduce the latency between processing and requeuing new descriptors. But that'll come later. The actual work: * Add ATH_RX_MAX at 128 (static for now); * break out of the processing loop if npkts reaches ATH_RX_MAX; * if we processed ATH_RX_MAX or more frames during the processing loop, immediately reschedule another RX taskqueue run. This will handle the further frames in the taskqueue. This should have very minimal impact on the general throughput case, unless the scheduler is being very very strange or the ath taskqueue ends up spending a lot of time on non-RX operations (such as TX completion.)	2012-10-14 20:31:38 +00:00
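Commit 516f67965a caps each RX taskqueue run at ATH_RX_MAX = 128 frames. It breaks out of the loop once npkts reaches that budget and reschedules the RX task if the budget was hit. The sketch below is a simplified, user-space model of that rule only. It is not the ath(4) code. struct rx_ring, RX_MAX, rx_task() and drain() are hypothetical names, "processing" a frame is just a counter update, and rescheduling is modelled by looping in drain() instead of enqueueing a real task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-run budget, mirroring the ATH_RX_MAX value in the log. */
#define RX_MAX 128

/* Hypothetical stand-in for the RX descriptor ring: only counters. */
struct rx_ring {
    size_t pending;   /* completed frames waiting to be processed */
    size_t processed; /* frames handed up the stack so far */
    unsigned resched; /* times the RX task asked to be run again */
};

/*
 * One RX task run: process at most RX_MAX frames. If the budget was
 * reached, request another run instead of looping on, so other work
 * queued on the same taskqueue (e.g. TX) is not held up.
 * Returns true when the task wants to be rescheduled.
 */
static bool rx_task(struct rx_ring *ring)
{
    size_t npkts = 0;

    while (ring->pending > 0) {
        if (npkts >= RX_MAX)
            break;          /* budget used up: yield the taskqueue */
        ring->pending--;    /* "process" one frame */
        ring->processed++;
        npkts++;
    }

    /* Same rule as the log: RX_MAX or more frames => run again. */
    if (npkts >= RX_MAX) {
        ring->resched++;
        return true;
    }
    return false;
}

/*
 * Model a single-threaded taskqueue running the RX task until it stops
 * rescheduling itself. Returns the number of task runs taken.
 */
static unsigned drain(struct rx_ring *ring)
{
    unsigned runs = 0;
    bool again;

    assert(ring != NULL);
    do {
        again = rx_task(ring);
        runs++;
    } while (again);
    return runs;
}
```

With 512 frames pending (the full queue size mentioned in the log), drain() takes five runs. The first four each process 128 frames. The fifth run finds nothing left, because the rule reschedules whenever the budget was reached, even if the ring happened to empty at that moment. A burst of 100 frames completes in one run with no reschedule.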


1320 Commits