AR5416 and AR9280, but leave it disabled by default.
TL;DR: don't enable this code at all unless you go through the process
of getting the NIC re-certified. This is purely to be used as a
reference and NOT a certified solution by any stretch of the imagination.
The background:
The AR5112 RF synth right up to the AR5133 RF synth (used on the AR5416,
derivative is used for the AR9130/AR9160) only implement down to 2.5MHz
channel spacing in 5GHz. Ie, the RF synth is programmed in steps of 2.5MHz
(or 5, 10, 20MHz.) So they can't represent the quarter rate channels
in the 4.9GHz PSB (which end in xxx2MHz and xxx7MHz). They support
fractional spacing in 2GHz (1MHz spacing) (or things wouldn't work,
right?)
So instead of doing this, the RF synth programming for the AR5112 and
later code will round to the nearest available frequency.
If all NICs were RF5112 or later, they'll inter-operate fine - they all
program the same. (And for reference, only the latest revision of the
RF5111 NICs do it, but the driver doesn't yet implement the programming.)
However:
* The AR5416 programming didn't at all implement the fractional synth
work around as above;
* The AR9280 programming actually programmed the accurate centre frequency
and thus wouldn't inter-operate with the legacy NICs.
So this patch:
* Implements the 4.9GHz PSB fractional synth workaround, exactly as the
RF5112 and later code does;
* Adds a very dirty workaround from me to calculate the same channel
centre "fudge" to the AR9280 code when operating on fractional frequencies
in 5GHz.
HOWEVER however:
It is disabled by default. Since the HAL didn't implement this feature,
it's highly unlikely that the AR5416 and AR928x has been tested in these
centre frequencies. There's a lot of regulatory compliance testing required
before a NIC can have this enabled - checking for centre frequency,
for drift, for synth spurs, for distortion and spectral mask compliance.
There's likely a lot of other things that need testing so please don't
treat this as an exhaustive, authoritative list. There's a perfectly good
process out there to get a NIC certified by your regulatory domain, please
go and engage someone to do that for you and pay the relevant fees.
If a company wishes to grab this work and certify existing 802.11n NICs
for work in these bands then please be my guest. The AR9280 works fine
on the correct fractional synth channels (49x2 and 49x7Mhz) so you don't
need to get certification for that. But the 500KHz offset hack may have
the above issues (spur, distortion, accuracy, etc) so you will need to
get the NIC recertified.
Please note that it's also CARD dependent. Just because the RF synth
will behave correctly doesn't at all mean that the card design will also
behave correctly. So no, I won't enable this by default if someone
verifies a specific AR5416/AR9280 NIC works. Please don't ask.
Tested:
I used the following NICs to do basic interoperability testing at
half and quarter rates. However, I only did very minimal spectrum
analyser testing (mostly "am I about to blow things up" testing;
not "certification ready" testing):
* AR5212 + AR5112 synth
* AR5413 + AR5413 synth
* AR5416 + AR5113 synth
* AR9280
net80211 node power save state.
* Add an ATH_NODE_UNLOCK_ASSERT() check
* Add a new node field - an_is_powersave
* Pause/unpause the queue based on the node state
* Attempt to handle net80211 concurrency issues so the queue
doesn't get paused/unpaused more than once at a time from
the net80211 power save code.
Whilst here (and breaking my usual rule), set CLRDMASK when a queue
is unpaused, regardless of whether the queue has some pending traffic.
This means the first frame from that TID (now or later) will hvae
CLRDMASK set.
Also whilst here, bump the swretrymax counters whenever the
filtered frames code expires a frame. Again, breaking my rule, but
this is just a statistics thing rather than a functional change.
This doesn't fix ps-poll (but it doesn't break it too much worse
than it is at the present) or correcting the TID updates.
That's next on the list.
Tested:
* AR9220 AP (Atheros AP96 reference design)
* Macbook Pro and LG Optimus 1 Android phone, both setting
and clearing power save state (but not using PS-POLL.)
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.
o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.
o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.
o Drivers handle their stats theirselves: if_obytes, if_omcasts.
o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats theirselves.
o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.
o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.
o mxge(4) just no longer needs NO_SLOW_STATS define.
o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.
Reviewed by: jfv, gnn
bits under #ifdef _KERNEL but leave definitions for various structures
defined by standards ($PIR table, SMAP entries, etc.) available to
userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
and smbios(4) in <machine/pc/bios.h> and make them available to
userland.
MFC after: 2 weeks
This doesn't specifically fix the issue(s) i'm seeing in this 2GHz
environment (where setting/increasing spur immunity causes OFDM restart
errors to skyrocket through the roof; but leaving it at 0 would leave
the environment cleaner..)
Pointy-hat-to: me, for committing this broken code in the first place.
- Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
Reusing the per-queue interrupt task could result in both an interrupt
thread and the taskqueue thread trying to handle received packets on a
single queue resulting in out-of-order packet processing and lock
contention.
- Don't define ixgbe_start() at all where if_transmit is used.
Tested by: Vijay Singh
Reviewed by: jfv
MFC after: 2 weeks
Device nodes are in the format /dev/led/isci.busX.portY.locate.
Sponsored by: Intel
Requested by: Paul Maulberger <paul dot maulberger at gmx dot de>
MFC after: 1 week
things like EAPOL frames make it out.
After a whole bunch of hacking/testing, I discovered that they weren't
being early-dropped by the stack (but I should look at ensuring that
later..) but were even making to the hardware transmit queue.
They were mostly even being received by the remote end. However, the
remote end was completely ignoring them.
This didn't happen under 150-170MBit TCP tests as I'm guessing the TX
queue stayed very busy and the STA didn't do any scanning. However, when
doing 100Mbit/s of TCP traffic, the STA would do background scanning -
which involves it coming in and out of powersave mode with the AP.
Now, this is a total and utter hack around the real problems, which are:
* I need to implement proper power save handling and integrate it into
the filtered frames support, so the driver/stack doesn't send frames
whilst the station is actually in sleep;
* .. but frames were actually making it to the STA (macbook pro) and
the AP did receive an ACK; but a tcpdump on the receiving side showed
the EAPOL frame never made it. So the stack was dropping it for
some reason;
* Importantly - the EAPOL frames are currently going into the non-QoS
TID, which maps to the BE queue and is susceptible to that queue being
busy doing other things, but;
* There's other traffic going on in the non-QoS TID from other contexts
when scanning is going on and it's possible there's some races causing
sequence number/IV issues, but;
* Importantly importantlly, I think the interaction with TID 16 multicast
traffic in power save mode is causing issues - since I -believe- the
sequence number space being used by the EAPOL frames on TID 16 overlaps
with the multicast frames that have sequence numbers allocated and
are then stuffed on the cabq. Since with EAPOL frames being in TID 16
and queued to the BE queue, it's going to be waiting to be serviced
with all of the aggregate traffic going on - and if the CABQ gets
emptied beforehand, those TID 16 multicast frames with sequence numbers
will go out beforehand.
Now, there's quite likely a bunch of "stuff happening slightly out of
sequence" going on due to the nature of the TX path (read: lots of
overlapping and concurrent ath_start() and ath_raw_xmit() calls going
on, sigh) but I thought I had caught them all and stuffed each TID TX
behind a lock (that lasted as long as it needed to in order to get
the frame onto the relevant destination queue - thus keeping things
in order.)
Unfortunately the last problem is the big one and I'm going to stare at
it some more. If it _is_
So this is a work around for now to ensure that EAPOL frames actually
make it out before any other stuff in the non-QoS TID and HOPEFULLY
before the CABQ gets active.
I'm now going to spend a little time in the TX path figuring out exactly
why the sender is rejecting things. There's two (well, three if you count
EAPOL contents invalid) possibilities:
* The sequence number is out of order (ie, something else like the multicast
traffic on CABQ) is going out first on TID 16;
* The CCMP IV is out of order (similar to above - but less likely, as the
TX key for multicast traffic is different to unicast traffic);
* EAPOL contents strangely invalid.
AP: Ubiquiti RSPRO, AR9160/AR9220 NICs
STA: Macbook Pro, Broadcom 11n NIC
lock may be held.
Kim reported that the TID lock wasn't held when ath_tx_update_clrdmask()
was called. Well, the underlying hardware TXQ for that TID.
I'm betting it's the cabq stuff. ath_tx_xmit_normal() can be called
for both real and software cabq. For software cabq, the real destination
txq is different to the txq. So, the lock check will fail.
Reported by: Kim Culhan <w8hdkim@gmail.com>
offline in response to a INQUIRY command that does not retreive vital
product data(I personally have observed the behaviour on an Adaptec 2405
and a 5805). Force the peripheral qualifier to "connected" so that upper
layers correctly recognize that a disk is present.
This bug was uncovered by r216236. Prior to that fix, aac(4) was
accidentally clearing the peripheral qualifier for all inquiry commands.
This fixes an issue where passthrough devices were not created for
disks behind aac(4) controllers suffering from the bug. I have
verified that if a disk is not present that we still properly detect
that and not create the passthrough device.
Sponsored by: Sandvine Incorporated
MFC after: 1 week
LUNs respectively. This removes a huge number of error messages
from CAM during bus scans.
Copied almost verbatim from mav's commit r237460.
Submitted by: Mike Tancsa <mike at sentex dot net>
MFC after: 3 days
immediately panics on boot with INVARIANTS enabled. The driver already
clearly expects to be able to recurse on this mutex - the main I/O
is always recursing on this lock.
Reported and tested by: Mike Tancsa <mike at sentex dot net>
MFC after: 1 week
This should eventually be unified with ATH_DEBUG() so I can get both
from one macro; that may take some time.
Add some new probes for TX and TX completion.
* use the correct frame status - although the completion descriptor is
the _last_ in the frame/aggregate, the status is currently stored in
the _first_ buffer.
* Print out ath_buf specific fields once, not per descriptor in an ath_buf.
at r230551.
Also while there, make sense polling use reported for each node separately
instead of reporting accumulated total status.
Submitted by: Barbara <barbara.freebsd@gmail.com> (1)
MFC after: 3 days
it's disabled.
The previous commit to enable CLRDMASK setting didn't do it at all
correctly for non-aggregate sessions - so the CLRDMASK bit would be
cleared and never re-set.
* move ath_tx_update_clrdmask() to be called by functions that setup
descriptors and queue frames to the hardware, rather than scattered
everywhere.
* Force CLRDMASK to be set on all non-aggregate session frames being
transmitted.
* Use ath_tx_normal_comp() now on non-aggregate sessoin frames
that are queued via ath_tx_xmit_normal(). That way the TID hwq is
updated and they can trigger (eventual) filter frame queue resets
and software retransmits.
There's still a bit more work to do in this area to reverse the silly
short-sightedness on my part, however it's likely going to be better
to fix this now than just reverting the patch.
Thanks to people on the freebsd-wireless@ mailing list for promptly
pointing this out.
adapter->dropped_pkts instead of if_ierrors because if_ierrors is
overwritten by hw stats collection.
Submitted by: Andrew Boyer <aboyer@averesystems.com>
Reviewed by: Jack F Vogel <jfv@freebsd.org>
MFC after: 2 weeks
gone rule. Optimise use of channels so that when a channel
is not ready another channel is used. Instead of using the SOF interrupt
use the system timer to drive the host statemachine. This might
give lower throughput and higher latency, but reduces the CPU usage
significantly. The DWC OTG host mode support should not be considered
for serious USB host controller applications. Some problems are still
seen with LOW speed USB devices.