instead of axq_link.
This (among a bunch of uncommitted work) is required for EDMA chips
to correctly transmit frames on the CABQ.
Tested:
* AR9280, hostap mode
* AR9380/AR9580, hostap mode (staggered beacons)
TODO:
* This code only really gets called when burst beacons are used;
it glues multiple CABQ queues together when sending to the hardware.
* More thorough bursted beacon testing! (first requires some work with
the beacon queue code for bursted beacons, as that currently uses the
link pointer and will fail on EDMA chips.)
the descriptor link pointer, rather than directly.
This is needed on AR9380 and later (ie, EDMA) NICs so the multicast queue
has a chance in hell of being put together right.
Tested:
* AR9380, AR9580 in hostap mode, CABQ traffic (but with other patches..)
It is already done in SSIF interface code.
This reduces contention/spinning reported by many users.
PR: kern/172166
Submitted by: Eric van Gyzen <eric at vangyzen.net>
MFC after: 2 weeks
extra read from PxCI/PxSACT registers. If only NCQ commands are running, we
don't really need PxCI. If only non-NCQ commands are running we don't need
PxSACT. Mixed set may happen only on controllers with FIS-based switching
when port multiplier is attached, and then we have to read both registers.
MFC after: 1 month
related issues.
Moving the TX locking under one lock made things easier to progress on
but it had one important side-effect - it increased the latency when
handling CABQ setup when sending beacons.
This commit introduces a bunch of new changes and a few unrelated changs
that are just easier to lump in here.
The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.
The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support. Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot. For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.
The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission. I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.
Major:
* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
a single lock.
* Leave the software queue side of things under the ATH_TX_LOCK lock,
(continuing) to serialise things as they are.
* Add a new function which is called whenever there's a beacon miss,
to print out some debugging. This is primarily designed to help
me figure out if the beacon miss events are due to a noisy environment,
issues with the PHY/MAC, or other.
* Move the CABQ setup/enable to occur _after_ all the VAPs have been
looked at. This means that for multiple VAPS in bursted mode, the
CABQ gets primed once all VAPs are checked, rather than being primed
on the first VAP and then having frames appended after this.
Minor:
* Add a (disabled) twiddle to let me enable/disable cabq traffic.
It's primarily there to let me easily debug what's going on with beacon
and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
trying to trace down.
* Clear bf_next when flushing frames; it should quieten some warnings
that show up when a node goes away.
Tested:
* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)
TODO:
* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ hanling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
correctly set when chaining descriptors.)
to stuck beacons.
* Set the cabq readytime (ie, how long to burst for) to 50% of the total
beacon interval time
* fix the cabq adjustment calculation based on how the beacon offset is
calculated (the SWBA/DBA time offset.)
This is all still a bit magic voodoo but it does seem to have further
quietened issues with missed/stuck beacons under my local testing.
In any case, it better matches what the reference HAL implements.
Obtained from: Qualcomm Atheros
held. The ttm_buffer_object_transfer() does not need the mutex locked
at all, except for the call to the driver sync_obj_ref() method.
Reported and tested by: dumbbell
MFC after: 2 weeks
satisfy some alignment restrictions. Do not set TW_OSLI_REQ_FLAGS_CCB
flag for mapped data, pass the csio->data_ptr in the req->data.
Do not put the ccb pointer into req->data ever, ccb is stored in
req->orig_req already.
Submitted by: Shuichi KITAGUCHI <ki@hh.iij4u.or.jp>
PR: kern/177020
auto-correction. This change makes re(4) establish a link with
a system using non-crossover UTP cable.
Tested by: Michael BlackHeart < amdmiek <> gmail dot com >
"complete RX frames."
The 128 entry RX FIFO is really easy to fill up and miss refilling
when it's done in the ath taskq - as that gets blocked up doing
RX completion, TX completion and other random things.
So the 128 entry RX FIFO now gets emptied and refilled in the ath_intr()
task (and it grabs / releases locks, so now ath_intr() can't just be
a FAST handler yet!) but the locks aren't held for very long. The
completion part is done in the ath taskqueue context.
Details:
* Create a new completed frame list - sc->sc_rx_rxlist;
* Split the EDMA RX process queue into two halves - one that
processes the RX FIFO and refills it with new frames; another
that completes the completed frame list;
* When tearing down the driver, flush whatever is in the deferred
queue as well as what's in the FIFO;
* Create two new RX methods - one that processes all RX queues,
one that processes the given RX queue. When MSI is implemented,
we get told which RX queue the interrupt came in on so we can
specifically schedule that. (And I can do that with the non-MSI
path too; I'll figure that out later.)
* Convert the legacy code over to use these new RX methods;
* Replace all the instances of the RX taskqueue enqueue with a call
to a relevant RX method to enqueue one or all RX queues.
Tested:
* AR9380, STA
* AR9580, STA
* AR5413, STA
The vnode-backed md(4) has to map the unmapped bio because VOP_READ()
and VOP_WRITE() interfaces do not allow to pass unmapped requests to
the filesystem. Vnode-backed md(4) uses pbufs instead of relying on
the bio_transient_map, to avoid usual md deadlock.
Sponsored by: The FreeBSD Foundation
Tested by: pho, scottl
first position of compatible property, so simplebus driver can be generic
driver for any bus listed as compatible with "simple-bus".
Sponsored by: The FreeBSD Foundation
to specify the offset into the PCI memory spare at which each serial port
will find its registers. This was already done for other Exar PCI serial
devices; it was accidentally omitted for this specific device.
Sponsored by: Sandvine Incorporated
MFC after: 1 week
driver such that checking against the type was always false.
To detect NS DP83816, driver should have checked silicon revision
register for NS controllers. While here, remove SIS_TYPE_83816 to
not make the similar mistake again.
Reported by: Brad Smith ( brad@openbsd )
* when pulling frames off of the TID queue, the ATH_TID_REMOVE()
macro decrements the axq_depth field. So don't do it twice.
* in ath_tx_comp_cleanup_aggr(), bf wasn't being reset to bf_first
before walking the buffer list to complete buffers; so those buffers
will leak.
Since this is being done during buffer free, it's a crap shoot whether
the TX path lock is held or not. I tried putting the ath_freebuf() code
inside the TX lock and I got all kinds of locking issues - it turns out
that the buffer free path sometimes is called with the lock held and
sometimes isn't. So I'll go and fix that soon.
Hence for now the holdingbf buffers are protected by the TXBUF lock.
When working on TDMA, Sam Leffler found that the MAC DMA hardware
would re-read the last TX descriptor when getting ready to transmit
the next one. Thus the whole ATH_BUF_BUSY came into existance -
the descriptor must be left alone (very specifically the link pointer
must be maintained) until the hardware has moved onto the next frame.
He saw this in TDMA because the MAC would be frequently stopping during
active transmit (ie, when it wasn't its turn to transmit.)
Fast-forward to today. It turns out that this is a problem not with
a single MAC DMA instance, but with each QCU (from 0->9). They each
maintain separate descriptor pointers and will re-read the last
descriptor when starting to transmit the next.
So when your AP is busy transmitting from multiple TX queues, you'll
(more) frequently see one QCU stopped, waiting for a higher-priority QCU
to finsh transmitting, before it'll go ahead and continue. If you mess
up the descriptor (ie by freeing it) then you're short of luck.
Thanks to rpaulo for sticking with me whilst I diagnosed this issue
that he was quite reliably triggering in his environment.
This is a reimplementation; it doesn't have anything in common with
the ath9k or the Qualcomm Atheros reference driver.
Now - it in theory doesn't apply on the EDMA chips, as long as you
push one complete frame into the FIFO at a time. But the MAC can DMA
from a list of frames pushed into the hardware queue (ie, you concat
'n' frames together with link pointers, and then push the head pointer
into the TXQ FIFO.) Since that's likely how I'm going to implement
CABQ handling in hostap mode, it's likely that I will end up teaching
the EDMA TX completion code about busy buffers, just to be "sure"
this doesn't creep up.
Tested - iperf ap->sta and sta->ap (with both sides running this code):
* AR5416 STA
* AR9160/AR9220 hostap
To validate that it doesn't break the EDMA (FIFO) chips:
* AR9380, AR9485, AR9462 STA
Using iperf with the -S <tos byte decimal value> to set the TCP client
side DSCP bits, mapping to different TIDs and thus different TX queues.
TODO:
* Make this work on the EDMA chips, if we end up pushing lists of frames
to the hardware (eg how we eventually will handle cabq in hostap/ibss
mode.)
Else they won't enumerate at all:
hw.usb.full_ddesc=1
- Reduce the USB descriptor read timeout from 1000ms to
500ms. Typical value for LOW speed devices is 50-100ms.
- Enumerate USB device a maximum of 3 times when a port
connection change event is detected, before giving up.
MFC after: 1 month
Fix the IPMI regression by sending BGE_FW_DRV_STATE_UNLOAD to
ASF/IPMI firmware in driver attach phase. Sending heartheat to
ASF/IPMI is enabled only after upping interface so
setting driver state to BGE_FW_DRV_STATE_START in attach phase
broke IPMI access.
While I'm here, add NVRAM arbitration lock before performing
controller reset. ASF/IPMI firmware may be able to access the NVRAM
while controller reset is in progress. Without the arbitration
lock before resetting the controller, ASF/IPMI may not initialize
properly.
Special thanks to Miroslav Lachman who provided full remote
debugging environments.
* a flags field that lets me know what's going on;
* the hardware ratecode, unmolested by conversion to a bitrate;
* the HAL rs_flags field, useful for debugging;
* specifically mark aggregate sub-frames.
This stuff sorely needs tidying up - it's missing some important
stuff (eg numdelims) and it would be nice to put the flags at the
beginning rather than at the end.
Tested:
* AR9380, STA mode, 2x2 HT40, monitoring RSSI and EVM values
I can 100% reliably trigger this on TID 1 traffic by using iperf -S 32
<client fields> to create traffic that maps to TID 1.
The reference driver doesn't do this check.
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.
The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho
/usr/src/sys/modules/nvme/../../dev/nvme/nvme.c:211: warning: format '%qx' expects type 'long unsigned int', but argument 9 has type 'long long unsigned int' [-Wformat]
This fixes a build error at the same time (unused variable "item"), if
the kernel is compiled without INVARIANTS.
Reported by: Hartmann, O. <ohartman@zedat.fu-berlin.de> (build error)
Reviewed by: Konstantin Belousov (kib@)
The early commit is done to facilitate the off-tree work on the
porting of the Radeon driver.
Sponsored by: The FreeBSD Foundation
Debugged and tested by: dumbbell
MFC after: 1 month
When the VirtIO barrier feature is not negotiated, the driver
must enforce the proper ordering for BIO_ORDERED BIOs. All the
in-flight BIOs must complete before starting the BIO, and the
ordered BIO must complete before subsequent BIOs can start.
Also fix a few whitespace nits.
Reported by: neel
Approved by: grehan (mentor)
MFC after: 3 days
being compiled only when setting LEGACY_TX, this means you would
not get the drain when needed on detach!!
Thanks to Bryan Venteicher (bryanv@freebsd.org) for catching this
little gremlin!! :)
Fixes:
- flow control - don't override user value on re-init
- fix to make 1G optics work correctly
- change to interrupt enabling - some bits were incorrect
for certain hardware.
- certain stats fixes, remove a duplicate increment of
ierror, thanks to Scott Long for pointing these out.
- shared code link interface changed, requiring some
core code changes to accomodate this.
- add an m_adj() to ETHER_ALIGN on the recieve side, this
was requested by Mike Karels, thanks Mike.
- Multicast code corrections also thanks to Mike Karels.
Specify that wakeup rate of 7.5-10Hz is enough for yarrow harvesting
thread.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
- Switch syscons from timeout() to callout_reset_flags() and specify that
precision is not important there -- anything from 20 to 30Hz will be fine.
- Reduce syscons "refresh" rate to 1-2Hz when console is in graphics mode
and there is nothing to do except some polling for keyboard. Text mode
refresh would also be nice to have adaptive, but this change at least
should help laptop users who running X.
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
devices. While at it, update the comment now that we know that MSI-X
doesn't work with ESXi 5.1 for Intel 82576 either and the underlying issue
is a bug in the MSI-X allocation code of the hypervisor.
Reported by: Harald Schmalzbauer
- Make the nomatch table const.
MFC after: 1 week