This implements the kernel glue needed (getc, putc, rxready).
This isn't a 16550 UART, even if the datasheet overview claims so.
The Linux ar933x support was used as a reference, however the uart code
is a reimplementation.
Attentive viewers will note that the uart code is based off of the ns8250
code and the UART bus code is a stubbed-out version of this. I'll be
replacing it with non-stubbed versions soon, making this a fully featured
driver.
Tested:
* AP121 reference board (AR933x), booting through the mountroot> prompt;
then doing some basic interactive tests in ddb.
0x3C: /* Per Intel document 325462-045US 01/2013. */
Add manpage to document all the goodness that is available in this
processor model.
Submitted by: hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by: jimharris, sbruno
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
later found to not be usable because the controller doesn't support the
same number of queues.
This is not the normal case, but does occur with the Chatham prototype
board.
Sponsored by: Intel
1. If we wanted to send exactly as many bytes as the socket buffer is
sized for, the inner loop of kern_sendfile() would see that the
socket is full before seeing that it had no more bytes left to send.
This would cause it to return EAGAIN to the caller instead of
success. Fix by changing the order that these conditions are tested.
2. Simplify the calculation for the bytes to send in each iteration of
the inner loop of kern_sendfile()
3. Fix some calls with bogus arguments to sf_buf_ext(). These would
only trigger on mbuf allocation failure, but would be hilariously
bad if they did trigger.
Submitted by: gibbs(3), andre(2)
Reviewed by: emax, andre
Obtained from: Netflix
MFC after: 1 week
unmapped I/O. That one exception is access to INQUIRY VPD request result.
Those requests are never unmapped now, but to be safe add respective check
there and allow unmapped I/O for the SIM by setting PIM_UNMAPPED flag.
before the vnode is vput() in vm_mmap_vnode(). Error return means
that there is no use reference on the vnode from the vm object
reference, and failing to restore v_writecount breaks the invariant
that v_writecount is less or equal to the usecount.
The situation observed when nfs client returns ESTALE for
VOP_GETATTR() after the open.
In collaboration with: pho
MFC after: 1 week
This was ported from the AR724x code and I think that also doesn't
quite work. I'll investigate that soon.
With this in place the system reset path works, so 'reset' from kdb
actually resets the SoC.
Tested:
* AP121 test board
Before this change they were just leaked. Fortunately USB sticks now use
only one CCB, and so leak was only 2KB per detach, while other bigger SIMs
with much more allocated CCBs are rarely detached.
MFC after: 2 weeks
This basically restores the spirit of r203535, which was partially reverted
in r205557, while we still map fixed amount to work around transient issues
we experienced with r203535.
Prodded by: avg
Tested by: avg
MFC after: 1 week
the thread reference on the vp->v_rdev and use the returned struct
cdev *dev instead of using vp->v_rdev. Call dev_strategy_csw()
instead of dev_strategy(), since we now own the reference.
Since the csw was already calculated, test d_flags to avoid mapping
the buffer if the driver supports unmapped requests [*].
Suggested by: kan [*]
Reviewed by: kan (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
but assumes that a thread reference was already obtained on the passed
device. Use the function from physio(), to avoid two extra dev_mtx
lock and unlock. Note that physio() is always used as the cdevsw
method, or is called from a cdevsw method, and the caller already owns
the reference.
dev_strategy() is left to keep KPI intact, but now it is implemented
as a wrapper around dev_strategy_csw().
Do some style cleanup in physio().
Requested and reviewed by: kan (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
maxbcache size fixed, the auto-tuned transient map is too small for
real-world load on i386.
Tested by: David Wolfskill
Sponsored by: The FreeBSD Foundation
buffer map size, auto-tuned on the 4GB machine. Having the maxbcache
bigger than the buffer map causes the transient bio map sizing logic
to assume that there is enough KVA to use approximately 90MB (buffer
map is sized to 110MB, and maxbcache is 200MB). The increase in the
KVA usage caused other big KVA consumers, like nvidia.ko, to fail the
initialization.
Change the definition for both PAE and non-PAE cases, since PAE is
even more KVA-starved.
Reported and tested by: David Wolfskill
Discussed with: alc
Sponsored by: The FreeBSD Foundation
which requires OVREF to be set to get proper playback volume, but which has
all zeroes in HDA controller subdevice IDs on PCI.
MFC after: 1 month
Sponsored by:
CPUs.
The AR933x is a mips24k based SoC with an AR9380 series SoC on board,
two gigabit ethernet interfaces and an internal 10/100mbit ethernet
switch. There's also the normal interfaces (USB, ethernet, uart, GPIO.)
The downside? There's a non-ns8250 UART device.
With a very basic UART driver (not in this commit) the SoC is initialised
and boots up. I'll commit the UART code soon and then link it into the
general setup path.
This code is a re-implementation based from the Linux kernel / openwrt
AR933x support.
TODO:
* UART (obviously)
* All of the ethernet, USB and wifi SoC glue, including ethernet PLL
programming.
data buffer for a ccb that is unmapped.
This case is currently not possible, since the SCI framework only
requests these pointers for doing SCSI/ATA translation of non-
READ/WRITE commands. The panic is more to protect against the
unlikely future scenario where additional commands could be unmapped.
Sponsored by: Intel
mechanism.
Now that all requests are timed, we are guaranteed to get a completion
notification, even if it is an abort status due to a timed out admin
command.
This has the effect of simplifying the controller and namespace setup
code, so that it reads straight through rather than broken up into
a bunch of different callback functions.
Sponsored by: Intel
Reviewed by: carl
start or reset. Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.
This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state. To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.
Sponsored by: Intel
Reviewed by: carl
This is just as effective, and removes the need for a bunch of admin commands
to a controller that's going to be disabled shortly anyways.
Sponsored by: Intel
Reviewed by: carl
start process.
The spec indicates the OS driver should use Set Features (Software
Progress Marker) to set the pre-boot software load count to 0
after the OS driver has successfully been initialized. This allows
pre-boot software to determine if there have been any issues with the
OS loading.
Sponsored by: Intel
Reviewed by: carl
This flag was originally added to communicate to the sysctl code
which oids should be built, but there are easier ways to do this. This
needs to be cleaned up prior to adding new controller states - for example,
controller failure.
Sponsored by: Intel
Reviewed by: carl
The controller's IDENTIFY data contains MDTS (Max Data Transfer Size) to
allow the controller to specify the maximum I/O data transfer size. nvme(4)
already provides a default maximum, but make sure it does not exceed what
MDTS reports.
Sponsored by: Intel
Reviewed by: carl
that if a specific I/O repeatedly times out, we don't retry it indefinitely.
The default number of retries will be 4, but is adjusted using hw.nvme.retry_count.
Sponsored by: Intel
Reviewed by: carl
specified log page.
This satisfies the spec condition that future async events of the same type
will not be sent until the associated log page is fetched.
Sponsored by: Intel
Reviewed by: carl
NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.
While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.
Sponsored by: Intel
Reviewed by: carl
This protects against cases where a controller crashes with multiple
I/O outstanding, each timing out and requesting controller resets
simultaneously.
While here, remove a debugging printf from a previous commit, and add
more logging around I/O that need to be resubmitted after a controller
reset.
Sponsored by: Intel
Reviewed by: carl
While aborts are typically cleaner than a full controller reset, many times
an I/O timeout indicates other controller-level issues where aborts may not
work. NVMe drivers for other operating systems are also defaulting to
controller reset rather than aborts for timed out I/O.
Sponsored by: Intel
Reviewed by: carl
(Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've
actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was
cleared!)
This code implements a whole bunch of sorely needed EDMA TX improvements
along with CABQ TX support.
The specifics:
* When filling/refilling the FIFO, use the new TXQ staging queue
for FIFO frames
* Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly.
For now the non-CABQ transmit path pushes one frame into the TXQ
staging queue without setting up the intermediary link pointers
to chain them together, so draining frames from the txq staging
queue to the FIFO queue occurs AMPDU / MPDU at a time.
* In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and
ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO
at once.
* Now that frames are in a FIFO pending queue, we can top up the
FIFO after completing a single frame. This means we can keep
it filled rather than waiting for it drain and _then_ adding
more frames.
* The EDMA restart routine now walks the FIFO queue in the TXQ
rather than the pending queue and re-initialises the FIFO with
that.
* When restarting EDMA, we may have partially completed sending
a list. So stamp the first frame that we see in a list with
ATH_BUF_FIFOPTR and push _that_ into the hardware.
* When completing frames, only check those on the FIFO queue.
We should never ever queue frames from the pending queue
direct to the hardware, so there's no point in checking.
* Until I figure out what's going on, make sure if the TXSTATUS
for an empty queue pops up, complain loudly and continue.
This will stop the panics that people are seeing. I'll add
some code later which will assist in ensuring I'm populating
each descriptor with the correct queue ID.
* When considering whether to queue frames to the hardware queue
directly or software queue frames, make sure the depth of
the FIFO is taken into account now.
* When completing frames, tag them with ATH_BUF_BUSY if they're
not the final frame in a FIFO list. The same holding descriptor
behaviour is required when handling descriptors linked together
with a link pointer as the hardware will re-read the previous
descriptor to refresh the link pointer before contiuning.
* .. and if we complete the FIFO list (ie, the buffer has
ATH_BUF_FIFOEND set), then we don't need the holding buffer
any longer. Thus, free it.
Tested:
* AR9380/AR9580, STA and hostap
* AR9280, STA/hostap
TODO:
* I don't yet trust that the EDMA restart routine is totally correct
in all circumstances. I'll continue to thrash this out under heavy
multiple-TXQ traffic load and fix whatever pops up.
On any I/O timeout, check for csts.cfs==1. If set, the controller
is reporting fatal status and we reset the controller immediately,
rather than trying to abort the timed out command.
This changeset also includes deferring the controller start portion
of the reset to a separate task. This ensures we are always performing
a controller start operation from a consistent context.
Sponsored by: Intel
Reviewed by: carl
invoke it from nvmecontrol(8).
Controller reset will be performed in cases where I/O are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer. Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.
Sponsored by: Intel
Reviewed by: carl
Each set of frames pushed into a FIFO is represented by a list of
ath_bufs - the first ath_buf in the FIFO list is marked with
ATH_BUF_FIFOPTR; the last ath_buf in the FIFO list is marked with
ATH_BUF_FIFOEND.
Multiple lists of frames are just glued together in the TAILQ as per
normal - except that at the end of a FIFO list, the descriptor link
pointer will be NULL and it'll be tagged with ATH_BUF_FIFOEND.
For non-EDMA chipsets this is a no-op - the ath_txq frame list (axq_q)
stays the same and is treated the same.
For EDMA chipsets the frames are pushed into axq_q and then when
the FIFO is to be (re) filled, frames will be moved onto the FIFO
queue and then pushed into the FIFO.
So:
* Add a new queue in each hardware TXQ (ath_txq) for staging FIFO frame
lists. It's a TAILQ (like the normal hardware frame queue) rather than
the ath9k list-of-lists to represent FIFO entries.
* Add new ath_buf flags - ATH_TX_FIFOPTR and ATH_TX_FIFOEND.
* When allocating ath_buf entries, clear out the flag value before
returning it or it'll end up having stale flags.
* When cloning ath_buf entries, only clone ATH_BUF_MGMT. Don't clone
the FIFO related flags.
* Extend ath_tx_draintxq() to first drain the FIFO staging queue, _then_
drain the normal hardware queue.
Tested:
* AR9280, hostap
* AR9280, STA
* AR9380/AR9580 - hostap
TODO:
* Test on other chipsets, just to be thorough.
Also add logic to clean up all outstanding asynchronous event requests
when resetting or shutting down the controller, since these requests
will not be explicitly completed by the controller itself.
Sponsored by: Intel
function.
This allows for completions outside the normal completion path, for example
when an ABORT command fails due to the controller reporting the targeted
command does not exist. This is mainly for protection against a faulty
controller, but we need to clean up our internal request nonetheless.
Sponsored by: Intel
the submit action assuming the qpair lock has already been acquired.
Also change nvme_qpair_submit_request to just lock/unlock the mutex
around a call to this new function.
This fixes a recursive mutex acquisition in the retry path.
Sponsored by: Intel
using vm_radix_node_page() == NULL, the compiler is able to generate one
less conditional branch when vm_radix_isleaf() is used. More use cases
involving the inner loops of vm_radix_insert(), vm_radix_lookup{,_ge,_le}(),
and vm_radix_remove() will follow.
Reviewed by: attilio
Sponsored by: EMC / Isilon Storage Division
instead of axq_link.
This (among a bunch of uncommitted work) is required for EDMA chips
to correctly transmit frames on the CABQ.
Tested:
* AR9280, hostap mode
* AR9380/AR9580, hostap mode (staggered beacons)
TODO:
* This code only really gets called when burst beacons are used;
it glues multiple CABQ queues together when sending to the hardware.
* More thorough bursted beacon testing! (first requires some work with
the beacon queue code for bursted beacons, as that currently uses the
link pointer and will fail on EDMA chips.)
the descriptor link pointer, rather than directly.
This is needed on AR9380 and later (ie, EDMA) NICs so the multicast queue
has a chance in hell of being put together right.
Tested:
* AR9380, AR9580 in hostap mode, CABQ traffic (but with other patches..)
In physio, check if device can handle unmapped IO and pass an
appropriately mapped buffer to the driver strategy routine. The
only driver in the tree that can handle unmapped buffers is one
exposed by GEOM, so mark it as such with the new flag in the
driver cdevsw structure.
This fixes insta-panics on hosts, running dconschat, as /dev/fwmem
is an example of the driver that makes use of physio routine, but
bypasses the g_down thread, where the buffer gets mapped normally.
Discussed with: kib (earlier version)
Merge change from illumos:
1694 Add type-aware print() action
This is a very nice feature implemented in upstream Dtrace.
A complete description is available here:
http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/
This change bumps the DT_VERS_* number to 1.9.0 in
accordance to what is done in illumos.
While here also include some minor cleanups to ease further merging
and appease clang with a fix by Fabian Keil.
Illumos Revisions: 13501:c3a7090dbc16
13483:f413e6c5d297
Reference:
https://www.illumos.org/issues/1560https://www.illumos.org/issues/1694
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 1 month
Merge changes from illumos:
1451 DTrace needs toupper()/tolower() subroutines
1457 lltostr() D subroutine should take an optional base
This change bumps the DT_VERS_* number to 1.8.1 in
accordance to what is done in illumos.
The test suite we currently include is outdated and
doesnt support some updates in tst.subr.d which had to
be left out for now.
Illumos Revisions: r13458 5e394d8db762
r13459 c3454574dd1a
Reference:
https://www.illumos.org/issues/1451https://www.illumos.org/issues/1457
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 1 month
It is already done in SSIF interface code.
This reduces contention/spinning reported by many users.
PR: kern/172166
Submitted by: Eric van Gyzen <eric at vangyzen.net>
MFC after: 2 weeks
for migrating callouts to new CPU. This value is passed to
callout_cc_add() in order to update properly precision field in case of
rescheduling/migration.
Reviewed by: mav
extra read from PxCI/PxSACT registers. If only NCQ commands are running, we
don't really need PxCI. If only non-NCQ commands are running we don't need
PxSACT. Mixed set may happen only on controllers with FIS-based switching
when port multiplier is attached, and then we have to read both registers.
MFC after: 1 month
- Replace single done mutex with per-disk ones. On system with several
disks on several HBAs that removes small, but measurable lock congestion.
- Modify disk destruction process to not destroy the mutex prematurely.
- Remove some extra pointer derefences.
Merge change from illumos:
1455 DTrace tracemem() should take an optional size argument
Our local enhancements to dt_print_bytes were equivalent to
those in illumos but we made it match the illumos version
to ease further code merges.
For now leave out tst.smallsize.d and tst.smallsize.d.out
since those don't seem to work cleanly on FreeBSD.
This change bumps the DT_VERS_* number to 1.7.1 in accordance
to what is done in illumos.
Illumos Revision: 13457:571b0355c2e3
Reference:
https://www.illumos.org/issues/1455
Tested by: Fabian Keil
Obtained from: Illumos
MFC after: 1 month
that could never be reached in vm_radix_insert(). (If the pointer being
checked by the panic call were ever NULL, the immmediately preceding loop
would have already crashed on a NULL pointer dereference.)
Reviewed by: attilio (an earlier version)
Sponsored by: EMC / Isilon Storage Division
Use destroy_dev_sched_cb() to not wait for device destruction while holding
GEOM topology lock (that actually caused deadlock). Use request counting
protected by mutex to properly wait for outstanding requests completion in
cases of device closing and geom destruction. Unlike r227009, this code
does not block taskqueue thread for indefinite time, waiting for completion.
more topology change done that may require its attention. Add few missing
g_do_wither() calls in respective places to signal it.
This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed. For
example, see r227009 and r227510.
related issues.
Moving the TX locking under one lock made things easier to progress on
but it had one important side-effect - it increased the latency when
handling CABQ setup when sending beacons.
This commit introduces a bunch of new changes and a few unrelated changs
that are just easier to lump in here.
The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.
The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support. Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot. For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.
The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission. I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.
Major:
* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
a single lock.
* Leave the software queue side of things under the ATH_TX_LOCK lock,
(continuing) to serialise things as they are.
* Add a new function which is called whenever there's a beacon miss,
to print out some debugging. This is primarily designed to help
me figure out if the beacon miss events are due to a noisy environment,
issues with the PHY/MAC, or other.
* Move the CABQ setup/enable to occur _after_ all the VAPs have been
looked at. This means that for multiple VAPS in bursted mode, the
CABQ gets primed once all VAPs are checked, rather than being primed
on the first VAP and then having frames appended after this.
Minor:
* Add a (disabled) twiddle to let me enable/disable cabq traffic.
It's primarily there to let me easily debug what's going on with beacon
and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
trying to trace down.
* Clear bf_next when flushing frames; it should quieten some warnings
that show up when a node goes away.
Tested:
* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)
TODO:
* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ hanling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
correctly set when chaining descriptors.)
to stuck beacons.
* Set the cabq readytime (ie, how long to burst for) to 50% of the total
beacon interval time
* fix the cabq adjustment calculation based on how the beacon offset is
calculated (the SWBA/DBA time offset.)
This is all still a bit magic voodoo but it does seem to have further
quietened issues with missed/stuck beacons under my local testing.
In any case, it better matches what the reference HAL implements.
Obtained from: Qualcomm Atheros
held. The ttm_buffer_object_transfer() does not need the mutex locked
at all, except for the call to the driver sync_obj_ref() method.
Reported and tested by: dumbbell
MFC after: 2 weeks