Commit Graph

96696 Commits

Author SHA1 Message Date
kib
7ada0d9324 Strip the unnneeded spaces, mostly at the end of lines.
MFC after:	3 days
2013-04-01 09:56:48 +00:00
ian
5b38501da6 Fix low-level uart drivers that set their fifo sizes in the softc too late.
uart(4) allocates send and receiver buffers in attach() before it calls
the low-level driver's attach routine.  Many low-level drivers set the
fifo sizes in their attach routine, which is too late.  Other drivers set
them in the probe() routine, so that they're available when uart(4)
allocates buffers.  This fixes the ones that were setting the values too
late by moving the code to probe().
2013-04-01 00:44:20 +00:00
ian
720da1df6b Enable hardware flow control and high speed bulk data transfer in at91 uarts.
Changes to make rtc/cts flow control work...

This does not turn on the builtin hardware flow control on the SoC's usart
device, because that doesn't work on uart1 due to a chip erratum (they
forgot to wire up pin PA21 to RTS0 internally).  Instead it uses the
hardware flow control logic where the tty layer calls the driver to assert
and de-assert the flow control lines as needed.  This prevents overruns at
the tty layer (app doesn't read fast enough), but does nothing for overruns
at the driver layer (interrupts not serviced fast enough).

To work around the wiring problem with RTS0, the driver reassigns that pin
as a GPIO and controls it manually.  It only does so if given permission via
hint.uart.1.use_rts0_workaround=1, to prevent accidentally driving the pin
if uart1 is used without flow control (because something not related to
serial IO could be wired to that pin).

In addition to the RTS0 workaround, driver changes were needed in the area
of reading the current set of DCE signals.  A priming read is now done at
attach() time, and the interrupt routine now sets SER_INT_SIGCHG when any
of the DCE signals change.  Without these changes, nothing could ever be
transmitted, because the tty layer thought CTS was de-asserted (when in fact
we had just never read the status register, and the hwsig variable was
init'd to CTS de-asserted).

Changes to support bulk high-speed (230kbps and higher) data reception...

Allow the receive fifo size to be tuned with hint.uart.<dev>.fifo_bytes.
For high speed receive, a fifo size of 1024 works well.  The default is
still 128 bytes if no hint is provided.  Using a value larger than 384
requires a change in dev/uart/uart_core.c to size the intermediate
buffer as MAX(384, 3*sc->sc_rxfifosize).

Recalculate the receive timeout whenever the baud rate changes.  At low
baud rates (19.2kbps and below) the timeout is the number of bits in 2
characters.  At higher speed it's calculated to be 500 microseconds
worth of bits.  The idea is to compromise between being responsive in
interactive situations and not timing out prematurely during a brief
pause in bulk data flow.  The old fixed timeout of 1.5 characters was
just 32 microseconds at 460kbps.

At interrupt time, check for receiver holding register overrun status
and set the corresponding status bit in the return value.

When handling a buffer overrun, get a single buffer emptied and handed
back to the hardware as quickly as possible, then deal with the second
buffer.  This at least minimizes data loss compared to the old logic
that fully processed both buffers before restarting the hardware.

Rewrite the logic for handling buffers after a receive timeout.  The
original author speculated in a comment that there may be a race with
high speed data.  There was, although it was rare.  The code now handles
all three possible scenarios on receive timeout: two empty buffers, one
empty and one partial buffer, or one full and one partial buffer.

Reviewed by:	imp
2013-04-01 00:00:10 +00:00
ian
2b56a5ce50 Accommodate uart devices with large FIFOs (or DMA buffers which amount
to the same thing) by allocating the uart(4) rx buffer based on the
device's rxfifosz rather than using a hard-coded size of 384 bytes.

The  historical 384 byte size is 3 times the largest hard-coded fifo
size in the tree, so use that ratio as a guide and allocate the buffer
as three times rxfifosz, but never smaller than the historical size.
2013-03-31 23:24:04 +00:00
ian
f370876f88 When running on armv6, set alignment checking to modulo-4 mode rather
than modulo-8, because clang emits ldrd and strd instructions for
addresses that are only 4-byte aligned
2013-03-31 22:43:16 +00:00
ian
fea6d14816 When running on armv6, set alignment checking to modulo-4 mode rather
than modulo-8, because clang emits ldrd and strd instructions for
addresses that are only 4-byte aligned.
2013-03-31 22:42:25 +00:00
tuexen
c19a395776 Add a macro for checking for IPv4 link local addresses.
MFC after: 1 week
2013-03-31 18:27:46 +00:00
jilles
9d8a3c5c3b Rename do_pipe() to kern_pipe2() and declare it properly. 2013-03-31 17:42:54 +00:00
ian
1f73a954db Fix a typo in the CF device driver name that prevented instantiation. 2013-03-31 12:51:56 +00:00
neel
39fb54304e Add counter to keep track of the number of timer interrupts generated by
the local apic for each virtual cpu.
2013-03-31 03:56:48 +00:00
neel
9282fffb30 Add some more stats to keep track of all the reasons that a vcpu is exiting. 2013-03-30 17:46:03 +00:00
kientzle
01d37a7164 Initialize sym_count to 0.
This fixes a compiler warning introduced in r248121.
2013-03-30 16:33:16 +00:00
mdf
4c77a4b020 Use a shared lock for VOP_GETEXTATTR, as it is a read-like operation.
MFC after:	1 week
2013-03-30 15:09:04 +00:00
jilles
c0fce542aa Improve namespacing in <sys/socket.h>:
* MSG_NOSIGNAL is in POSIX.1-2008.
 * MSG_NOTIFICATION (SCTP) is not in POSIX.
 * PRU_FLUSH_* (SCTP) are not in POSIX.
 * bindat()/connectat() are not in POSIX.

Discussed with:	rrs (PRU_FLUSH_*)
2013-03-30 13:30:27 +00:00
adrian
48f29e1c7d AR933x CPU device improvements:
* Add baud rate and divisor programming code. See below for more
  information.

* Flesh out ar933x_init() to disable interrupts and program the initial
  console setup.

* Remove #if 0'ed code from ar933x_term().

* Explain what these functions do.

Now, the baud rate and divisor code comes from Linux, as a submission
to the OpenWRT project and Linux kernel from
Gabor Juhos <juhosg@openwrt.org>.

The original ticket for this code is https://dev.openwrt.org/ticket/12031 .

I've contacted Gabor and asked for his permission to also licence the patch
in question (which covers this code) to BSD lience and he's agreed.
Hence why I'm including it here in FreeBSD.

Tested:

* AP121 (AR9330)
2013-03-30 04:31:29 +00:00
adrian
b3c007b431 AR933x UART updates:
* Default clock is 25MHz;
* Remove the UART register macro here - it's not needed as we don't need
  to "adjust" the register offset / spacing at all;
* Remove unused fields in the softc.

Tested:

* AP121
2013-03-30 04:13:47 +00:00
np
3c60e22da7 cxgbe(4): Add support for Chelsio's Terminator 5 (aka T5) ASIC. This
includes support for the NIC and TOE features of the 40G, 10G, and
1G/100M cards based on the T5.

The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4)
has been updated instead of writing a brand new driver.  T5 cards will
show up as cxl (short for cxlgb) ports attached to the t5nex bus driver.

Sponsored by:	Chelsio
2013-03-30 02:26:20 +00:00
smh
fc36a232b9 Adds the ability to enable / disable sorting of BIO requests queued within
CAM. This can significantly improve performance particularly for SSDs
which don't suffer from seek latencies.

The sysctl / tunable kern.cam.sort_io_queues provides the systems default
setting where:-
0 = queued BIOs are NOT sorted
1 = queued BIOs are sorted (default)

Each device gets its own sysctl kern.cam.<type>.<id>.sort_io_queue
Valid values are:-
-1 = use system default (default)
0 = queued BIOs are NOT sorted
1 = queued BIOs are sorted

Note: Additional patch will look to add automatic use of none sorted queues
for none rotating media e.g. SSD's

Reviewed by:	scottl
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-29 22:58:15 +00:00
emaste
35fe5219ab Keep fwd_tag around for subsequent pcb lookups
For TIMEWAIT handling tcp_input may have to jump back for an additional
pass through pcblookup.  Prior to this change the fwd_tag had been
discarded after the first lookup, so a new connection attempt delivered
locally via 'ipfw fwd' would fail to find a match.

As of r248886 the tag will be detached and freed when passed to the
socket buffer.
2013-03-29 20:51:44 +00:00
jimharris
2128397eaf Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or
NULL. This simplifies decisions around if/how requests are routed through
busdma.  It also paves the way for supporting unmapped bios.

Sponsored by:	Intel
2013-03-29 20:34:28 +00:00
adrian
d5c182905b Disable this; it's a local option that I haven't yet committed to -HEAD. 2013-03-29 20:07:51 +00:00
ian
f7d5fae7ec Add userland access to at91 gpio functionality via ioctl calls. Also,
add the ability for userland to be notified of changes on gpio pins via
a select(2)/read(2) interface.

Change the interrupt handler from filtered to threaded.

Because of the uiomove() calls in the new interface, change locking from
standard mutex to sx.

Add / restore the at91_gpio_high_z() function.

Reviewed by:	imp (long ago)
2013-03-29 19:52:57 +00:00
ian
581bf19e7b Change the API for at91_pio_gpio_get() to return the entire masked set
of bits, not just a 0/1 indicating whether any of the masked bits are on.
This is compatible with the single in-tree caller of this function right now
(at91_vbus_poll() in dev/usb/controller/at91dci_atemelarm.c).
2013-03-29 19:04:18 +00:00
ian
899ea162b1 Call soc_info.soc_data->soc_clock_init() before at91_pmc_init_clock(), so
that the latter correctly fills in the clock data structures based on
proper hardware-specific shift and mask values from the soc_data structure.
2013-03-29 18:47:08 +00:00
jfv
d3b1490c73 Change the define in the header to eliminate unnecessary data
when using LEGACY TX.
2013-03-29 18:46:13 +00:00
ian
6321d72f7d Add a couple forward declarations, so that board support routines don't have
to pre-include a bunch of header files they don't need just to use this one.
2013-03-29 18:43:10 +00:00
jfv
a4c26abd40 Change defines in the igb driver to allow an easier selection of
the older if_start/non-multiqueue interface from the stack. This
is not the default, but can be turned on in the Makefile now regardless
of the OS level to allow either testing or use of ALTQ.

MFC after: one week
2013-03-29 18:25:45 +00:00
ian
10e285049e Redo the workaround for at91rm9200 erratum #26 in a way that doesn't
cause a lockup on some rm92 hardware.
2013-03-29 18:17:51 +00:00
ian
2d18442bf8 Fix a typo: the RXD0 pin is PA18, not PA19. 2013-03-29 18:06:54 +00:00
jfv
5867d4acf9 Two small fixes:
Set promiscuous code was unconditionally turning off multicast when
  turning off promiscuous mode, this should only be done when there are
  less than MAX groups. Thanks to Mike Karels for this correction.

  Second, the overtmp interrupt setup/detection was wrong, correcting it.

MFC after:	one week
2013-03-29 18:03:00 +00:00
ian
50331a7e74 Remove a really noisy printf left over from debugging hardware errata. 2013-03-29 17:57:24 +00:00
jimharris
f59af79144 Add bus_dmamap_load_bio for non-CAM disk drivers that wish to enable
unmapped I/O.

Sponsored by:	Intel
Reviewed by:	kib
2013-03-29 16:26:25 +00:00
jimharris
5febbe1181 Add CTR5() to bus_dmamap_load_ccb, similar to other bus_dmamap_load_*
functions.

Sponsored by:	Intel
2013-03-29 16:00:16 +00:00
jimharris
60c7cceb4c Do not add 1 to nsegs before passing to CTR5(), since nsegs
has already been incremented before these calls.

Sponsored by:	Intel
2013-03-29 15:54:12 +00:00
jimharris
7e64e1827b Pass correct parameter to CTR5() in bus_dmamap_load_uio.
Sponsored by:	Intel
2013-03-29 15:51:45 +00:00
glebius
11f04943de Fix bug in m_split() in a case when split len matches len of the
first mbuf, and the first mbuf is M_PKTHDR.

PR:		kern/176144
Submitted by:	Jacques Fourie <jacques.fourie gmail.com>
2013-03-29 14:10:40 +00:00
glebius
06ecb1b7ca Once ng_ksocket(4) is fixed, re-apply r194662. See this revision for
longer description.

Discussed with:	andre, rwatson
Sponsored by:	Nginx, Inc.
2013-03-29 14:06:04 +00:00
glebius
ffd07149de Revamp mbuf handling in ng_ksocket_incoming2():
- Clear code that workarounded a bug in FreeBSD 3,
  and even predated import of netgraph(4).
- Clear workaround for m_nextpkt pointing into
  next record in buffer (fixed in r248884).
  Assert that m_nextpkt is clear.
- Do not rely on SOCK_STREAM sockets containing
  M_PKTHDR mbufs. Create a header ourselves and
  attach chain to it. This is correct fix for
  kern/154676.

PR:		kern/154676
Sponsored by:	Nginx, Inc
2013-03-29 14:04:26 +00:00
glebius
1bccb6e916 When soreceive_generic() hands off an mbuf from buffer,
clear its pointer to next record, since next record
belongs to the buffer, and shouldn't be leaked.

The ng_ksocket(4) used to clear this pointer itself,
but the correct place is here.

Sponsored by:	Nginx, Inc
2013-03-29 13:57:55 +00:00
glebius
f412a6fab2 Whitespace. 2013-03-29 13:53:14 +00:00
glebius
ee9bd064bb Non-functional cleanup of ng_ksocket_incoming2(). 2013-03-29 13:51:01 +00:00
marius
4f749bdfea Unbreak compilation after r248868. 2013-03-29 11:53:20 +00:00
mav
af8596b2cc Make pre-shutdown flush and spindown routines to not use xpt_polled_action(),
but execute the commands in regular way.  There is no any reason to cook CPU
while the system is still fully operational.  After this change polling in
CAM is used only for kernel dumping.
2013-03-29 08:33:18 +00:00
mav
28542ab068 Implement CAM_PERIPH_FOREACH() macro, safely iterating over the list of
driver's periphs, acquiring and releaseing periph references while doing it.

Use it to iterate over the lists of ada and da periphs when flushing caches
and putting devices to sleep on shutdown and suspend.  Previous code could
panic in theory if some device disappear in the middle of the process.
2013-03-29 07:50:47 +00:00
adrian
becc1654f8 For the AR933x UART, the serial clock is not the AHB clock, it's the
reference clock.  So use that instead.
2013-03-29 06:32:39 +00:00
adrian
80efba7745 * Fix clock register definitions
* Add maximum clock register values
2013-03-29 06:32:02 +00:00
adrian
976eabcf21 Print out the platform reference frequency.
This is useful for AR933x platforms where that matters.
2013-03-29 06:31:31 +00:00
neel
29b9bf0372 Allow caller to skip 'guest linear address' validation when doing instruction
decode. This is to accomodate hardware assist implementations that do not
provide the 'guest linear address' as part of nested page fault collateral.

Submitted by:	Anish Gupta (akgupt3 at gmail dot com)
2013-03-28 21:26:19 +00:00
adrian
2380294328 Initial (unfinished!) AR933x support. 2013-03-28 20:48:58 +00:00
markj
a688a70512 Ignore interface renames instead of removing the interface from the bridge
group.

Reviewed by:	rstone
Approved by:	rstone (co-mentor)
Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2013-03-28 20:37:07 +00:00
adrian
4a2838ceff Tie in the AR933x support into -HEAD. 2013-03-28 19:30:56 +00:00
adrian
7eebeb2d9d Bring over the initial, CPU-only UART support for the AR933x SoC.
This implements the kernel glue needed (getc, putc, rxready).

This isn't a 16550 UART, even if the datasheet overview claims so.

The Linux ar933x support was used as a reference, however the uart code
is a reimplementation.

Attentive viewers will note that the uart code is based off of the ns8250
code and the UART bus code is a stubbed-out version of this.  I'll be
replacing it with non-stubbed versions soon, making this a fully featured
driver.

Tested:

* AP121 reference board (AR933x), booting through the mountroot> prompt;
  then doing some basic interactive tests in ddb.
2013-03-28 19:27:06 +00:00
sbruno
18d941830f Update hwpmc to support Haswell class processors.
0x3C:      /* Per Intel document 325462-045US 01/2013. */

Add manpage to document all the goodness that is available in this
processor model.

Submitted by:	hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by:	jimharris, sbruno
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-03-28 19:15:54 +00:00
jimharris
536305e233 Remove obsolete comment. This code has now been tested with the QEMU
NVMe device emulator.
2013-03-28 16:57:48 +00:00
jimharris
86f7620634 Delete extra IO qpairs allocated based on number of MSI-X vectors, but
later found to not be usable because the controller doesn't support the
same number of queues.

This is not the normal case, but does occur with the Chatham prototype
board.

Sponsored by:	Intel
2013-03-28 16:54:19 +00:00
scottl
84ae5b84bb Several fixes and improvements to sendfile()
1.  If we wanted to send exactly as many bytes as the socket buffer is
    sized for, the inner loop of kern_sendfile() would see that the
    socket is full before seeing that it had no more bytes left to send.
    This would cause it to return EAGAIN to the caller instead of
    success.  Fix by changing the order that these conditions are tested.
2.  Simplify the calculation for the bytes to send in each iteration of
    the inner loop of kern_sendfile()
3.  Fix some calls with bogus arguments to sf_buf_ext().  These would
    only trigger on mbuf allocation failure, but would be hilariously
    bad if they did trigger.

Submitted by:	gibbs(3), andre(2)
Reviewed by:	emax, andre
Obtained from:	Netflix
MFC after:	1 week
2013-03-28 14:14:28 +00:00
sbruno
79d409e127 Restore DB_COMMAND capabilities of ciss(4) for debugging and diagnostics
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-03-28 12:44:43 +00:00
mav
674a0b97f5 Except one case mps(4) driver does not touch the data and works well with
unmapped I/O.  That one exception is access to INQUIRY VPD request result.
Those requests are never unmapped now, but to be safe add respective check
there and allow unmapped I/O for the SIM by setting PIM_UNMAPPED flag.
2013-03-28 11:24:30 +00:00
sbruno
e2bc9dfbfd Fix compile of ciss(4) with CISS_DEBUG defined
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-03-28 11:00:41 +00:00
kib
7b210bf144 Release the v_writecount reference on the vnode in case of error,
before the vnode is vput() in vm_mmap_vnode().  Error return means
that there is no use reference on the vnode from the vm object
reference, and failing to restore v_writecount breaks the invariant
that v_writecount is less or equal to the usecount.

The situation observed when nfs client returns ESTALE for
VOP_GETATTR() after the open.

In collaboration with:	pho
MFC after:	1 week
2013-03-28 06:39:27 +00:00
adrian
a1961a79d2 Fix the AR933x platform device start/stop code.
This was ported from the AR724x code and I think that also doesn't
quite work.  I'll investigate that soon.

With this in place the system reset path works, so 'reset' from kdb
actually resets the SoC.

Tested:

* AP121 test board
2013-03-28 05:43:03 +00:00
jimharris
6ed2dc4d7c deferal -> deferral 2013-03-27 23:07:43 +00:00
mav
368b37ab92 On SIM destruction free associated CCBs, preallocated inside xpt_get_ccb().
Before this change they were just leaked.  Fortunately USB sticks now use
only one CCB, and so leak was only 2KB per detach, while other bigger SIMs
with much more allocated CCBs are rarely detached.

MFC after:	2 weeks
2013-03-27 18:55:01 +00:00
jkim
225dc03864 Limit the amount of video memory we map for the driver to the maximum value.
This basically restores the spirit of r203535, which was partially reverted
in r205557, while we still map fixed amount to work around transient issues
we experienced with r203535.

Prodded by:	avg
Tested by:	avg
MFC after:	1 week
2013-03-27 18:06:28 +00:00
kib
c45e5da903 Fix a race with the vnode reclamation in the aio_qphysio(). Obtain
the thread reference on the vp->v_rdev and use the returned struct
cdev *dev instead of using vp->v_rdev.  Call dev_strategy_csw()
instead of dev_strategy(), since we now own the reference.

Since the csw was already calculated, test d_flags to avoid mapping
the buffer if the driver supports unmapped requests [*].

Suggested by:	kan [*]
Reviewed by:	kan (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-03-27 11:47:52 +00:00
kib
448e7c1290 Add dev_strategy_csw() function, which is similar to dev_strategy()
but assumes that a thread reference was already obtained on the passed
device.  Use the function from physio(), to avoid two extra dev_mtx
lock and unlock.  Note that physio() is always used as the cdevsw
method, or is called from a cdevsw method, and the caller already owns
the reference.

dev_strategy() is left to keep KPI intact, but now it is implemented
as a wrapper around dev_strategy_csw().

Do some style cleanup in physio().

Requested and reviewed by:	kan (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-03-27 11:34:27 +00:00
kib
df3795022f On i386, double the default size of the bio transient map. With the
maxbcache size fixed, the auto-tuned transient map is too small for
real-world load on i386.

Tested by:	David Wolfskill
Sponsored by:	The FreeBSD Foundation
2013-03-27 10:56:15 +00:00
kib
2c9ac98d06 Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal
buffer map size, auto-tuned on the 4GB machine.  Having the maxbcache
bigger than the buffer map causes the transient bio map sizing logic
to assume that there is enough KVA to use approximately 90MB (buffer
map is sized to 110MB, and maxbcache is 200MB).  The increase in the
KVA usage caused other big KVA consumers, like nvidia.ko, to fail the
initialization.

Change the definition for both PAE and non-PAE cases, since PAE is
even more KVA-starved.

Reported and tested by:	David Wolfskill
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
2013-03-27 10:52:18 +00:00
mav
e3a102cae6 Add Subsystem ID field to the quirk table. Use it to identify Mac Pro 1,1,
which requires OVREF to be set to get proper playback volume, but which has
all zeroes in HDA controller subdevice IDs on PCI.

MFC after:	1 month
Sponsored by:
2013-03-27 07:30:08 +00:00
adrian
938c374b23 Commit initial (unfinished!) support for the AR933x series of embedded
CPUs.

The AR933x is a mips24k based SoC with an AR9380 series SoC on board,
two gigabit ethernet interfaces and an internal 10/100mbit ethernet
switch.  There's also the normal interfaces (USB, ethernet, uart, GPIO.)

The downside? There's a non-ns8250 UART device.

With a very basic UART driver (not in this commit) the SoC is initialised
and boots up.  I'll commit the UART code soon and then link it into the
general setup path.

This code is a re-implementation based from the Linux kernel / openwrt
AR933x support.

TODO:

* UART (obviously)
* All of the ethernet, USB and wifi SoC glue, including ethernet PLL
  programming.
2013-03-27 03:38:58 +00:00
adrian
a5e3a9bbda Add the reference clock for each supported chip.
Obtained from:	Linux (openwrt)
2013-03-27 03:33:19 +00:00
jimharris
2255407cf0 Fix printf format issue on i386.
Reported by:	bz
2013-03-27 00:37:00 +00:00
adrian
77e2dba197 * Stop processing after HAL_EIO; this is what the reference driver does.
* If we hit an empty queue condition (which I haven't yet root caused, grr.)
  .. make sure we release the lock before continuing.
2013-03-27 00:35:45 +00:00
jimharris
e86d6338eb Panic should the SCI framework ever request a pointer into the ccb's
data buffer for a ccb that is unmapped.

This case is currently not possible, since the SCI framework only
requests these pointers for doing SCSI/ATA translation of non-
READ/WRITE commands.  The panic is more to protect against the
unlikely future scenario where additional commands could be unmapped.

Sponsored by:	Intel
2013-03-27 00:15:22 +00:00
jimharris
ea5eeccdf1 Report support for unmapped I/O by adding PIM_UNMAPPED flag.
Submitted by:	jhb, scottl
2013-03-26 23:04:06 +00:00
jimharris
52767ea66d Clean up debug prints.
1) Consistently use device_printf.
2) Make dump_completion and dump_command into something more
    human-readable.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:17:10 +00:00
jimharris
8f8689b1b6 Move common code from the different nvme_allocate_request functions into a
separate function.

Sponsored by:	Intel
Suggested by:	carl
Reviewed by:	carl
2013-03-26 22:13:07 +00:00
jimharris
61a3cd77cc Change a number of malloc(9) calls to use M_WAITOK instead of
M_NOWAIT.

Sponsored by:	Intel
Suggested by:	carl
Reviewed by:	carl
2013-03-26 22:11:34 +00:00
jimharris
5242be57d3 Replace usages of mtx_pool_find used for admin commands with a polling
mechanism.

Now that all requests are timed, we are guaranteed to get a completion
notification, even if it is an abort status due to a timed out admin
command.

This has the effect of simplifying the controller and namespace setup
code, so that it reads straight through rather than broken up into
a bunch of different callback functions.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:09:51 +00:00
jimharris
ff567ee3e1 Abort and do not retry any outstanding admin commands left over after
a controller reset.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:06:05 +00:00
jimharris
69d2e13801 Add the ability to internally mark a controller as failed, if it is unable to
start or reset.  Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.

This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state.  To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:58:38 +00:00
jimharris
de155eb698 Just disable the controller instead of deleting IO queues during detach.
This is just as effective, and removes the need for a bunch of admin commands
to a controller that's going to be disabled shortly anyways.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:48:41 +00:00
jimharris
aa210ff37b Have nvd(4) register for controller notifications.
Also have nvd maintain controller/namespace relationships internally.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:45:37 +00:00
jimharris
89ce8fee13 Set Pre-boot Software Load Count to 0 at the end of the controller
start process.

The spec indicates the OS driver should use Set Features (Software
Progress Marker) to set the pre-boot software load count to 0
after the OS driver has successfully been initialized.  This allows
pre-boot software to determine if there have been any issues with the
OS loading.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:42:53 +00:00
jimharris
63beb43e5f Remove the is_started flag from struct nvme_controller.
This flag was originally added to communicate to the sysctl code
which oids should be built, but there are easier ways to do this.  This
needs to be cleaned up prior to adding new controller states - for example,
controller failure.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:19:26 +00:00
jimharris
d207d40160 Ensure the controller's MDTS is accounted for in max_xfer_size.
The controller's IDENTIFY data contains MDTS (Max Data Transfer Size) to
allow the controller to specify the maximum I/O data transfer size.  nvme(4)
already provides a default maximum, but make sure it does not exceed what
MDTS reports.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:16:53 +00:00
jimharris
21ee92ac4f Cap the number of retry attempts to a configurable number. This ensures
that if a specific I/O repeatedly times out, we don't retry it indefinitely.

The default number of retries will be 4, but is adjusted using hw.nvme.retry_count.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:14:51 +00:00
jimharris
18a3a60fb4 Pass associated log page data to async event consumers, if requested.
Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:08:32 +00:00
jimharris
894007a2dc When an asynchronous event request is completed, automatically fetch the
specified log page.

This satisfies the spec condition that future async events of the same type
will not be sent until the associated log page is fetched.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:05:15 +00:00
jimharris
79d7c4eec2 Add structure definitions and controller command function for firmware
log pages.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:03:03 +00:00
jimharris
de4e1d0695 Add structure definitions and a controller command function for
error log pages.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:01:53 +00:00
jimharris
3c0b8367a2 Create struct nvme_status.
NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.

While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:00:18 +00:00
jimharris
d0a775e794 Make nvme_ctrlr_reset a nop if a reset is already in progress.
This protects against cases where a controller crashes with multiple
I/O outstanding, each timing out and requesting controller resets
simultaneously.

While here, remove a debugging printf from a previous commit, and add
more logging around I/O that need to be resubmitted after a controller
reset.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 20:56:58 +00:00
jimharris
b7f7338cc5 By default, always escalate to controller reset when an I/O times out.
While aborts are typically cleaner than a full controller reset, many times
an I/O timeout indicates other controller-level issues where aborts may not
work.  NVMe drivers for other operating systems are also defaulting to
controller reset rather than aborts for timed out I/O.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 20:32:57 +00:00
pfg
996e9d6f74 Dtrace: dtrace.c erroneously checks for memory alignment on amd64.
Merge change from illumos:

3511 dtrace.c erroneously checks for memory alignment on amd64

Illumos Revision:	c93cc65

Reference:
https://www.illumos.org/issues/3511

Obtained from:	Illumos
MFC after:	3 weeks
2013-03-26 20:17:08 +00:00
adrian
e84569c7e8 Implement the replacement EDMA FIFO code.
(Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've
actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was
cleared!)

This code implements a whole bunch of sorely needed EDMA TX improvements
along with CABQ TX support.

The specifics:

* When filling/refilling the FIFO, use the new TXQ staging queue
  for FIFO frames

* Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly.
  For now the non-CABQ transmit path pushes one frame into the TXQ
  staging queue without setting up the intermediary link pointers
  to chain them together, so draining frames from the txq staging
  queue to the FIFO queue occurs AMPDU / MPDU at a time.

* In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and
  ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO
  at once.

* Now that frames are in a FIFO pending queue, we can top up the
  FIFO after completing a single frame.  This means we can keep
  it filled rather than waiting for it drain and _then_ adding
  more frames.

* The EDMA restart routine now walks the FIFO queue in the TXQ
  rather than the pending queue and re-initialises the FIFO with
  that.

* When restarting EDMA, we may have partially completed sending
  a list.  So stamp the first frame that we see in a list with
  ATH_BUF_FIFOPTR and push _that_ into the hardware.

* When completing frames, only check those on the FIFO queue.
  We should never ever queue frames from the pending queue
  direct to the hardware, so there's no point in checking.

* Until I figure out what's going on, make sure if the TXSTATUS
  for an empty queue pops up, complain loudly and continue.
  This will stop the panics that people are seeing.  I'll add
  some code later which will assist in ensuring I'm populating
  each descriptor with the correct queue ID.

* When considering whether to queue frames to the hardware queue
  directly or software queue frames, make sure the depth of
  the FIFO is taken into account now.

* When completing frames, tag them with ATH_BUF_BUSY if they're
  not the final frame in a FIFO list.  The same holding descriptor
  behaviour is required when handling descriptors linked together
  with a link pointer as the hardware will re-read the previous
  descriptor to refresh the link pointer before contiuning.

* .. and if we complete the FIFO list (ie, the buffer has
  ATH_BUF_FIFOEND set), then we don't need the holding buffer
  any longer.  Thus, free it.

Tested:

* AR9380/AR9580, STA and hostap
* AR9280, STA/hostap

TODO:

* I don't yet trust that the EDMA restart routine is totally correct
  in all circumstances.  I'll continue to thrash this out under heavy
  multiple-TXQ traffic load and fix whatever pops up.
2013-03-26 20:04:45 +00:00
jimharris
83032bc239 Add a tunable for the I/O timeout interval. Default is still 30 seconds,
but can be adjusted between a min/max of 5 and 120 seconds.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 20:02:35 +00:00
jimharris
711dabaf43 Add handling for controller fatal status (csts.cfs).
On any I/O timeout, check for csts.cfs==1.  If set, the controller
is reporting fatal status and we reset the controller immediately,
rather than trying to abort the timed out command.

This changeset also includes deferring the controller start portion
of the reset to a separate task.  This ensures we are always performing
a controller start operation from a consistent context.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:58:17 +00:00
jimharris
cef3145004 Add API for nvme consumers to access controller and namespace identify data.
Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:52:57 +00:00
jimharris
93fd264895 Add controller reset capability to nvme(4) and ability to explicitly
invoke it from nvmecontrol(8).

Controller reset will be performed in cases where I/O are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer.  Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:50:46 +00:00