Commit Graph

92838 Commits

Author SHA1 Message Date
Ian Lepore
6fb87f7371 When running on armv6, set alignment checking to modulo-4 mode rather
than modulo-8, because clang emits ldrd and strd instructions for
addresses that are only 4-byte aligned
2013-03-31 22:43:16 +00:00
Ian Lepore
070cf887bb When running on armv6, set alignment checking to modulo-4 mode rather
than modulo-8, because clang emits ldrd and strd instructions for
addresses that are only 4-byte aligned.
2013-03-31 22:42:25 +00:00
Michael Tuexen
ebae998767 Add a macro for checking for IPv4 link local addresses.
MFC after: 1 week
2013-03-31 18:27:46 +00:00
Jilles Tjoelker
d289dc7b73 Rename do_pipe() to kern_pipe2() and declare it properly. 2013-03-31 17:42:54 +00:00
Ian Lepore
27aa887af3 Fix a typo in the CF device driver name that prevented instantiation. 2013-03-31 12:51:56 +00:00
Neel Natu
77d8fd9bb3 Add counter to keep track of the number of timer interrupts generated by
the local apic for each virtual cpu.
2013-03-31 03:56:48 +00:00
Neel Natu
b5aaf7b22b Add some more stats to keep track of all the reasons that a vcpu is exiting. 2013-03-30 17:46:03 +00:00
Tim Kientzle
2f37ec8b30 Initialize sym_count to 0.
This fixes a compiler warning introduced in r248121.
2013-03-30 16:33:16 +00:00
Matthew D Fleming
926cd204c7 Use a shared lock for VOP_GETEXTATTR, as it is a read-like operation.
MFC after:	1 week
2013-03-30 15:09:04 +00:00
Jilles Tjoelker
937c916587 Improve namespacing in <sys/socket.h>:
* MSG_NOSIGNAL is in POSIX.1-2008.
 * MSG_NOTIFICATION (SCTP) is not in POSIX.
 * PRU_FLUSH_* (SCTP) are not in POSIX.
 * bindat()/connectat() are not in POSIX.

Discussed with:	rrs (PRU_FLUSH_*)
2013-03-30 13:30:27 +00:00
Adrian Chadd
a296efdeeb AR933x CPU device improvements:
* Add baud rate and divisor programming code. See below for more
  information.

* Flesh out ar933x_init() to disable interrupts and program the initial
  console setup.

* Remove #if 0'ed code from ar933x_term().

* Explain what these functions do.

Now, the baud rate and divisor code comes from Linux, as a submission
to the OpenWRT project and Linux kernel from
Gabor Juhos <juhosg@openwrt.org>.

The original ticket for this code is https://dev.openwrt.org/ticket/12031 .

I've contacted Gabor and asked for his permission to also licence the patch
in question (which covers this code) to BSD lience and he's agreed.
Hence why I'm including it here in FreeBSD.

Tested:

* AP121 (AR9330)
2013-03-30 04:31:29 +00:00
Adrian Chadd
8eeea2945d AR933x UART updates:
* Default clock is 25MHz;
* Remove the UART register macro here - it's not needed as we don't need
  to "adjust" the register offset / spacing at all;
* Remove unused fields in the softc.

Tested:

* AP121
2013-03-30 04:13:47 +00:00
Navdeep Parhar
d14b0ac129 cxgbe(4): Add support for Chelsio's Terminator 5 (aka T5) ASIC. This
includes support for the NIC and TOE features of the 40G, 10G, and
1G/100M cards based on the T5.

The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4)
has been updated instead of writing a brand new driver.  T5 cards will
show up as cxl (short for cxlgb) ports attached to the t5nex bus driver.

Sponsored by:	Chelsio
2013-03-30 02:26:20 +00:00
Steven Hartland
5f83aee5e5 Adds the ability to enable / disable sorting of BIO requests queued within
CAM. This can significantly improve performance particularly for SSDs
which don't suffer from seek latencies.

The sysctl / tunable kern.cam.sort_io_queues provides the systems default
setting where:-
0 = queued BIOs are NOT sorted
1 = queued BIOs are sorted (default)

Each device gets its own sysctl kern.cam.<type>.<id>.sort_io_queue
Valid values are:-
-1 = use system default (default)
0 = queued BIOs are NOT sorted
1 = queued BIOs are sorted

Note: Additional patch will look to add automatic use of none sorted queues
for none rotating media e.g. SSD's

Reviewed by:	scottl
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-29 22:58:15 +00:00
Ed Maste
ce7ad6640c Keep fwd_tag around for subsequent pcb lookups
For TIMEWAIT handling tcp_input may have to jump back for an additional
pass through pcblookup.  Prior to this change the fwd_tag had been
discarded after the first lookup, so a new connection attempt delivered
locally via 'ipfw fwd' would fail to find a match.

As of r248886 the tag will be detached and freed when passed to the
socket buffer.
2013-03-29 20:51:44 +00:00
Jim Harris
1e526bc478 Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or
NULL. This simplifies decisions around if/how requests are routed through
busdma.  It also paves the way for supporting unmapped bios.

Sponsored by:	Intel
2013-03-29 20:34:28 +00:00
Adrian Chadd
033891b29d Disable this; it's a local option that I haven't yet committed to -HEAD. 2013-03-29 20:07:51 +00:00
Ian Lepore
63cdf42e8c Add userland access to at91 gpio functionality via ioctl calls. Also,
add the ability for userland to be notified of changes on gpio pins via
a select(2)/read(2) interface.

Change the interrupt handler from filtered to threaded.

Because of the uiomove() calls in the new interface, change locking from
standard mutex to sx.

Add / restore the at91_gpio_high_z() function.

Reviewed by:	imp (long ago)
2013-03-29 19:52:57 +00:00
Ian Lepore
914421fa79 Change the API for at91_pio_gpio_get() to return the entire masked set
of bits, not just a 0/1 indicating whether any of the masked bits are on.
This is compatible with the single in-tree caller of this function right now
(at91_vbus_poll() in dev/usb/controller/at91dci_atemelarm.c).
2013-03-29 19:04:18 +00:00
Ian Lepore
b39ec0de86 Call soc_info.soc_data->soc_clock_init() before at91_pmc_init_clock(), so
that the latter correctly fills in the clock data structures based on
proper hardware-specific shift and mask values from the soc_data structure.
2013-03-29 18:47:08 +00:00
Jack F Vogel
be2095895a Change the define in the header to eliminate unnecessary data
when using LEGACY TX.
2013-03-29 18:46:13 +00:00
Ian Lepore
fce4536cfd Add a couple forward declarations, so that board support routines don't have
to pre-include a bunch of header files they don't need just to use this one.
2013-03-29 18:43:10 +00:00
Jack F Vogel
c05891a6da Change defines in the igb driver to allow an easier selection of
the older if_start/non-multiqueue interface from the stack. This
is not the default, but can be turned on in the Makefile now regardless
of the OS level to allow either testing or use of ALTQ.

MFC after: one week
2013-03-29 18:25:45 +00:00
Ian Lepore
5c4938ee48 Redo the workaround for at91rm9200 erratum #26 in a way that doesn't
cause a lockup on some rm92 hardware.
2013-03-29 18:17:51 +00:00
Ian Lepore
c29eb73802 Fix a typo: the RXD0 pin is PA18, not PA19. 2013-03-29 18:06:54 +00:00
Jack F Vogel
7bdac10465 Two small fixes:
Set promiscuous code was unconditionally turning off multicast when
  turning off promiscuous mode, this should only be done when there are
  less than MAX groups. Thanks to Mike Karels for this correction.

  Second, the overtmp interrupt setup/detection was wrong, correcting it.

MFC after:	one week
2013-03-29 18:03:00 +00:00
Ian Lepore
14146e9a04 Remove a really noisy printf left over from debugging hardware errata. 2013-03-29 17:57:24 +00:00
Jim Harris
10a93479b9 Add bus_dmamap_load_bio for non-CAM disk drivers that wish to enable
unmapped I/O.

Sponsored by:	Intel
Reviewed by:	kib
2013-03-29 16:26:25 +00:00
Jim Harris
86675b5c0d Add CTR5() to bus_dmamap_load_ccb, similar to other bus_dmamap_load_*
functions.

Sponsored by:	Intel
2013-03-29 16:00:16 +00:00
Jim Harris
ab72998ef7 Do not add 1 to nsegs before passing to CTR5(), since nsegs
has already been incremented before these calls.

Sponsored by:	Intel
2013-03-29 15:54:12 +00:00
Jim Harris
b327350604 Pass correct parameter to CTR5() in bus_dmamap_load_uio.
Sponsored by:	Intel
2013-03-29 15:51:45 +00:00
Gleb Smirnoff
21f398487c Fix bug in m_split() in a case when split len matches len of the
first mbuf, and the first mbuf is M_PKTHDR.

PR:		kern/176144
Submitted by:	Jacques Fourie <jacques.fourie gmail.com>
2013-03-29 14:10:40 +00:00
Gleb Smirnoff
844cacd17c Once ng_ksocket(4) is fixed, re-apply r194662. See this revision for
longer description.

Discussed with:	andre, rwatson
Sponsored by:	Nginx, Inc.
2013-03-29 14:06:04 +00:00
Gleb Smirnoff
9a4d9e198a Revamp mbuf handling in ng_ksocket_incoming2():
- Clear code that workarounded a bug in FreeBSD 3,
  and even predated import of netgraph(4).
- Clear workaround for m_nextpkt pointing into
  next record in buffer (fixed in r248884).
  Assert that m_nextpkt is clear.
- Do not rely on SOCK_STREAM sockets containing
  M_PKTHDR mbufs. Create a header ourselves and
  attach chain to it. This is correct fix for
  kern/154676.

PR:		kern/154676
Sponsored by:	Nginx, Inc
2013-03-29 14:04:26 +00:00
Gleb Smirnoff
a307eb26ed When soreceive_generic() hands off an mbuf from buffer,
clear its pointer to next record, since next record
belongs to the buffer, and shouldn't be leaked.

The ng_ksocket(4) used to clear this pointer itself,
but the correct place is here.

Sponsored by:	Nginx, Inc
2013-03-29 13:57:55 +00:00
Gleb Smirnoff
6b1781e3ea Whitespace. 2013-03-29 13:53:14 +00:00
Gleb Smirnoff
d09c774bb5 Non-functional cleanup of ng_ksocket_incoming2(). 2013-03-29 13:51:01 +00:00
Marius Strobl
03efffd10e Unbreak compilation after r248868. 2013-03-29 11:53:20 +00:00
Alexander Motin
09cfadbe7f Make pre-shutdown flush and spindown routines to not use xpt_polled_action(),
but execute the commands in regular way.  There is no any reason to cook CPU
while the system is still fully operational.  After this change polling in
CAM is used only for kernel dumping.
2013-03-29 08:33:18 +00:00
Alexander Motin
f371c9e260 Implement CAM_PERIPH_FOREACH() macro, safely iterating over the list of
driver's periphs, acquiring and releaseing periph references while doing it.

Use it to iterate over the lists of ada and da periphs when flushing caches
and putting devices to sleep on shutdown and suspend.  Previous code could
panic in theory if some device disappear in the middle of the process.
2013-03-29 07:50:47 +00:00
Adrian Chadd
10e00ec8cc For the AR933x UART, the serial clock is not the AHB clock, it's the
reference clock.  So use that instead.
2013-03-29 06:32:39 +00:00
Adrian Chadd
19f293bd60 * Fix clock register definitions
* Add maximum clock register values
2013-03-29 06:32:02 +00:00
Adrian Chadd
600f8cb57a Print out the platform reference frequency.
This is useful for AR933x platforms where that matters.
2013-03-29 06:31:31 +00:00
Neel Natu
66f71b7d24 Allow caller to skip 'guest linear address' validation when doing instruction
decode. This is to accomodate hardware assist implementations that do not
provide the 'guest linear address' as part of nested page fault collateral.

Submitted by:	Anish Gupta (akgupt3 at gmail dot com)
2013-03-28 21:26:19 +00:00
Adrian Chadd
43b36ea90a Initial (unfinished!) AR933x support. 2013-03-28 20:48:58 +00:00
Mark Johnston
83a3ff21a8 Ignore interface renames instead of removing the interface from the bridge
group.

Reviewed by:	rstone
Approved by:	rstone (co-mentor)
Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2013-03-28 20:37:07 +00:00
Adrian Chadd
7d52c7525f Tie in the AR933x support into -HEAD. 2013-03-28 19:30:56 +00:00
Adrian Chadd
308a33172f Bring over the initial, CPU-only UART support for the AR933x SoC.
This implements the kernel glue needed (getc, putc, rxready).

This isn't a 16550 UART, even if the datasheet overview claims so.

The Linux ar933x support was used as a reference, however the uart code
is a reimplementation.

Attentive viewers will note that the uart code is based off of the ns8250
code and the UART bus code is a stubbed-out version of this.  I'll be
replacing it with non-stubbed versions soon, making this a fully featured
driver.

Tested:

* AP121 reference board (AR933x), booting through the mountroot> prompt;
  then doing some basic interactive tests in ddb.
2013-03-28 19:27:06 +00:00
Sean Bruno
cc0c1555d3 Update hwpmc to support Haswell class processors.
0x3C:      /* Per Intel document 325462-045US 01/2013. */

Add manpage to document all the goodness that is available in this
processor model.

Submitted by:	hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by:	jimharris, sbruno
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-03-28 19:15:54 +00:00
Jim Harris
64432b473b Remove obsolete comment. This code has now been tested with the QEMU
NVMe device emulator.
2013-03-28 16:57:48 +00:00
Jim Harris
bb852ae89b Delete extra IO qpairs allocated based on number of MSI-X vectors, but
later found to not be usable because the controller doesn't support the
same number of queues.

This is not the normal case, but does occur with the Chatham prototype
board.

Sponsored by:	Intel
2013-03-28 16:54:19 +00:00
Scott Long
07dbf2c768 Several fixes and improvements to sendfile()
1.  If we wanted to send exactly as many bytes as the socket buffer is
    sized for, the inner loop of kern_sendfile() would see that the
    socket is full before seeing that it had no more bytes left to send.
    This would cause it to return EAGAIN to the caller instead of
    success.  Fix by changing the order that these conditions are tested.
2.  Simplify the calculation for the bytes to send in each iteration of
    the inner loop of kern_sendfile()
3.  Fix some calls with bogus arguments to sf_buf_ext().  These would
    only trigger on mbuf allocation failure, but would be hilariously
    bad if they did trigger.

Submitted by:	gibbs(3), andre(2)
Reviewed by:	emax, andre
Obtained from:	Netflix
MFC after:	1 week
2013-03-28 14:14:28 +00:00
Sean Bruno
b27556e012 Restore DB_COMMAND capabilities of ciss(4) for debugging and diagnostics
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-03-28 12:44:43 +00:00
Alexander Motin
47bf7bcb97 Except one case mps(4) driver does not touch the data and works well with
unmapped I/O.  That one exception is access to INQUIRY VPD request result.
Those requests are never unmapped now, but to be safe add respective check
there and allow unmapped I/O for the SIM by setting PIM_UNMAPPED flag.
2013-03-28 11:24:30 +00:00
Sean Bruno
3ea59fd3a4 Fix compile of ciss(4) with CISS_DEBUG defined
Obtained from:	Yahoo! Inc.
MFC after:	2 weeks
2013-03-28 11:00:41 +00:00
Konstantin Belousov
bafa6cfc93 Release the v_writecount reference on the vnode in case of error,
before the vnode is vput() in vm_mmap_vnode().  Error return means
that there is no use reference on the vnode from the vm object
reference, and failing to restore v_writecount breaks the invariant
that v_writecount is less or equal to the usecount.

The situation observed when nfs client returns ESTALE for
VOP_GETATTR() after the open.

In collaboration with:	pho
MFC after:	1 week
2013-03-28 06:39:27 +00:00
Adrian Chadd
09ac4e68f3 Fix the AR933x platform device start/stop code.
This was ported from the AR724x code and I think that also doesn't
quite work.  I'll investigate that soon.

With this in place the system reset path works, so 'reset' from kdb
actually resets the SoC.

Tested:

* AP121 test board
2013-03-28 05:43:03 +00:00
Jim Harris
47301c53ed deferal -> deferral 2013-03-27 23:07:43 +00:00
Alexander Motin
cd05a36e54 On SIM destruction free associated CCBs, preallocated inside xpt_get_ccb().
Before this change they were just leaked.  Fortunately USB sticks now use
only one CCB, and so leak was only 2KB per detach, while other bigger SIMs
with much more allocated CCBs are rarely detached.

MFC after:	2 weeks
2013-03-27 18:55:01 +00:00
Jung-uk Kim
ba9855e380 Limit the amount of video memory we map for the driver to the maximum value.
This basically restores the spirit of r203535, which was partially reverted
in r205557, while we still map fixed amount to work around transient issues
we experienced with r203535.

Prodded by:	avg
Tested by:	avg
MFC after:	1 week
2013-03-27 18:06:28 +00:00
Konstantin Belousov
f3215a60fd Fix a race with the vnode reclamation in the aio_qphysio(). Obtain
the thread reference on the vp->v_rdev and use the returned struct
cdev *dev instead of using vp->v_rdev.  Call dev_strategy_csw()
instead of dev_strategy(), since we now own the reference.

Since the csw was already calculated, test d_flags to avoid mapping
the buffer if the driver supports unmapped requests [*].

Suggested by:	kan [*]
Reviewed by:	kan (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-03-27 11:47:52 +00:00
Konstantin Belousov
d1e99f43ed Add dev_strategy_csw() function, which is similar to dev_strategy()
but assumes that a thread reference was already obtained on the passed
device.  Use the function from physio(), to avoid two extra dev_mtx
lock and unlock.  Note that physio() is always used as the cdevsw
method, or is called from a cdevsw method, and the caller already owns
the reference.

dev_strategy() is left to keep KPI intact, but now it is implemented
as a wrapper around dev_strategy_csw().

Do some style cleanup in physio().

Requested and reviewed by:	kan (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-03-27 11:34:27 +00:00
Konstantin Belousov
88c8c0a70f On i386, double the default size of the bio transient map. With the
maxbcache size fixed, the auto-tuned transient map is too small for
real-world load on i386.

Tested by:	David Wolfskill
Sponsored by:	The FreeBSD Foundation
2013-03-27 10:56:15 +00:00
Konstantin Belousov
d4e9009cc8 Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal
buffer map size, auto-tuned on the 4GB machine.  Having the maxbcache
bigger than the buffer map causes the transient bio map sizing logic
to assume that there is enough KVA to use approximately 90MB (buffer
map is sized to 110MB, and maxbcache is 200MB).  The increase in the
KVA usage caused other big KVA consumers, like nvidia.ko, to fail the
initialization.

Change the definition for both PAE and non-PAE cases, since PAE is
even more KVA-starved.

Reported and tested by:	David Wolfskill
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
2013-03-27 10:52:18 +00:00
Alexander Motin
7019329c1a Add Subsystem ID field to the quirk table. Use it to identify Mac Pro 1,1,
which requires OVREF to be set to get proper playback volume, but which has
all zeroes in HDA controller subdevice IDs on PCI.

MFC after:	1 month
Sponsored by:
2013-03-27 07:30:08 +00:00
Adrian Chadd
601a83560e Commit initial (unfinished!) support for the AR933x series of embedded
CPUs.

The AR933x is a mips24k based SoC with an AR9380 series SoC on board,
two gigabit ethernet interfaces and an internal 10/100mbit ethernet
switch.  There's also the normal interfaces (USB, ethernet, uart, GPIO.)

The downside? There's a non-ns8250 UART device.

With a very basic UART driver (not in this commit) the SoC is initialised
and boots up.  I'll commit the UART code soon and then link it into the
general setup path.

This code is a re-implementation based from the Linux kernel / openwrt
AR933x support.

TODO:

* UART (obviously)
* All of the ethernet, USB and wifi SoC glue, including ethernet PLL
  programming.
2013-03-27 03:38:58 +00:00
Adrian Chadd
a4a1b49368 Add the reference clock for each supported chip.
Obtained from:	Linux (openwrt)
2013-03-27 03:33:19 +00:00
Jim Harris
bdd1fd402c Fix printf format issue on i386.
Reported by:	bz
2013-03-27 00:37:00 +00:00
Adrian Chadd
b92b5f6e3a * Stop processing after HAL_EIO; this is what the reference driver does.
* If we hit an empty queue condition (which I haven't yet root caused, grr.)
  .. make sure we release the lock before continuing.
2013-03-27 00:35:45 +00:00
Jim Harris
944497f65f Panic should the SCI framework ever request a pointer into the ccb's
data buffer for a ccb that is unmapped.

This case is currently not possible, since the SCI framework only
requests these pointers for doing SCSI/ATA translation of non-
READ/WRITE commands.  The panic is more to protect against the
unlikely future scenario where additional commands could be unmapped.

Sponsored by:	Intel
2013-03-27 00:15:22 +00:00
Jim Harris
1da66a2776 Report support for unmapped I/O by adding PIM_UNMAPPED flag.
Submitted by:	jhb, scottl
2013-03-26 23:04:06 +00:00
Jim Harris
547d523eb8 Clean up debug prints.
1) Consistently use device_printf.
2) Make dump_completion and dump_command into something more
    human-readable.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:17:10 +00:00
Jim Harris
dd433dd0fb Move common code from the different nvme_allocate_request functions into a
separate function.

Sponsored by:	Intel
Suggested by:	carl
Reviewed by:	carl
2013-03-26 22:13:07 +00:00
Jim Harris
237d2019e5 Change a number of malloc(9) calls to use M_WAITOK instead of
M_NOWAIT.

Sponsored by:	Intel
Suggested by:	carl
Reviewed by:	carl
2013-03-26 22:11:34 +00:00
Jim Harris
955910a916 Replace usages of mtx_pool_find used for admin commands with a polling
mechanism.

Now that all requests are timed, we are guaranteed to get a completion
notification, even if it is an abort status due to a timed out admin
command.

This has the effect of simplifying the controller and namespace setup
code, so that it reads straight through rather than broken up into
a bunch of different callback functions.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:09:51 +00:00
Jim Harris
43a3725688 Abort and do not retry any outstanding admin commands left over after
a controller reset.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:06:05 +00:00
Jim Harris
232e2edb6c Add the ability to internally mark a controller as failed, if it is unable to
start or reset.  Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.

This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state.  To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:58:38 +00:00
Jim Harris
3d7eb41c1b Just disable the controller instead of deleting IO queues during detach.
This is just as effective, and removes the need for a bunch of admin commands
to a controller that's going to be disabled shortly anyways.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:48:41 +00:00
Jim Harris
ec84ecbba0 Have nvd(4) register for controller notifications.
Also have nvd maintain controller/namespace relationships internally.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:45:37 +00:00
Jim Harris
74019d4b67 Set Pre-boot Software Load Count to 0 at the end of the controller
start process.

The spec indicates the OS driver should use Set Features (Software
Progress Marker) to set the pre-boot software load count to 0
after the OS driver has successfully been initialized.  This allows
pre-boot software to determine if there have been any issues with the
OS loading.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:42:53 +00:00
Jim Harris
be34f21609 Remove the is_started flag from struct nvme_controller.
This flag was originally added to communicate to the sysctl code
which oids should be built, but there are easier ways to do this.  This
needs to be cleaned up prior to adding new controller states - for example,
controller failure.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:19:26 +00:00
Jim Harris
02e3348484 Ensure the controller's MDTS is accounted for in max_xfer_size.
The controller's IDENTIFY data contains MDTS (Max Data Transfer Size) to
allow the controller to specify the maximum I/O data transfer size.  nvme(4)
already provides a default maximum, but make sure it does not exceed what
MDTS reports.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:16:53 +00:00
Jim Harris
cb5b7c1304 Cap the number of retry attempts to a configurable number. This ensures
that if a specific I/O repeatedly times out, we don't retry it indefinitely.

The default number of retries will be 4, but is adjusted using hw.nvme.retry_count.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:14:51 +00:00
Jim Harris
0d7e13ecb2 Pass associated log page data to async event consumers, if requested.
Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:08:32 +00:00
Jim Harris
2868353a57 When an asynchronous event request is completed, automatically fetch the
specified log page.

This satisfies the spec condition that future async events of the same type
will not be sent until the associated log page is fetched.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:05:15 +00:00
Jim Harris
0692579bf3 Add structure definitions and controller command function for firmware
log pages.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:03:03 +00:00
Jim Harris
0892778256 Add structure definitions and a controller command function for
error log pages.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:01:53 +00:00
Jim Harris
cf81529ce3 Create struct nvme_status.
NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.

While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:00:18 +00:00
Jim Harris
f37c22a3bd Make nvme_ctrlr_reset a nop if a reset is already in progress.
This protects against cases where a controller crashes with multiple
I/O outstanding, each timing out and requesting controller resets
simultaneously.

While here, remove a debugging printf from a previous commit, and add
more logging around I/O that need to be resubmitted after a controller
reset.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 20:56:58 +00:00
Jim Harris
48ce317898 By default, always escalate to controller reset when an I/O times out.
While aborts are typically cleaner than a full controller reset, many times
an I/O timeout indicates other controller-level issues where aborts may not
work.  NVMe drivers for other operating systems are also defaulting to
controller reset rather than aborts for timed out I/O.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 20:32:57 +00:00
Pedro F. Giffuni
f5678b698a Dtrace: dtrace.c erroneously checks for memory alignment on amd64.
Merge change from illumos:

3511 dtrace.c erroneously checks for memory alignment on amd64

Illumos Revision:	c93cc65

Reference:
https://www.illumos.org/issues/3511

Obtained from:	Illumos
MFC after:	3 weeks
2013-03-26 20:17:08 +00:00
Adrian Chadd
92e84e43a6 Implement the replacement EDMA FIFO code.
(Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've
actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was
cleared!)

This code implements a whole bunch of sorely needed EDMA TX improvements
along with CABQ TX support.

The specifics:

* When filling/refilling the FIFO, use the new TXQ staging queue
  for FIFO frames

* Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly.
  For now the non-CABQ transmit path pushes one frame into the TXQ
  staging queue without setting up the intermediary link pointers
  to chain them together, so draining frames from the txq staging
  queue to the FIFO queue occurs AMPDU / MPDU at a time.

* In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and
  ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO
  at once.

* Now that frames are in a FIFO pending queue, we can top up the
  FIFO after completing a single frame.  This means we can keep
  it filled rather than waiting for it drain and _then_ adding
  more frames.

* The EDMA restart routine now walks the FIFO queue in the TXQ
  rather than the pending queue and re-initialises the FIFO with
  that.

* When restarting EDMA, we may have partially completed sending
  a list.  So stamp the first frame that we see in a list with
  ATH_BUF_FIFOPTR and push _that_ into the hardware.

* When completing frames, only check those on the FIFO queue.
  We should never ever queue frames from the pending queue
  direct to the hardware, so there's no point in checking.

* Until I figure out what's going on, make sure if the TXSTATUS
  for an empty queue pops up, complain loudly and continue.
  This will stop the panics that people are seeing.  I'll add
  some code later which will assist in ensuring I'm populating
  each descriptor with the correct queue ID.

* When considering whether to queue frames to the hardware queue
  directly or software queue frames, make sure the depth of
  the FIFO is taken into account now.

* When completing frames, tag them with ATH_BUF_BUSY if they're
  not the final frame in a FIFO list.  The same holding descriptor
  behaviour is required when handling descriptors linked together
  with a link pointer as the hardware will re-read the previous
  descriptor to refresh the link pointer before contiuning.

* .. and if we complete the FIFO list (ie, the buffer has
  ATH_BUF_FIFOEND set), then we don't need the holding buffer
  any longer.  Thus, free it.

Tested:

* AR9380/AR9580, STA and hostap
* AR9280, STA/hostap

TODO:

* I don't yet trust that the EDMA restart routine is totally correct
  in all circumstances.  I'll continue to thrash this out under heavy
  multiple-TXQ traffic load and fix whatever pops up.
2013-03-26 20:04:45 +00:00
Jim Harris
941433323c Add a tunable for the I/O timeout interval. Default is still 30 seconds,
but can be adjusted between a min/max of 5 and 120 seconds.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 20:02:35 +00:00
Jim Harris
12d191ec12 Add handling for controller fatal status (csts.cfs).
On any I/O timeout, check for csts.cfs==1.  If set, the controller
is reporting fatal status and we reset the controller immediately,
rather than trying to abort the timed out command.

This changeset also includes deferring the controller start portion
of the reset to a separate task.  This ensures we are always performing
a controller start operation from a consistent context.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:58:17 +00:00
Jim Harris
dbba74428b Add API for nvme consumers to access controller and namespace identify data.
Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:52:57 +00:00
Jim Harris
b846efd7ec Add controller reset capability to nvme(4) and ability to explicitly
invoke it from nvmecontrol(8).

Controller reset will be performed in cases where I/O are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer.  Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:50:46 +00:00
Adrian Chadd
3feffbd796 Add per-TXQ EDMA FIFO staging queue support.
Each set of frames pushed into a FIFO is represented by a list of
ath_bufs - the first ath_buf in the FIFO list is marked with
ATH_BUF_FIFOPTR; the last ath_buf in the FIFO list is marked with
ATH_BUF_FIFOEND.

Multiple lists of frames are just glued together in the TAILQ as per
normal - except that at the end of a FIFO list, the descriptor link
pointer will be NULL and it'll be tagged with ATH_BUF_FIFOEND.

For non-EDMA chipsets this is a no-op - the ath_txq frame list (axq_q)
stays the same and is treated the same.

For EDMA chipsets the frames are pushed into axq_q and then when
the FIFO is to be (re) filled, frames will be moved onto the FIFO
queue and then pushed into the FIFO.

So:

* Add a new queue in each hardware TXQ (ath_txq) for staging FIFO frame
  lists.  It's a TAILQ (like the normal hardware frame queue) rather than
  the ath9k list-of-lists to represent FIFO entries.

* Add new ath_buf flags - ATH_TX_FIFOPTR and ATH_TX_FIFOEND.

* When allocating ath_buf entries, clear out the flag value before
  returning it or it'll end up having stale flags.

* When cloning ath_buf entries, only clone ATH_BUF_MGMT.  Don't clone
  the FIFO related flags.

* Extend ath_tx_draintxq() to first drain the FIFO staging queue, _then_
  drain the normal hardware queue.

Tested:

* AR9280, hostap
* AR9280, STA
* AR9380/AR9580 - hostap

TODO:

* Test on other chipsets, just to be thorough.
2013-03-26 19:46:51 +00:00
Jim Harris
65c2474e6d Keep a doubly-linked list of outstanding trackers.
This enables in-order re-submission of I/O after a controller reset.

Sponsored by:	Intel
2013-03-26 18:45:16 +00:00
Jim Harris
5f1e251de6 Create a generic nvme_ctrlr_cmd_get_log_page function, and change the
health information log page function to use it.

Sponsored by:	Intel
2013-03-26 18:43:53 +00:00
Jim Harris
99d99f7408 Expose the get/set features API to nvme consumers.
Sponsored by:	Intel
2013-03-26 18:42:05 +00:00
Jim Harris
038a5ee403 Add an interface for nvme shim drivers (i.e. nvd) to register for
notifications when new nvme controllers are added to the system.

Sponsored by:	Intel
2013-03-26 18:39:54 +00:00
Jim Harris
0a0b08cc30 Enable asynchronous event requests on non-Chatham devices.
Also add logic to clean up all outstanding asynchronous event requests
when resetting or shutting down the controller, since these requests
will not be explicitly completed by the controller itself.

Sponsored by:	Intel
2013-03-26 18:37:36 +00:00
Jim Harris
990e741c18 Move controller destruction code from nvme_detach() to new nvme_ctrlr_destruct()
function.

Sponsored by:	Intel
2013-03-26 18:34:19 +00:00
Jim Harris
274b3a88fa Specify command timeout interval on a per-command type basis.
This is primarily driven by the need to disable timeouts for asynchronous
event requests, which by nature should not be timed out.

Sponsored by:	Intel
2013-03-26 18:31:46 +00:00
Jim Harris
879de69910 Explicitly abort a timed out command, if the ABORT command sent to the
controller indicates the command was not found.

Sponsored by:	Intel
2013-03-26 18:29:04 +00:00
Jim Harris
6cb0607039 Break out the code for completing an nvme_tracker object into a separate
function.

This allows for completions outside the normal completion path, for example
when an ABORT command fails due to the controller reporting the targeted
command does not exist.  This is mainly for protection against a faulty
controller, but we need to clean up our internal request nonetheless.

Sponsored by:	Intel
2013-03-26 18:27:22 +00:00
Jim Harris
448195e764 Add support for ABORT commands, including issuing these commands when
an I/O times out.

Also ensure that we retry commands that are aborted due to a timeout.

Sponsored by:	Intel
2013-03-26 18:23:35 +00:00
Jim Harris
d6f54866ea Add an internal _nvme_qpair_submit_request function, which performs
the submit action assuming the qpair lock has already been acquired.

Also change nvme_qpair_submit_request to just lock/unlock the mutex
around a call to this new function.

This fixes a recursive mutex acquisition in the retry path.

Sponsored by:	Intel
2013-03-26 18:20:11 +00:00
Jim Harris
aaf6b84a4e Make the DSM range count 0-based. Previously we were deallocating one more
LBA than we should have been.

Sponsored by:	Intel
2013-03-26 18:16:30 +00:00
Jim Harris
51a9feb9b0 Do not look at the namespace's thin provisioning field to determine if DSM
command is supported.  The two are not related.

Sponsored by:	Intel
2013-03-26 18:01:24 +00:00
Alan Cox
3fc10b7363 Introduce vm_radix_isleaf() and use it in a couple places. As compared to
using vm_radix_node_page() == NULL, the compiler is able to generate one
less conditional branch when vm_radix_isleaf() is used.  More use cases
involving the inner loops of vm_radix_insert(), vm_radix_lookup{,_ge,_le}(),
and vm_radix_remove() will follow.

Reviewed by:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-03-26 17:30:40 +00:00
Gleb Smirnoff
fa75f402ae Return ENOMEM if malloc() fails. 2013-03-26 14:08:14 +00:00
Gleb Smirnoff
a23a2dd138 Cleanup: wrap long lines, cleanup comments, etc. 2013-03-26 14:05:37 +00:00
Alexander Motin
7868ec506b geom_slice.c and its consumers like GEOM_LABEL are not touching the data
unless hotspots are used.  Pass G_PF_ACCEPT_UNMAPPED flag through except
such rare cases (obsolete GEOM_SUNLABEL and GEOM_BSD).
2013-03-26 07:55:24 +00:00
Alexander Motin
6c6e13b6e1 GEOM NOP does not touch the data, so pass G_PF_ACCEPT_UNMAPPED flag through. 2013-03-26 05:58:49 +00:00
Alexander Motin
a93c0ed463 Remove extra bio_data and bio_length copying to child request after calling
g_clone_bio(), that already copied them.
2013-03-26 05:42:12 +00:00
Adrian Chadd
35bec3655e Remove the mcast path calls to ath_hal_gettxdesclinkptr() for axq_link -
they're no longer needed for the legacy path and they're not wanted
for the EDMA path.

Tested:

* AR9280, hostap + CABQ
* AR9380/AR9580, hostap + CABQ
2013-03-26 04:56:54 +00:00
Adrian Chadd
b708ea2941 Remove this dead code - it's no longer relevant (as yes, we do actually
support TX on EDMA chips.)
2013-03-26 04:53:40 +00:00
Adrian Chadd
b6ef0f8ac8 Convert the CABQ queue code over to use the HAL link pointer method
instead of axq_link.

This (among a bunch of uncommitted work) is required for EDMA chips
to correctly transmit frames on the CABQ.

Tested:

* AR9280, hostap mode
* AR9380/AR9580, hostap mode (staggered beacons)

TODO:

* This code only really gets called when burst beacons are used;
  it glues multiple CABQ queues together when sending to the hardware.
* More thorough bursted beacon testing! (first requires some work with
  the beacon queue code for bursted beacons, as that currently uses the
  link pointer and will fail on EDMA chips.)
2013-03-26 04:52:16 +00:00
Adrian Chadd
9e7259a2a3 Convert the EDMA multicast queue code over to use the HAL method to set
the descriptor link pointer, rather than directly.

This is needed on AR9380 and later (ie, EDMA) NICs so the multicast queue
has a chance in hell of being put together right.

Tested:

* AR9380, AR9580 in hostap mode, CABQ traffic (but with other patches..)
2013-03-26 04:48:58 +00:00
Adrian Chadd
0891354cd2 Migrate the multicast queue assembly code to not use the axq_link pointer
and instead use the HAL method to set the link pointer.

Tested:

* AR9280, hostap mode, CABQ frames being queued and transmitted
2013-03-26 04:47:40 +00:00
Alexander Kabaev
31932fae1e Do not pass unmapped buffers to drivers that cannot handle them
In physio, check if device can handle unmapped IO and pass an
appropriately mapped buffer to the driver strategy routine. The
only driver in the tree that can handle unmapped buffers is one
exposed by GEOM, so mark it as such with the new flag in the
driver cdevsw structure.

This fixes insta-panics on hosts, running dconschat, as /dev/fwmem
is an example of the driver that makes use of physio routine, but
bypasses the g_down thread, where the buffer gets mapped normally.

Discussed with: kib (earlier version)
2013-03-26 01:17:06 +00:00
Pedro F. Giffuni
5472787377 Dtrace: Add SUN MDB-like type-aware print() action.
Merge change from illumos:

1694 Add type-aware print() action

This is a very nice feature implemented in upstream Dtrace.
A complete description is available here:
http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/

This change bumps the DT_VERS_* number to 1.9.0 in
accordance to what is done in illumos.

While here also include some minor cleanups to ease further merging
and appease clang with a fix by Fabian Keil.

Illumos Revisions:	13501:c3a7090dbc16
			13483:f413e6c5d297

Reference:
https://www.illumos.org/issues/1560
https://www.illumos.org/issues/1694

Tested by:	Fabian Keil
Obtained from:	Illumos
MFC after:	1 month
2013-03-25 20:38:09 +00:00
Pedro F. Giffuni
730cecb05a Dtrace: add toupper()/tolower() and enhancements to lltostr().
Merge changes from illumos:

1451 DTrace needs toupper()/tolower() subroutines
1457 lltostr() D subroutine should take an optional base

This change bumps the DT_VERS_* number to 1.8.1 in
accordance to what is done in illumos.

The test suite we currently include is outdated and
doesnt support some updates in tst.subr.d which had to
be left out for now.

Illumos Revisions:	r13458 5e394d8db762
			r13459 c3454574dd1a

Reference:
https://www.illumos.org/issues/1451
https://www.illumos.org/issues/1457

Tested by:	Fabian Keil
Obtained from:	Illumos
MFC after:	1 month
2013-03-25 15:40:57 +00:00
Alexander V. Chernikov
58e8e6e6bb Unlock IPMI sc while performing requests via KCS and SMIC interfaces.
It is already done in SSIF interface code.
This reduces contention/spinning reported by many users.

PR:		kern/172166
Submitted by:	Eric van Gyzen <eric at vangyzen.net>
MFC after:	2 weeks
2013-03-25 14:30:34 +00:00
Alexander Motin
6a740c4a4f Read Asynchronous Notification statuses only if Port Multiplier or ATAPI
device are connected. ATA disks are not using ANs, while the extra register
read operation is quite expensive.
2013-03-25 13:58:17 +00:00
Davide Italiano
3f321a4eac Cache the callout precision argument as part of the informations required
for migrating callouts to new CPU. This value is passed to
callout_cc_add() in order to update properly precision field in case of
rescheduling/migration.

Reviewed by:	mav
2013-03-25 09:43:50 +00:00
Alexander Motin
3d44989055 Depending on combination of running commands (NCQ/non-NCQ) try to avoid
extra read from PxCI/PxSACT registers.  If only NCQ commands are running, we
don't really need PxCI.  If only non-NCQ commands are running we don't need
PxSACT.  Mixed set may happen only on controllers with FIS-based switching
when port multiplier is attached, and then we have to read both registers.

MFC after:	1 month
2013-03-25 08:50:51 +00:00
Andrey V. Elsukov
5b4661289d When we are removing a specific set, call ipfw_expire_dyn_rules only once.
Obtained from:	Yandex LLC
MFC after:	1 week
2013-03-25 07:43:46 +00:00
Alexander Motin
f4673017b3 Make GEOM MULTIPATH to report unmapped bio support if underling path report
it.  GEOM MULTIPATH itself never touches the data and so transparent.
2013-03-25 07:24:58 +00:00
Alexander Motin
6d14d0d010 Remove two bzero()s that are erasing only few more bytes then set later. 2013-03-25 06:31:17 +00:00
Alexander Motin
30ba747160 In GEOM DISK:
- Replace single done mutex with per-disk ones.  On system with several
disks on several HBAs that removes small, but measurable lock congestion.
 - Modify disk destruction process to not destroy the mutex prematurely.
 - Remove some extra pointer derefences.
2013-03-25 05:45:24 +00:00
Pedro F. Giffuni
f2e66d30b8 Dtrace: add optional size argument to tracemem().
Merge change from illumos:

1455 DTrace tracemem() should take an optional size argument

Our local enhancements to dt_print_bytes were equivalent to
those in illumos but we made it match the illumos version
to ease further code merges.

For now leave out tst.smallsize.d and tst.smallsize.d.out
since those don't seem to work cleanly on FreeBSD.

This change bumps the DT_VERS_* number to 1.7.1 in accordance
to what is done in illumos.

Illumos Revision:	13457:571b0355c2e3

Reference:
https://www.illumos.org/issues/1455

Tested by:	Fabian Keil
Obtained from:	Illumos
MFC after:	1 month
2013-03-24 19:12:08 +00:00
Ian Lepore
a350e54067 Set the backlink in mmc commands to the mmc request that contains them. 2013-03-24 17:23:10 +00:00
Alexander Motin
db12db318d No need to erase all 64 bytes of CFIS area if we never use more then 16. 2013-03-24 16:51:21 +00:00
Alan Cox
652615dcb7 Micro-optimize the control flow in a few places. Eliminate a panic call
that could never be reached in vm_radix_insert().  (If the pointer being
checked by the panic call were ever NULL, the immmediately preceding loop
would have already crashed on a NULL pointer dereference.)

Reviewed by:	attilio (an earlier version)
Sponsored by:	EMC / Isilon Storage Division
2013-03-24 16:43:07 +00:00
Alexander Motin
3c330aff3f Fix long known deadlock between geom dev destruction and d_close() call.
Use destroy_dev_sched_cb() to not wait for device destruction while holding
GEOM topology lock (that actually caused deadlock).  Use request counting
protected by mutex to properly wait for outstanding requests completion in
cases of device closing and geom destruction.  Unlike r227009, this code
does not block taskqueue thread for indefinite time, waiting for completion.
2013-03-24 10:14:25 +00:00
Adrian Chadd
1f6b3ed63c Add new regulatory domain.
Obtained from:	Qualcomm Atheros
2013-03-24 04:42:56 +00:00
Adrian Chadd
56a859789f Move the TXQ lock earlier in this routine - so to correctly protect the
link pointer check.
2013-03-24 04:09:54 +00:00
Adrian Chadd
0acf45ed86 Fix the locking changes due to the TXQ change drive-by.
Tested:

* AR9580, STA mode
2013-03-24 04:09:29 +00:00
Alexander Motin
50199fa0d0 Make g_wither_washer() to not loop by itself, but only when there was some
more topology change done that may require its attention.  Add few missing
g_do_wither() calls in respective places to signal it.

This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed.  For
example, see r227009 and r227510.
2013-03-24 03:15:20 +00:00
Adrian Chadd
b837332d0a Overhaul the TXQ locking (again!) as part of some beacon/cabq timing
related issues.

Moving the TX locking under one lock made things easier to progress on
but it had one important side-effect - it increased the latency when
handling CABQ setup when sending beacons.

This commit introduces a bunch of new changes and a few unrelated changs
that are just easier to lump in here.

The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.

The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support.  Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot.  For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.

The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission.  I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.

Major:

* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
  a single lock.

* Leave the software queue side of things under the ATH_TX_LOCK lock,
  (continuing) to serialise things as they are.

* Add a new function which is called whenever there's a beacon miss,
  to print out some debugging.  This is primarily designed to help
  me figure out if the beacon miss events are due to a noisy environment,
  issues with the PHY/MAC, or other.

* Move the CABQ setup/enable to occur _after_ all the VAPs have been
  looked at.  This means that for multiple VAPS in bursted mode, the
  CABQ gets primed once all VAPs are checked, rather than being primed
  on the first VAP and then having frames appended after this.

Minor:

* Add a (disabled) twiddle to let me enable/disable cabq traffic.
  It's primarily there to let me easily debug what's going on with beacon
  and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
  trying to trace down.

* Clear bf_next when flushing frames; it should quieten some warnings
  that show up when a node goes away.

Tested:

* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)

TODO:

* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ hanling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
  ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
  correctly set when chaining descriptors.)
2013-03-24 00:03:12 +00:00
Adrian Chadd
49ddabc4bd CABQ calculation changes to try and fix some weird corner cases leading
to stuck beacons.

* Set the cabq readytime (ie, how long to burst for) to 50% of the total
  beacon interval time
* fix the cabq adjustment calculation based on how the beacon offset is
  calculated (the SWBA/DBA time offset.)

This is all still a bit magic voodoo but it does seem to have further
quietened issues with missed/stuck beacons under my local testing.
In any case, it better matches what the reference HAL implements.

Obtained from:	Qualcomm Atheros
2013-03-23 23:51:11 +00:00
Konstantin Belousov
b11d58b63f Do not call malloc(M_WAITOK) while bodev->fence_lock mutex is
held. The ttm_buffer_object_transfer() does not need the mutex locked
at all, except for the call to the driver sync_obj_ref() method.

Reported and tested by:	dumbbell
MFC after:   2 weeks
2013-03-23 22:23:15 +00:00
Jean-Sébastien Pédron
accadf8de2 drm/ttm: Fix a typo: s/pTTM]/[TTM]/ 2013-03-23 20:46:47 +00:00
Jean-Sébastien Pédron
76c40c6986 drm/ttm: Explain why we don't need to acquire a ref in ttm_bo_vm_ctor() 2013-03-23 20:43:26 +00:00
Martin Matuska
7608b757d7 Fix kernel build with options ZFS after r24571 (libzfs_core).
Submitted by:	Bjoern A. Zeeb <bz@FreeBSD.org>
2013-03-23 20:01:45 +00:00
Jean-Sébastien Pédron
a649986089 drm/ttm: Fix TTM buffer object refcount
This fixes memory leaks in the radeonkms driver.

Reviewed by:	Konstantin Belousov (kib@)
Tested by:	J.R. Oldroyd <jr@opal.com>
2013-03-23 19:19:19 +00:00
Ian Lepore
49addc5755 Don't check and warn about pmap mismatch on every call to busdma sync.
With some recent busdma refactoring, sometimes it happens that a sync
op gets called when bus_dmamap_load() never got called, which results
in a spurious warning about a map mismatch when no sync operations will
actually happen anyway.  Now the check is done only if a sync operation
is actually performed, and the result of the check is a panic, not just
a printf.

Reviewed by:	cognet (who prevented me from donning a point hat)
2013-03-23 17:17:06 +00:00
Will Andrews
ef04b888d2 Be more explicit about what each bio_cmd & bio_flags value means.
Reviewed by:	ken (mentor)
2013-03-23 16:55:07 +00:00
Will Andrews
58567a1b4e ZFS: Fix a panic while unmounting a busy filesystem.
This particular scenario was easily reproduced using a NFS export.  When the
first 'zfs unmount' occurred, it returned EBUSY via this path, while
vflush() had flushed references on the filesystem's root vnode, which in
turn caused its v_interlock to be destroyed.  The next time 'zfs unmount'
was called, vflush() tried to obtain this lock, which caused this panic.

Since vflush() on FreeBSD is a definitive call, there is no need to check
vfsp->vfs_count after it completes.  Simply #ifdef sun this check.

Submitted by:	avg
Reviewed by:	avg
Approved by:	ken (mentor)
MFC after:	1 month
2013-03-23 16:34:56 +00:00
Will Andrews
fdbc71742b Extend taskqueue(9) to enable per-taskqueue callbacks.
The scope of these callbacks is primarily to support actions that affect the
taskqueue's thread environments.  They are entirely optional, and
consequently are introduced as a new API: taskqueue_set_callback().

This interface allows the caller to specify that a taskqueue requires a
callback and optional context pointer for a given callback type.

The callback types included in this commit can be used to register a
constructor and destructor for thread-local storage using osd(9).  This
allows a particular taskqueue to define that its threads require a specific
type of TLS, without the need for a specially-orchestrated task-based
mechanism for startup and shutdown in order to accomplish it.

Two callback types are supported at this point:

- TASKQUEUE_CALLBACK_TYPE_INIT, called by every thread when it starts, prior
  to processing any tasks.
- TASKQUEUE_CALLBACK_TYPE_SHUTDOWN, called by every thread when it exits,
  after it has processed its last task but before the taskqueue is
  reclaimed.

While I'm here:

- Add two new macros, TQ_ASSERT_LOCKED and TQ_ASSERT_UNLOCKED, and use them
  in appropriate locations.
- Fix taskqueue.9 to mention taskqueue_start_threads(), which is a required
  interface for all consumers of taskqueue(9).

Reviewed by:	kib (all), eadler (taskqueue.9), brd (taskqueue.9)
Approved by:	ken (mentor)
Sponsored by:	Spectra Logic
MFC after:	1 month
2013-03-23 15:11:53 +00:00
Andriy Gapon
ca84e042a3 post mountroot event after a real/final root is mounted
not every time an intermediate root (including the first devfs) is
mounted.
This is also consistent with waking up via root_mount_complete.

Reviewed by:	jhb
MFC after:	13 days
2013-03-23 08:59:34 +00:00
Andriy Gapon
aaf2546b67 fbt_getargdesc: correctly handle types for return probes
MFC after:	6 days
2013-03-23 08:52:50 +00:00
Andriy Gapon
a47016e9a9 fbt_typoff_init: fix an off by one in determining required memory size
This issue would be silent most of the time, but if the requested memory
is a multiple of a page size, then accessing one element beyond the end
would lead to a kernel page fault.
Otherwise, the unlucky last type would just be inaccessible.

Reported by:	glebius
Tested by:	glebius
MFC after:	6 days
2013-03-23 08:48:44 +00:00
Xin LI
843b298e62 Don't attempt to reference sc before testing whether it's NULL.
Submitted by:	Sascha Wildner
Obtained from:	DragonFly
MFC after:	2 weeks
2013-03-22 22:46:19 +00:00
Kirk McKusick
baa12a84a7 The purpose of this change to the FFS layout policy is to reduce the
running time for a full fsck. It also reduces the random access time
for large files and speeds the traversal time for directory tree walks.

The key idea is to reserve a small area in each cylinder group
immediately following the inode blocks for the use of metadata,
specifically indirect blocks and directory contents. The new policy
is to preferentially place metadata in the metadata area and
everything else in the blocks that follow the metadata area.

The size of this area can be set when creating a filesystem using
newfs(8) or changed in an existing filesystem using tunefs(8).
Both utilities use the `-k held-for-metadata-blocks' option to
specify the amount of space to be held for metadata blocks in each
cylinder group. By default, newfs(8) sets this area to half of
minfree (typically 4% of the data area).

This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker

Details of this implementation appears in the April 2013 of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   4 weeks
2013-03-22 21:45:28 +00:00
Gleb Smirnoff
209dddb90e Remove __FreeBSD_version ifdefs. 2013-03-22 20:44:16 +00:00
Pawel Jakub Dawidek
051a23d4e8 - Constify local path variable for chflagsat().
- Use correct format characters (%lx) for u_long.

This fixes the build broken in r248599.
2013-03-22 07:40:34 +00:00
Kevin Lo
b3dcd51dde Clean up some unused leftover code.
Pointed out by:	ae
2013-03-22 01:45:54 +00:00
Kevin Lo
dda95c6e59 Remove unused global variables.
Reviewed by:	ae, glebius
2013-03-22 01:40:17 +00:00
Steven Hartland
def84b9736 Fix for building libzpool under i386.
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 23:06:11 +00:00
Pawel Jakub Dawidek
5d46382415 Regenerate after r248599.
Sponsored by:	The FreeBSD Foundation
2013-03-21 23:02:19 +00:00
Pawel Jakub Dawidek
e948704e4b Implement chflagsat(2) system call, similar to fchmodat(2), but operates on
file flags.

Reviewed by:	kib, jilles
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:59:01 +00:00
Pawel Jakub Dawidek
14cd1ffdf8 Regenerate after r248597.
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:47:03 +00:00
Pawel Jakub Dawidek
b4b2596b97 - Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
  in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
  for lchflags(2)) stated that it was u_long. Now some related functions
  use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on:	arch
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:44:33 +00:00
Konstantin Belousov
e808788c05 Correct the page count when excess length is trimmed from the bio.
Reported and tested by:	Ivan Klymenko <fidaj@ukr.net
2013-03-21 22:36:43 +00:00
Jilles Tjoelker
46f10cc265 Allow O_CLOEXEC in posix_openpt() flags.
PR:		kern/162374
Reviewed by:	ed
2013-03-21 21:39:15 +00:00
Attilio Rao
d52d7aa871 Fix a bug in UMTX_PROFILING:
UMTX_PROFILING should really analyze the distribution of locks as they
index entries in the umtxq_chains hash-table.
However, the current implementation does add/dec the length counters
for *every* thread insert/removal, measuring at all really userland
contention and not the hash distribution.

Fix this by correctly add/dec the length counters in the points where
it is really needed.

Please note that this bug brought us questioning in the past the quality
of the umtx hash table distribution.
To date with all the benchmarks I could try I was not able to reproduce
any issue about the hash distribution on umtx.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff, davide
MFC after:	2 weeks
2013-03-21 19:58:25 +00:00
Alexander Motin
359b47db97 Minimal timer period of 100us introduced in r244758 is overkill. While
original 2us are indeed not enough, 3us are working quite well on my tests.
To be more safe set minimal period to 5us and to be even more safe replicate
here from HPET mechanism of rereading counter after programming comparator.

This change allows to handle 30K of short nanosleep() calls per second on
Raspberry Pi instead of just 8K before.

Discussed with:	gonzo
2013-03-21 15:42:41 +00:00
John Baldwin
d071a6fa33 Another NFS SIGSTOP related fix: Ignore thread suspend requests due to
SIGSTOP if stop signals are currently deferred.  This can occur if a
process is stopped via SIGSTOP while a thread is running or runnable
but before it has set TDF_SBDRY.

Tested by:	pho
Reviewed by:	kib
MFC after:	1 week
2013-03-21 14:06:27 +00:00
Konstantin Belousov
c46262f810 Fix twa(4) after the r246713. The driver copies data around to
satisfy some alignment restrictions.  Do not set TW_OSLI_REQ_FLAGS_CCB
flag for mapped data, pass the csio->data_ptr in the req->data.

Do not put the ccb pointer into req->data ever, ccb is stored in
req->orig_req already.

Submitted by:	Shuichi KITAGUCHI <ki@hh.iij4u.or.jp>
PR:	kern/177020
2013-03-21 13:06:28 +00:00
Konstantin Belousov
4d569af96c Initialize the variable to avoid (false) compiler warning about
use of an uninitialized local.

Reported by:	Ivan Klymenko <fidaj@ukr.net>
MFC after:	2 weeks
2013-03-21 12:59:24 +00:00
Steven Hartland
2b114ad2a4 Add missing descriptions for ZFS sysctls
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 11:25:21 +00:00
Steven Hartland
adea827b21 Optimisation of TRIM processing.
Previously TRIM processing was very bursty. This was made worse by the fact
that TRIM requests on SSD's are typically much slower than reads or writes.
This often resulted in stalls while large numbers of TRIM's where processed.

In addition due to the way the TRIM thread was only woken by writes, deletes
could stall in the queue for extensive periods of time.

This patch adds a number of controls to how often the TRIM thread for each
SPA processes its outstanding delete requests.
vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds
vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32)
vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev
vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev
vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing
(seconds)

Given the most common TRIM implementation is ATA TRIM the current defaults
are targeted at that.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 11:02:08 +00:00
Steven Hartland
6ad46cec23 Names the ZFS TRIM thread
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks
2013-03-21 10:41:30 +00:00
Steven Hartland
89e5b43079 TRIM cache devices based on time instead of TXGs.
Currently, the trim module uses the same algorithm for data and cache
devices when deciding to issue TRIM requests, based on how far in the
past the TXG is.

Unfortunately, this is not ideal for cache devices, because the L2ARC
doesn't use the concept of TXGs at all. In fact, when using a pool for
reading only, the L2ARC is written but the TXG counter doesn't
increase, and so no new TRIM requests are issued to the cache device.

This patch fixes the issue by using time instead of the TXG number as
the criteria for trimming on cache devices. The basic delay principle
stays the same, but parameters are expressed in seconds instead of
TXGs. The new parameters are named trim_l2arc_limit and
trim_l2arc_batch, and both default to 30 second.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
Obtained from:	17122c31ac
MFC after:	2 weeks
2013-03-21 10:29:05 +00:00
Steven Hartland
78ad0c1c80 Improve TXG handling in the TRIM module.
This patch adds some improvements to the way the trim module considers
TXGs:

 - Free ZIOs are registered with the TXG from the ZIO itself, not the
   current SPA syncing TXG (which may be out of date);
 - L2ARC are registered with a zero TXG number, as L2ARC has no concept
   of TXGs;
 - The TXG limit for issuing TRIMs is now computed from the last synced
   TXG, not the currently syncing TXG. Indeed, under extremely unlikely
   race conditions, there is a risk we could trim blocks which have been
   freed in a TXG that has not finished syncing, resulting in potential
   data corruption in case of a crash.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
Obtained from:	5b46ad40d9
MFC after:	2 weeks
2013-03-21 10:16:10 +00:00
Steven Hartland
e07e3a3792 Don't register repair writes in the trim map.
The trim map inflight writes tree assumes non-conflicting writes, i.e.
that there will never be two simultaneous write I/Os to the same range
on the same vdev. This seemed like a sane assumption; however, in
actual testing, it appears that repair I/Os can very well conflict
with "normal" writes.

I'm not quite sure if these conflicting writes are supposed to happen
or not, but in the mean time, let's ignore repair writes for now. This
should be safe considering that, by definition, we never repair blocks
that are freed.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
Obtained from:	Source: 6a3cebaf7c
2013-03-21 10:02:32 +00:00
Steven Hartland
e05aad2d33 Add TRIM support for L2ARC.
This adds TRIM support to cache vdevs. When ARC buffers are removed
from the L2ARC in arc_hdr_destroy(), arc_release() or l2arc_evict(),
the size previously occupied by the buffer gets scheduled for TRIMming.
As always, actual TRIMs are only issued to the L2ARC after
txg_trim_limit.

Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
Obtained from:	31aae37399
MFC after:	2 weeks
2013-03-21 09:34:41 +00:00
Martin Matuska
05f49d92ef Merge libzfs_core branch:
includes MFV 238590, 238592, 247580

MFV 238590, 238592:
  In the first zfs ioctl restructuring phase, the libzfs_core library was
  introduced. It is a new thin library that wraps around kernel ioctl's.
  The idea is to provide a forward-compatible way of dealing with new
  features. Arguments are passed in nvlists and not random zfs_cmd fields,
  new-style ioctls are logged to pool history using a new method of
  history logging.

  http://blog.delphix.com/matt/2012/01/17/the-future-of-libzfs/

MFV 247580 [1]:
  To address issues of several deadlocks and race conditions the locking
  code around dsl_dataset was rewritten and the interface to synctasks
  was changed.

User-Visible Changes:
  "zfs snapshot" can create more arbitrary snapshots at once (atomically)
  "zfs destroy" destroys multiple snapshots at once
  "zfs recv" has improved performance

Backward Compatibility:
  I have extended the compatibility layer to support full backward
  compatibility by remapping or rewriting the responsible ioctl arguments.
  Old utilities are fully supported by the new kernel module.

Forward Compatibility:
  New utilities work with old kernels with the following restrictions:
    - creating, destroying, holding and releasing of multiple snapshots
      at once is not supported, this includes recursive (-r) commands

Illumos ZFS issues:
  2882 implement libzfs_core
  2900 "zfs snapshot" should be able to create multiple,
       arbitrary snapshots at once
  3464 zfs synctask code needs restructuring

References:
  https://www.illumos.org/issues/2882
  https://www.illumos.org/issues/2900
  https://www.illumos.org/issues/3464 [1]

MFC after:	1 month
Sponsored by:	Hybrid Logic Inc. [1]
2013-03-21 08:38:03 +00:00
Gleb Smirnoff
5aedfa32a4 Add NGM_NAT_LIBALIAS_INFO command, that reports internal stats
of libalias instance. To be used in the mpd5 daemon.

Submitted by:	Dmitry Luhtionov <dmitryluhtionov gmail.com>
2013-03-21 08:36:15 +00:00
Konstantin Belousov
7db07e1c85 Only size and create the bio_transient_map when unmapped buffers are
enabled.  Now, disabling the unmapped buffers should result in the
kernel memory map identical to pre-r248550.

Sponsored by:	The FreeBSD Foundation
2013-03-21 07:28:15 +00:00
Konstantin Belousov
6c83fce371 Assert that transient mapping of the bio is only done when unmapped
buffers are allowed.

Sponsored by:	The FreeBSD Foundation
2013-03-21 07:26:33 +00:00
Konstantin Belousov
7157d8f7ab Do not call vnode_pager_setsize() while a NFS node mutex is
locked. vnode_pager_setsize() might sleep waiting for the page after
EOF be unbusied.

Call vnode_pager_setsize() both for the regular and directory vnodes.

Reported by:	mich
Reviewed by:	rmacklem
Discussed with:	avg, jhb
MFC after:	2 weeks
2013-03-21 07:25:08 +00:00
Hans Petter Selasky
3232aae327 Add new USB ID.
PR:		usb/177173
MFC after:	1 week
2013-03-21 07:04:17 +00:00
Konstantin Belousov
e3269b5096 In bufwrite(), a dirty buffer is moved to the clean queue before the
bufobj counter of the writes in progress is incremented.  Other thread
inspecting the bufobj would consider it clean.

For the regular vnodes, the vnode lock is typically held both by the
thread performing the bufwrite() and an other thread doing syncing,
which prevents the situation.  On the other hand, writes to the VCHR
vnodes are done without holding vnode lock.

Increment the write ref counter for the buffer object before calling
bundirty().

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	2 weeks
2013-03-20 21:08:00 +00:00
Konstantin Belousov
8d6884ce9c When the journaled FFS volume is suspended due to the journal space
becoming too low, the softdep flush thread processes the workitems,
which frees the space in journal, and then unsuspends the fs.  The
softdep_flush() and other workitem processing functions busy the
filesystem before iterating over the worklist, to prevent the parallel
unmount from freeing the mount data. The vfs_busy() is called with
MBF_NOWAIT flag.

Now, if the unmount is already started and the filesystem is suspended
due to low journal space, the journal is never flushed and filesystem
is never unsuspended, because vfs_busy(MBF_NOWAIT) call cannot succeed
for the unmounting fs, and softdep_flush() does not process the
workitems. Unmount needs to write metadata, where it hangs in the
"suspfs" state.

Move the vn_start_write() call in the dounmount() before setting the
MNTK_UNMOUNT flag. This practically ensures that softdep_flush()
processed the pending journal writes by making dounmount() wait for
the lift of the suspension.

Sponsored by:	The FreeBSD Foundation
Reported and tested by:	pho
MFC after:	2 weeks
2013-03-20 21:07:49 +00:00
Kirk McKusick
3289d5877a When renaming a directory from one parent directory to another,
we need to call ufs_checkpath() to walk from our new location to
the root of the filesystem to ensure that we do not encounter
ourselves along the way. Until now, we accomplished this by reading
the ".." entries of each directory in our path until we reached
the root (or encountered an error). This change tries to avoid the
I/O of reading the ".." entries by first looking them up in the
name cache and only doing the I/O when the name cache lookup fails.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   4 weeks
2013-03-20 17:57:00 +00:00
Aleksandr Rybalko
a2c472e741 Integrate Efika MX project back to home.
Sponsored by:	The FreeBSD Foundation
2013-03-20 15:39:27 +00:00
Hans Petter Selasky
76be9c89ba Fix spelling. 2013-03-20 11:51:26 +00:00
Alexander V. Chernikov
ae01d73c04 Add ipfw support for setting/matching DiffServ codepoints (DSCP).
Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR:		kern/102471, kern/121122
MFC after:	2 weeks
2013-03-20 10:35:33 +00:00
Martin Matuska
192d547574 Release hold on pool before calling zvol_create_minor() 2013-03-20 09:56:20 +00:00
Konstantin Belousov
6991ee13a6 Fix the logic inversion in the r248512.
Noted by:	mckay
2013-03-20 09:44:23 +00:00
Adrian Chadd
9cda8c8082 Fix the EDMA CABQ handling - for now, the CABQ takes a descriptor chain
like the legacy chips expect.
2013-03-20 05:44:03 +00:00
Pyun YongHyeon
cf402cc979 For RTL8211B or later PHYs, enable crossover detection and
auto-correction. This change makes re(4) establish a link with
a system using non-crossover UTP cable.

Tested by:	Michael BlackHeart < amdmiek <> gmail dot com >
2013-03-20 05:31:34 +00:00
Adrian Chadd
bd8cbcc32c Add VNET wrappers around the rest of the ieee80211 rtsock messages.
I triggered the cac/radar messages when doing testing in DFS channels.
2013-03-20 02:42:52 +00:00
Martin Matuska
a0abc0d302 Run zvol_create_minors() only if in non-error case 2013-03-19 22:27:15 +00:00
Martin Matuska
e56718d734 Run zvol_create_minors() on snapshot creation 2013-03-19 22:14:50 +00:00
Jilles Tjoelker
c2e3c52e0d Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC.
This change allows creating file descriptors with close-on-exec set in some
situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and
socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file
descriptors (SCM_RIGHTS) atomically close-on-exec.

The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD.
MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

The SOCK_* flags are not passed to MAC because this may cause incorrect
failures and can be done later via fcntl() anyway. On the other hand, audit
is expected to cope with the new flags.

For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags
argument.

Reviewed by:	kib
2013-03-19 20:58:17 +00:00
Adrian Chadd
f0db652cf6 Break out the RX completion path into "FIFO check / refill" and
"complete RX frames."

The 128 entry RX FIFO is really easy to fill up and miss refilling
when it's done in the ath taskq - as that gets blocked up doing
RX completion, TX completion and other random things.

So the 128 entry RX FIFO now gets emptied and refilled in the ath_intr()
task (and it grabs / releases locks, so now ath_intr() can't just be
a FAST handler yet!) but the locks aren't held for very long. The
completion part is done in the ath taskqueue context.

Details:

* Create a new completed frame list - sc->sc_rx_rxlist;
* Split the EDMA RX process queue into two halves - one that
  processes the RX FIFO and refills it with new frames; another
  that completes the completed frame list;
* When tearing down the driver, flush whatever is in the deferred
  queue as well as what's in the FIFO;
* Create two new RX methods - one that processes all RX queues,
  one that processes the given RX queue.  When MSI is implemented,
  we get told which RX queue the interrupt came in on so we can
  specifically schedule that.  (And I can do that with the non-MSI
  path too; I'll figure that out later.)
* Convert the legacy code over to use these new RX methods;
* Replace all the instances of the RX taskqueue enqueue with a call
  to a relevant RX method to enqueue one or all RX queues.

Tested:

* AR9380, STA
* AR9580, STA
* AR5413, STA
2013-03-19 19:32:28 +00:00
Adrian Chadd
74ea88c379 Add more TODO items. 2013-03-19 17:55:36 +00:00
Adrian Chadd
378a752f59 Now that the tx map field is correctly populated for both edma and
legacy chips, just use that.
2013-03-19 17:54:37 +00:00
Konstantin Belousov
129c6621f7 ahci(4) and siis(4) are ready to process the unmapped i/o requests
Sponsored by:	The FreeBSD Foundation
Tested by:	pho
Submitted by:	bf (siis patch)
2013-03-19 15:09:32 +00:00
Konstantin Belousov
59a01b70af UFS support of the unmapped i/o for the user data buffers.
Sponsored by:	The FreeBSD Foundation
Tested by:	pho, scottl, jhb, bf
2013-03-19 15:08:15 +00:00
Konstantin Belousov
2649fcc1d8 Commit the removal of a whitespace to record the proper commit message
for the r248519:

For the cam-attached HBAs, allow the driver to specify that it accepts
the unmapped bio by the PIM_UNMAPPED flag.  The CAM passes the
CAM_DATA_BIO data transfer type request for the unmapped bio, and the
driver could use the bus_dmamap_load_ccb() as a helper to
transparently handle the ccb.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	scottl
Tested by:	pho, scottl
2013-03-19 15:05:21 +00:00
Konstantin Belousov
abc1e60e0e Support unmapped i/o for the md(4).
The vnode-backed md(4) has to map the unmapped bio because VOP_READ()
and VOP_WRITE() interfaces do not allow to pass unmapped requests to
the filesystem. Vnode-backed md(4) uses pbufs instead of relying on
the bio_transient_map, to avoid usual md deadlock.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho, scottl
2013-03-19 15:01:50 +00:00
Konstantin Belousov
59ec9023ca Support unmapped i/o for the md(4).
The vnode-backed md(4) has to map the unmapped bio because VOP_READ()
and VOP_WRITE() interfaces do not allow to pass unmapped requests to
the filesystem. Vnode-backed md(4) uses pbufs instead of relying on
the bio_transient_map, to avoid usual md deadlock.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho, scottl
2013-03-19 14:53:23 +00:00
Konstantin Belousov
db7bfaa8ce The geom_part provider supports unmapped bio iff the underlying
provider does so, since geom_part never inspects the bio_data.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:50:24 +00:00
Konstantin Belousov
f8c19ba466 A flag for the geom disk driver to indicate that it accepts the
unmapped i/o requests.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:49:15 +00:00
Konstantin Belousov
e81ff91e62 Do not remap usermode pages into KVA for physio.
Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:43:57 +00:00
Konstantin Belousov
2cc718a11c Do not map the swap i/o pbufs if the geom provider for the swap
partition accepts unmapped requests.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:39:27 +00:00
Konstantin Belousov
6ce697dc73 Pass unmapped buffers for page in requests if the filesystem indicated support
for the unmapped i/o.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:36:28 +00:00
Konstantin Belousov
f8c09530bd A flag for the filesystem to indicate to the upper levels that it accepts
unmapped buffers for the VOP_STRATEGY().

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:33:01 +00:00
Konstantin Belousov
7d5365c70b Add a helper function vfs_bio_bzero_buf() to zero the portion of the
buffer, transparently handling mapped or unmapped buffers.  Its intent
is to replace the use of bzero(bp->b_data) in cases where the buffer
might be unmapped, to avoid unneeded upgrades.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:27:14 +00:00
Aleksandr Rybalko
5ac9d9890f Return "start" and "end" to u_long world. Because rman handle addresses as
u_long too.

Discussed with:	ian@
Pointy hat to:	ray@
2013-03-19 14:15:41 +00:00
Konstantin Belousov
ee75e7de7b Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00
Konstantin Belousov
36a6d2ebc4 Add a convenience macro bread_gb() to wrap a call to
breadn_flags(). Comparing with bread(), it adds an argument to pass
the flags to getblk().

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	2 weeks
2013-03-19 13:21:39 +00:00
Aleksandr Rybalko
ee52bd5713 Cast "start" to u_long. Temporary fix to unbreak tinderbox.
We need here max possible storage or dynamic, depend on size of address cell.
2013-03-19 13:13:26 +00:00
Konstantin Belousov
b4862fafd5 Assert that a ccb passed to cam_periph_mapmem() for XPT_SCSI_IO and
XPT_ATA_IO holds virtual buffer address.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 13:10:14 +00:00
Ed Maste
96ecfd9813 Fix remainder calculation when biosize is not a power of 2
In common configurations biosize is a power of two, but is not required to
be so.  Thanks to markj@ for spotting an additional case beyond my original
patch.

Reviewed by: rmacklem@
2013-03-19 13:06:11 +00:00
Hans Petter Selasky
565d8205f3 Add new USB ID.
PR:		usb/177105
MFC after:	1 week
2013-03-19 12:52:13 +00:00
Martin Matuska
07091d8f14 MFV r247580:
Merge synctask code restructuring from vendor.

Modify forward and backward compatibility to support new change.

Illumos ZFS issues:
  3464 zfs synctask code needs restructuring

Sponsored by:	Hybrid Logic Ltd.
2013-03-19 12:51:18 +00:00
Martin Matuska
520268fb97 MFC @248493 2013-03-19 11:09:15 +00:00
Martin Matuska
87a5cb4650 Plug memory leak in dsl_check_snap_cb()
This was unnoticed because the function is very rarely used.

MFC after:	3 days
2013-03-19 07:47:51 +00:00
Andrey V. Elsukov
93bb4f9ed5 Separate the locking macros that are used in the packet flow path
from others. This helps easy switch to use pfil(4) lock.
2013-03-19 06:04:17 +00:00
Andrey V. Elsukov
5474386bd3 Fix style and comments. 2013-03-19 05:51:47 +00:00
Justin Hibbits
ec86c487a6 Fix the powerpc64 build. MACHINE_CPUARCH is common for powerpc/powerpc64,
not MACHINE_ARCH.
2013-03-19 00:39:02 +00:00
Aleksandr Rybalko
36581e4785 Don't hesitate to ask parent to setup IRQ finally.
Sponsored by:	The FreeBSD Foundation
2013-03-18 23:51:39 +00:00
Aleksandr Rybalko
bf9d6206b0 Allow simplebus to attach to another simplebus.
Sponsored by:	The FreeBSD Foundation
2013-03-18 23:41:19 +00:00
Aleksandr Rybalko
089dfb09f1 Hide "no default resources for" warning under bootverbose. It's ok to use
optional resources.

Sponsored by:	The FreeBSD Foundation
2013-03-18 23:38:15 +00:00
Aleksandr Rybalko
2737a5a925 Allow simplebus to attach in less strict way, when "simple-bus" listed on not
first position of compatible property, so simplebus driver can be generic
driver for any bus listed as compatible with "simple-bus".

Sponsored by:	The FreeBSD Foundation
2013-03-18 23:35:01 +00:00
Jung-uk Kim
78260bb30a List TrackPoint device before generic model. 2013-03-18 23:31:22 +00:00
Jung-uk Kim
569d8f7e27 Add preliminary support for IBM/Lenovo TrackPoint.
PR:		kern/147237 (based on the initial patch for 8.x)
Tested by:	glebius (device detection and suspend/resume)
MFC after:	1 month
2013-03-18 23:22:47 +00:00
Martin Matuska
a602517b63 Add missing zvol_create_mirrors() on zfs_ioc_create() 2013-03-18 20:22:40 +00:00
Ryan Stone
3aff0961dd Correct the definition for Exar XR17V258IV: we must use a config_function
to specify the offset into the PCI memory spare at which each serial port
will find its registers.  This was already done for other Exar PCI serial
devices; it was accidentally omitted for this specific device.

Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2013-03-18 19:22:51 +00:00
John Baldwin
1968f37bc9 Tweak some comments. 2013-03-18 18:04:09 +00:00
John Baldwin
3cf3b9f097 Partially revert r195702. Deferring stops is now implemented via a set of
calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep
calls.
- Remove the stop_allowed parameters from cursig() and issignal().
  issignal() checks TDF_SBDRY directly.
- Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
2013-03-18 17:23:58 +00:00
Aleksandr Rybalko
4117c1db9e o Switch to use physical addresses in rman for FDT.
o Remove vtophys used to translate virtual address to physical in case rman carry virtual.

Sponsored by:	The FreeBSD Foundation
2013-03-18 15:18:55 +00:00
Martin Matuska
876a84e867 MFC @248461 2013-03-18 09:39:51 +00:00
Martin Matuska
6f4accc2de Move common zfs ioctl compatibility functions (userland) into libzfs_compat.c
Introduce additional constants for zfs ioctl versions
2013-03-18 09:32:29 +00:00
Hans Petter Selasky
bd247e9ddd Add new USB ID.
PR:		usb/177013
MFC after:	1 week
2013-03-18 07:02:58 +00:00
Justin Hibbits
80a5635c8b Add FBT for PowerPC DTrace. Also, clean up the DTrace assembly code,
much of which is not necessary for PowerPC.

The FBT module can likely be factored into 3 separate files: common,
intel, and powerpc, rather than duplicating most of the code between
the x86 and PowerPC flavors.

All DTrace modules for PowerPC will be MFC'd together once Fasttrap is
completed.
2013-03-18 05:30:18 +00:00
Pyun YongHyeon
e8bedbd24a r119712 introduced SIS_TYPE_83816 but it was not actually set in
driver such that checking against the type was always false.
To detect NS DP83816, driver should have checked silicon revision
register for NS controllers. While here, remove SIS_TYPE_83816 to
not make the similar mistake again.

Reported by:	Brad Smith ( brad@openbsd )
2013-03-18 04:46:17 +00:00
Adrian Chadd
1ab002f461 Print out the current fifo queue depth correctly - not just the max
queue depth.

Silly hat to me.
2013-03-18 02:29:57 +00:00
Adrian Chadd
eefc93a947 Dump out information about the RX descriptor free list and FIFO information. 2013-03-18 01:12:36 +00:00
Adrian Chadd
d50e882ab9 Log some more information when the RX buffer allocation failed. 2013-03-18 01:11:52 +00:00
Attilio Rao
774d251d99 Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allowing the acquisition of read locking for lookup operations of the
  resident/cached pages collections as the per-vm_page_t splay iterators
  are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers locking to ensure atomicity of
its operations.  In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone.  This can be done safely because the
algorithm needs at maximum one new node per insert which means the
maximum number of the desired nodes is the number of available physical
frames themselves.  However, not all the times a new bisection node is
really needed.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan.  It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because of the possible loss of
performance by having much of the sizes in play configurable.
However, efforts to make this code more general and then reusable in
further different consumers might be really done.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes broken into mealpieces can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by:	EMC / Isilon storage division
In collaboration with:	alc, jeff
Tested by:	flo, pho, jhb, davide
Tested by:	ian (arm)
Tested by:	andreast (powerpc)
2013-03-18 00:25:02 +00:00
Martin Matuska
af2e40ccd1 Merge libzfs_core part of r239388
Illumos ZFS issues:
  3085 zfs diff panics, then panics in a loop on booting

References:
  https://www.illumos.org/issues/3085
2013-03-17 18:49:11 +00:00
Martin Matuska
70b0720877 Fix accidentially changed ioc variable for old v15 compatibility 2013-03-17 17:28:06 +00:00