Commit Graph

92424 Commits

Author SHA1 Message Date
Jilles Tjoelker
c2e3c52e0d Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC.
This change allows creating file descriptors with close-on-exec set in some
situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and
socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file
descriptors (SCM_RIGHTS) atomically close-on-exec.

The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD.
MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

The SOCK_* flags are not passed to MAC because this may cause incorrect
failures and can be done later via fcntl() anyway. On the other hand, audit
is expected to cope with the new flags.

For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags
argument.

Reviewed by:	kib
2013-03-19 20:58:17 +00:00
Adrian Chadd
f0db652cf6 Break out the RX completion path into "FIFO check / refill" and
"complete RX frames."

The 128 entry RX FIFO is really easy to fill up and miss refilling
when it's done in the ath taskq - as that gets blocked up doing
RX completion, TX completion and other random things.

So the 128 entry RX FIFO now gets emptied and refilled in the ath_intr()
task (and it grabs / releases locks, so now ath_intr() can't just be
a FAST handler yet!) but the locks aren't held for very long. The
completion part is done in the ath taskqueue context.

Details:

* Create a new completed frame list - sc->sc_rx_rxlist;
* Split the EDMA RX process queue into two halves - one that
  processes the RX FIFO and refills it with new frames; another
  that completes the completed frame list;
* When tearing down the driver, flush whatever is in the deferred
  queue as well as what's in the FIFO;
* Create two new RX methods - one that processes all RX queues,
  one that processes the given RX queue.  When MSI is implemented,
  we get told which RX queue the interrupt came in on so we can
  specifically schedule that.  (And I can do that with the non-MSI
  path too; I'll figure that out later.)
* Convert the legacy code over to use these new RX methods;
* Replace all the instances of the RX taskqueue enqueue with a call
  to a relevant RX method to enqueue one or all RX queues.

Tested:

* AR9380, STA
* AR9580, STA
* AR5413, STA
2013-03-19 19:32:28 +00:00
Adrian Chadd
74ea88c379 Add more TODO items. 2013-03-19 17:55:36 +00:00
Adrian Chadd
378a752f59 Now that the tx map field is correctly populated for both edma and
legacy chips, just use that.
2013-03-19 17:54:37 +00:00
Konstantin Belousov
129c6621f7 ahci(4) and siis(4) are ready to process the unmapped i/o requests
Sponsored by:	The FreeBSD Foundation
Tested by:	pho
Submitted by:	bf (siis patch)
2013-03-19 15:09:32 +00:00
Konstantin Belousov
59a01b70af UFS support of the unmapped i/o for the user data buffers.
Sponsored by:	The FreeBSD Foundation
Tested by:	pho, scottl, jhb, bf
2013-03-19 15:08:15 +00:00
Konstantin Belousov
2649fcc1d8 Commit the removal of a whitespace to record the proper commit message
for the r248519:

For the cam-attached HBAs, allow the driver to specify that it accepts
the unmapped bio by the PIM_UNMAPPED flag.  The CAM passes the
CAM_DATA_BIO data transfer type request for the unmapped bio, and the
driver could use the bus_dmamap_load_ccb() as a helper to
transparently handle the ccb.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	scottl
Tested by:	pho, scottl
2013-03-19 15:05:21 +00:00
Konstantin Belousov
abc1e60e0e Support unmapped i/o for the md(4).
The vnode-backed md(4) has to map the unmapped bio because VOP_READ()
and VOP_WRITE() interfaces do not allow to pass unmapped requests to
the filesystem. Vnode-backed md(4) uses pbufs instead of relying on
the bio_transient_map, to avoid usual md deadlock.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho, scottl
2013-03-19 15:01:50 +00:00
Konstantin Belousov
59ec9023ca Support unmapped i/o for the md(4).
The vnode-backed md(4) has to map the unmapped bio because VOP_READ()
and VOP_WRITE() interfaces do not allow to pass unmapped requests to
the filesystem. Vnode-backed md(4) uses pbufs instead of relying on
the bio_transient_map, to avoid usual md deadlock.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho, scottl
2013-03-19 14:53:23 +00:00
Konstantin Belousov
db7bfaa8ce The geom_part provider supports unmapped bio iff the underlying
provider does so, since geom_part never inspects the bio_data.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:50:24 +00:00
Konstantin Belousov
f8c19ba466 A flag for the geom disk driver to indicate that it accepts the
unmapped i/o requests.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:49:15 +00:00
Konstantin Belousov
e81ff91e62 Do not remap usermode pages into KVA for physio.
Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:43:57 +00:00
Konstantin Belousov
2cc718a11c Do not map the swap i/o pbufs if the geom provider for the swap
partition accepts unmapped requests.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:39:27 +00:00
Konstantin Belousov
6ce697dc73 Pass unmapped buffers for page in requests if the filesystem indicated support
for the unmapped i/o.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:36:28 +00:00
Konstantin Belousov
f8c09530bd A flag for the filesystem to indicate to the upper levels that it accepts
unmapped buffers for the VOP_STRATEGY().

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:33:01 +00:00
Konstantin Belousov
7d5365c70b Add a helper function vfs_bio_bzero_buf() to zero the portion of the
buffer, transparently handling mapped or unmapped buffers.  Its intent
is to replace the use of bzero(bp->b_data) in cases where the buffer
might be unmapped, to avoid unneeded upgrades.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 14:27:14 +00:00
Aleksandr Rybalko
5ac9d9890f Return "start" and "end" to u_long world. Because rman handle addresses as
u_long too.

Discussed with:	ian@
Pointy hat to:	ray@
2013-03-19 14:15:41 +00:00
Konstantin Belousov
ee75e7de7b Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00
Konstantin Belousov
36a6d2ebc4 Add a convenience macro bread_gb() to wrap a call to
breadn_flags(). Comparing with bread(), it adds an argument to pass
the flags to getblk().

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	2 weeks
2013-03-19 13:21:39 +00:00
Aleksandr Rybalko
ee52bd5713 Cast "start" to u_long. Temporary fix to unbreak tinderbox.
We need here max possible storage or dynamic, depend on size of address cell.
2013-03-19 13:13:26 +00:00
Konstantin Belousov
b4862fafd5 Assert that a ccb passed to cam_periph_mapmem() for XPT_SCSI_IO and
XPT_ATA_IO holds virtual buffer address.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-19 13:10:14 +00:00
Ed Maste
96ecfd9813 Fix remainder calculation when biosize is not a power of 2
In common configurations biosize is a power of two, but is not required to
be so.  Thanks to markj@ for spotting an additional case beyond my original
patch.

Reviewed by: rmacklem@
2013-03-19 13:06:11 +00:00
Hans Petter Selasky
565d8205f3 Add new USB ID.
PR:		usb/177105
MFC after:	1 week
2013-03-19 12:52:13 +00:00
Martin Matuska
87a5cb4650 Plug memory leak in dsl_check_snap_cb()
This was unnoticed because the function is very rarely used.

MFC after:	3 days
2013-03-19 07:47:51 +00:00
Andrey V. Elsukov
93bb4f9ed5 Separate the locking macros that are used in the packet flow path
from others. This helps easy switch to use pfil(4) lock.
2013-03-19 06:04:17 +00:00
Andrey V. Elsukov
5474386bd3 Fix style and comments. 2013-03-19 05:51:47 +00:00
Justin Hibbits
ec86c487a6 Fix the powerpc64 build. MACHINE_CPUARCH is common for powerpc/powerpc64,
not MACHINE_ARCH.
2013-03-19 00:39:02 +00:00
Aleksandr Rybalko
36581e4785 Don't hesitate to ask parent to setup IRQ finally.
Sponsored by:	The FreeBSD Foundation
2013-03-18 23:51:39 +00:00
Aleksandr Rybalko
bf9d6206b0 Allow simplebus to attach to another simplebus.
Sponsored by:	The FreeBSD Foundation
2013-03-18 23:41:19 +00:00
Aleksandr Rybalko
089dfb09f1 Hide "no default resources for" warning under bootverbose. It's ok to use
optional resources.

Sponsored by:	The FreeBSD Foundation
2013-03-18 23:38:15 +00:00
Aleksandr Rybalko
2737a5a925 Allow simplebus to attach in less strict way, when "simple-bus" listed on not
first position of compatible property, so simplebus driver can be generic
driver for any bus listed as compatible with "simple-bus".

Sponsored by:	The FreeBSD Foundation
2013-03-18 23:35:01 +00:00
Jung-uk Kim
78260bb30a List TrackPoint device before generic model. 2013-03-18 23:31:22 +00:00
Jung-uk Kim
569d8f7e27 Add preliminary support for IBM/Lenovo TrackPoint.
PR:		kern/147237 (based on the initial patch for 8.x)
Tested by:	glebius (device detection and suspend/resume)
MFC after:	1 month
2013-03-18 23:22:47 +00:00
Ryan Stone
3aff0961dd Correct the definition for Exar XR17V258IV: we must use a config_function
to specify the offset into the PCI memory spare at which each serial port
will find its registers.  This was already done for other Exar PCI serial
devices; it was accidentally omitted for this specific device.

Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2013-03-18 19:22:51 +00:00
John Baldwin
1968f37bc9 Tweak some comments. 2013-03-18 18:04:09 +00:00
John Baldwin
3cf3b9f097 Partially revert r195702. Deferring stops is now implemented via a set of
calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep
calls.
- Remove the stop_allowed parameters from cursig() and issignal().
  issignal() checks TDF_SBDRY directly.
- Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
2013-03-18 17:23:58 +00:00
Aleksandr Rybalko
4117c1db9e o Switch to use physical addresses in rman for FDT.
o Remove vtophys used to translate virtual address to physical in case rman carry virtual.

Sponsored by:	The FreeBSD Foundation
2013-03-18 15:18:55 +00:00
Hans Petter Selasky
bd247e9ddd Add new USB ID.
PR:		usb/177013
MFC after:	1 week
2013-03-18 07:02:58 +00:00
Justin Hibbits
80a5635c8b Add FBT for PowerPC DTrace. Also, clean up the DTrace assembly code,
much of which is not necessary for PowerPC.

The FBT module can likely be factored into 3 separate files: common,
intel, and powerpc, rather than duplicating most of the code between
the x86 and PowerPC flavors.

All DTrace modules for PowerPC will be MFC'd together once Fasttrap is
completed.
2013-03-18 05:30:18 +00:00
Pyun YongHyeon
e8bedbd24a r119712 introduced SIS_TYPE_83816 but it was not actually set in
driver such that checking against the type was always false.
To detect NS DP83816, driver should have checked silicon revision
register for NS controllers. While here, remove SIS_TYPE_83816 to
not make the similar mistake again.

Reported by:	Brad Smith ( brad@openbsd )
2013-03-18 04:46:17 +00:00
Adrian Chadd
1ab002f461 Print out the current fifo queue depth correctly - not just the max
queue depth.

Silly hat to me.
2013-03-18 02:29:57 +00:00
Adrian Chadd
eefc93a947 Dump out information about the RX descriptor free list and FIFO information. 2013-03-18 01:12:36 +00:00
Adrian Chadd
d50e882ab9 Log some more information when the RX buffer allocation failed. 2013-03-18 01:11:52 +00:00
Attilio Rao
774d251d99 Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allowing the acquisition of read locking for lookup operations of the
  resident/cached pages collections as the per-vm_page_t splay iterators
  are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers locking to ensure atomicity of
its operations.  In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone.  This can be done safely because the
algorithm needs at maximum one new node per insert which means the
maximum number of the desired nodes is the number of available physical
frames themselves.  However, not all the times a new bisection node is
really needed.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan.  It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because of the possible loss of
performance by having much of the sizes in play configurable.
However, efforts to make this code more general and then reusable in
further different consumers might be really done.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes broken into mealpieces can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by:	EMC / Isilon storage division
In collaboration with:	alc, jeff
Tested by:	flo, pho, jhb, davide
Tested by:	ian (arm)
Tested by:	andreast (powerpc)
2013-03-18 00:25:02 +00:00
Ian Lepore
b479b38c0a Eliminate an intermediate buffer and some memcpy() operations, and do
DMA directly to/from the buffers passed in from higher layer drivers.

Reviewed by:	gonzo
2013-03-17 16:31:09 +00:00
Martin Matuska
6cf922c88b Fix typo in sysctl description
Reported by:	Jeremy Chadwick
MFC after:	3 days
2013-03-17 15:53:27 +00:00
Konstantin Belousov
0d3bb4afa8 Remove negative name cache entry pointing to the target name, which
could be instantiated while tdvp was unlocked.

Reported by:	Rick Miller <vmiller at hostileadmin com>
Tested by:	pho
MFC after:	1 week
2013-03-17 15:11:37 +00:00
Gleb Smirnoff
4f67e14304 In m_align() add assertions that mbuf is virgin, similar to assertions
in M_ALIGN(), MH_ALIGN, MEXT_ALIGN() macros.
2013-03-17 07:41:14 +00:00
Gleb Smirnoff
23909ac90d Add MEXT_ALIGN() macro, similar to M_ALIGN() and MH_ALIGN(), but for
mbufs with external buffer.
2013-03-17 07:39:45 +00:00
Gleb Smirnoff
7525c48111 In m_megapullup() instead of reserving some space at the end of packet,
m_align() it, reserving space to prepend data.

Reviewed by:	mav
2013-03-17 07:37:10 +00:00