5413 Commits

Author SHA1 Message Date
jhb
be47bc68fb Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subystem now exports library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order.  Socket requests will not block indefinitely
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels.  The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon.  This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system.  To protect against this, AIO
requests are only permitted for known "safe" files by default.  AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_usafe sysctl to a non-zero value.  The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default.  aio_mlock() is also enabled by default.

Reviewed by:	cem, jilles
Discussed with:	kib (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5289
2016-03-01 18:12:14 +00:00
trasz
08e04d4818 ioctl(8) -> ioctl(2)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-02-29 17:37:35 +00:00
trasz
b042debeb4 kbdmux(8) -> kbdmux(4)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-02-29 17:36:00 +00:00
trasz
c2a326a8e4 The tcpdump is section 1, not 8.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-02-29 17:16:06 +00:00
jhibbits
8bf1194fe5 Add support for the Freescale dTSEC DPAA-based ethernet controller.
Freescale's QorIQ line includes a new ethernet controller, based on their
Datapath Acceleration Architecture (DPAA).  This uses a combination of a Frame
manager, Buffer manager, and Queue manager to improve performance across all
interfaces by being able to pass data directly between hardware acceleration
interfaces.

As part of this import, Freescale's Netcomm Software (ncsw) driver is imported.
This was an attempt by Freescale to create an OS-agnostic sub-driver for
managing the hardware, using shims to interface to the OS-specific APIs.  This
work was abandoned, and Freescale's primary work is in the Linux driver (dual
BSD/GPL license).  Hence, this was imported directly to sys/contrib, rather than
going through the vendor area.  Going forward, FreeBSD-specific changes may be
made to the ncsw code, diverging from the upstream in potentially incompatible
ways.  An alternative could be to import the Linux driver itself, using the
linuxKPI layer, as that would maintain parity with the vendor-maintained driver.
However, the Linux driver has not been evaluated for reliability yet, and may
have issues with the import, whereas the ncsw-based driver in this commit was
completed by Semihalf 4 years ago, and is very stable.

Other SoC modules based on DPAA, which could be added in the future:
* Security and Encryption engine (SEC4.x, SEC5.x)
* RAID engine

Additional work to be done:
* Implement polling mode
* Test vlan support
* Add support for the Pattern Matching Engine, which can do regular expression
  matching on packets.

This driver has been tested on the P5020 QorIQ SoC.  Others listed in the
dtsec(4) manual page are expected to work as the same DPAA engine is included in
all.

Obtained from:	Semihalf
Relnotes:	Yes
Sponsored by:	Alex Perez/Inertial Computing
2016-02-29 03:38:00 +00:00
oshogbo
023f14d65b Convert casperd(8) daemon to the libcasper.
After calling the cap_init(3) function Casper will fork from it's original
process, using pdfork(2). Forking from a process has a lot of advantages:
1. We have the same cwd as the original process.
2. The same uid, gid and groups.
3. The same MAC labels.
4. The same descriptor table.
5. The same routing table.
6. The same umask.
7. The same cpuset(1).
From now services are also in form of libraries.
We also removed libcapsicum at all and converts existing program using Casper
to new architecture.

Discussed with:		pjd, jonathan, ed, drysdale@google.com, emaste
Partially reviewed by:	drysdale@google.com, bdrewery
Approved by:		pjd (mentor)
Differential Revision:	https://reviews.freebsd.org/D4277
2016-02-25 18:23:40 +00:00
sobomax
8867ff935e Kill few remaininng instances of GEOM_UNCOMPRESS. 2016-02-24 05:16:24 +00:00
sobomax
85ce861e46 Obsolete mkulzma(8) and geom_uncompress(4), their functionality
is now provided by mkuzip(8) and geom_uzip(4) respectively.

MFC after:	1 month
2016-02-24 00:39:36 +00:00
sobomax
8abe971b5e Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8)
and geom_uncompress(4):

1. mkuzip(8):

 - Proper support for eliminating all-zero blocks when compressing an
   image. This feature is already supported by the geom_uzip(4) module
   and CLOOP format in general, so it's just a matter of making mkuzip(8)
   match. It should be noted, however that this feature while it sounds
   great, results in very slight improvement in the overall compression
   ratio, since compressing default 16k all-zero block produces only 39
   bytes compressed output block, which is 99.8% compression ratio. With
   typical average compression ratio of amd64 binaries and data being
   around 60-70% the difference between 99.8% and 100.0% is not that
   great further diluted by the ratio of number of zero blocks in the
   uncompressed image to the overall number of blocks being less than
   0.5 (typically). However, this may be important from performance
   standpoint, so that kernel are not spinning its wheels decompressing
   those empty blocks every time this zero region is read. It could also
   be important when you create huge image mostly filled with zero
   blocks for testing purposes.

 - New feature allowing to de-duplicate output image. It turns out that
   if you twist CLOOP format a bit you can do that as well. And unlike
   zero-blocks elimination, this gives a noticeable improvement in the
   overall compression ratio, reducing output image by something like
   3-4% on my test UFS2 3GB image consisting of full FreeBSD base system
   plus some of the packages (openjdk, apache etc), about 2.3GB worth of
   file data (800+MB compressed). The only caveat is that images created
   with this feature "on" would not work on older versions of FeeBSDxi
   kernel, hence it's turned off by default.

 - provide options to control both features and document them in manual
   page.

 - merge in all relevant LZMA compression support from the mkulzma(8),
   add new option to select between both.

 - switch license from ad-hoc beerware into standard 2-clause BSD.

2. geom_uzip(4):

 - implement support for de-duplicated images;

 - optimize some code paths to handle "all-zero" blocks without reading
   any compressed data;

 - beef up manual page to explain that geom_uzip(4) is not limited only
   to md(4) images. The compressed data can be written to the block
   device and accessed directly via magic of GEOM(4) and devfs(4),
   including to mount root fs from a compressed drive.

 - convert debug log code from being compiled in conditionally into
   being present all the time and provide two sysctls to turn it on or
   off. Due to intended use of the module, it can be used in
   environments where there may not be a luxury to put new kernel with
   debug code enabled. Having those options handy allows debug issues
   without as much problem by just having access to serial console or
   network shell access to a box/appliance. The resulting additional
   CPU cycles are just few int comparisons and branches, and those are
   minuscule when compared to data decompression which is the main
   feature of the module.

 - hopefully improve robustness and resiliency of the geom_uzip(4) by
   performing some of the data validation / range checking on the TOC
   entries and rejecting to attach to an image if those checks fail.

 - merge in all relevant LZMA decompression support from the
   geom_uncompress(4), enable automatically when appropriate format is
   indicated in the header.

 - move compilation work into its own worker thread so that it does not
   clog g_up. This allows multiple instances work in parallel utilizing
   smp cores.

 - document new knobs in the manual page.

Reviewed by:		adrian
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
sobomax
d0cfd1a9da Fix section number of .Xr geom_uzip in r295782.
MFC after:	1 months
		(together with r295782)
2016-02-19 01:06:45 +00:00
sobomax
1e0cd6018a Clear up confision as to who the original historical authors of code
and manual page were.

For whatever reason it listed myself as a primary author, which is
just not true.

Also, majority of the manpage is copied verbatim from the geom_uzip(4),
contributed by ceri, with only minor adjustments from loos, so put ceri
back into the copyright secrion where he belongs and reflect that in the
AUTHORS section.

For what it's worth, I think this one should be deleted and LZMA
support just folded back into geom_uzip(4) / mkuzip(4) whete it really
belongs.

MFC after:	1 month
2016-02-19 01:00:48 +00:00
bjk
d555ff6357 Update .Dd for r295565 2016-02-12 17:03:24 +00:00
ian
8407e108db Clarify the difference between 7- and 8-bit i2c addresses, used in FDT
versus hints-based configuration, respectively.

Reported by: Jukka Ukkonen <jau789@gmail.com>
2016-02-12 16:59:42 +00:00
dteske
0bd7c2b97b Add missing comma 2016-02-07 13:33:18 +00:00
gnn
8f5a4a0c37 Summary: Update the date 2016-02-04 21:46:37 +00:00
gnn
143293cd0e Summary: Remove discussion of fastforwarding. 2016-02-04 21:39:58 +00:00
bdrewery
078a106cb8 Note the double fork behavior with filemon.
X-MFC-With:	r295029
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-01-29 01:09:04 +00:00
bdrewery
ab39a5f58e Document the purpose and non-purpose of filemon(4).
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-01-29 01:00:12 +00:00
jimharris
bab4c585f3 nvd: add hw.nvd.delete_max tunable
The NVMe specification does not define a maximum or optimal delete
size, so technically max delete size is min(full size of namespace,
2^32 - 1 LBAs).  A single delete operation for a multi-TB NVMe
namespace though may take much longer to complete than the nvme(4)
I/O timeout period.  So choose a sensible default here that is still
suitably large to minimize the number of overall delete operations.

This also fixes possible uint32_t overflow on initial TRIM operation
for zpool create operations for NVMe namespaces with >4G LBAs.

MFC after:	3 days
Sponsored by:	Intel
2016-01-28 23:15:14 +00:00
araujo
a1319b3488 Add an IOCTL rr_limit to let users fine tuning the number of packets to be
sent using roundrobin protocol and set a better granularity and distribution
among the interfaces. Tuning the number of packages sent by interface can
increase throughput and reduce unordered packets as well as reduce SACK.

Example of usage:
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
	192.168.1.1 netmask 255.255.255.0
# ifconfig lagg0 rr_limit 500

Reviewed by:	thompsa, glebius, adrian (old patch)
Approved by:	bapt (mentor)
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D540
2016-01-23 04:18:44 +00:00
glebius
40ba1ae95a Provide new socket option TCP_CCALGOOPT, which stands for TCP congestion
control algorithm options.  The argument is variable length and is opaque
to TCP, forwarded directly to the algorithm's ctl_output method.

Provide new includes directory netinet/cc, where algorithm specific
headers can be installed.

The new API doesn't yet have any in tree consumers.

The original code written by lstewart.
Reviewed by:	rrs, emax
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D711
2016-01-22 02:07:48 +00:00
brooks
cbe7419b70 Add a simple manpage for the cfi(4) and associated cfid(4) drivers.
MFC after:	1 week
Sponsored by:	DARPA, AFRL
2016-01-20 18:47:33 +00:00
ngie
a097ef8c36 Bump .Dd for the content changes 2016-01-16 05:35:42 +00:00
imp
782436485f trim-time? What was I thinking. run-time.
Noticed by: Allan Jude
2016-01-16 01:30:55 +00:00
imp
0ae770c6ad Add some clarifications. 2016-01-16 01:13:27 +00:00
imp
96f990267f Improve the sentence flow as well which has the happy benefit of
making read-only modify a noun, a case where it unquestionably should
be hyphenated.
2016-01-16 00:45:48 +00:00
imp
7dc0da3c8b Although not directly modifying a noun, read-only should be hyphenated
in this context (or in any, really).
2016-01-16 00:43:10 +00:00
imp
9def4cf348 Read-only is hyphenated when it modifies a noun. 2016-01-16 00:37:27 +00:00
arybchik
8cbcec3cf7 sfxge: support FATSOv2
Reviewed by:    gnn
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
Differential Revision: https://reviews.freebsd.org/D4934
2016-01-15 06:25:26 +00:00
cem
532a8c64da ioat(4): Add support for 'fence' bit with DMA_FENCE flag
Some classes of IOAT hardware prefetch reads.  DMA operations that
depend on the result of prior DMA operations must use the DMA_FENCE flag
to prevent stale reads.

(E.g., I've hit this personally on Broadwell-EP.  The Broadwell-DE has a
different IOAT unit that is documented to not pipeline DMA operations.)

Sponsored by:	EMC / Isilon Storage Division
2016-01-15 01:34:43 +00:00
ngie
1368ec235b Fix spelling of IPMI
Sponsored by: EMC / Isilon Storage Division
2016-01-14 18:04:49 +00:00
bjk
0e305366eb Update .Dd, missed in r294011 2016-01-14 17:16:47 +00:00
imp
94d0fcb350 Document how to enter the debugger here. I'm sure there's some better
canonical place, and the nit-pickers are welcome to move this
information there with a cross reference.

Differential Review: https://reviews.freebsd.org/D4860
2016-01-14 16:23:07 +00:00
ian
33067117d5 Restore uart PPS signal capture polarity to its historical norm, and add an
option to invert the polarity in software. Also add an option to capture
very narrow pulses by using the hardware's MSR delta-bit capability of
latching line state changes.

This effectively reverts the mistake I made in r286595 which was based on
empirical measurements made on hardware using TTL-level signaling, in which
the logic levels are inverted from RS-232. Thus, this re-syncs the polarity
with the requirements of RFC 2783, which is writen in terms of RS-232
signaling.

Narrow-pulse mode uses the ability of most ns8250 and similar chips to
provide a delta indication in the modem status register. The hardware is
able to notice and latch the change when the pulse width is shorter than
interrupt latency, which results in the signal no longer being asserted by
time the interrupt service code runs. When running in this mode we get
notified only that "a pulse happened" so the driver synthesizes both an
ASSERT and a CLEAR event (with the same timestamp for each). When the pulse
width is about equal to the interrupt latency the driver may intermittantly
see both edges of the pulse. To prevent generating spurious events, the
driver implements a half-second lockout period after generating an event
before it will generate another.

Differential Revision:	https://reviews.freebsd.org/D4477
2016-01-12 18:42:00 +00:00
jimharris
b5abcf3a88 Update ismt(4) man page to reflect inclusion in upcoming 10.3 release.
MFC after:	3 days
Sponsored by:	Intel
2016-01-11 17:57:49 +00:00
cem
66aa33e3b5 ioat(4): Add ioat_acquire_reserve() KPI
ioat_acquire_reserve() is an extended version of ioat_acquire().  It
allows users to reserve space in the channel for some number of
descriptors.  If this succeeds, it guarantees that at least submission
of N valid descriptors will succeed.

Sponsored by:	EMC / Isilon Storage Division
2016-01-07 23:02:15 +00:00
jimharris
94f3dfd067 nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.

This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue.  This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues.  Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.

While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.

Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel
2016-01-07 20:32:04 +00:00
cem
a377302780 ioat(4): Add ioat_get_max_io_size() KPI
Consumers need to know the permitted IO size to send maximally sized
chunks to the hardware.

Sponsored by:	EMC / Isilon Storage Division
2016-01-05 20:42:19 +00:00
des
11d782769f 17 years and change after I wrote warp_saver, here's a simple plasma effect
(currently only three circular patterns) which requires quite a bit of
fixed-point arithmetic, including sqrt() and cos().  Happy New Year!
2016-01-01 04:04:40 +00:00
adrian
91d6917b30 [rtwn] Add initial manpages for the rtwn driver. 2015-12-31 22:34:16 +00:00
adrian
86c6170821 [mdio] migrate mdiobus out of etherswitch and into a top-level device of its own.
The mdio driver interface is generally useful for devices that require
MDIO without the full MII bus interface. This lifts the driver/interface
out of etherswitch(4), and adds a mdio(4) man page.

Submitted by:	Landon Fuller <landon@landonf.org>
Differential Revision:	https://reviews.freebsd.org/D4606
2015-12-26 02:31:39 +00:00
cem
a12e9d2b9f ioat(4): Add an API to get HW revision
Different revisions support different operations.  Refer to Intel
External Design Specifications to figure out what your hardware
supports.

Sponsored by:	EMC / Isilon Storage Division
2015-12-17 23:21:37 +00:00
brueffer
d195f06a04 Fix example code rendering, \n needs escaping to show up.
PR:		203536
Submitted by:	Fabian Keil
2015-12-15 13:29:05 +00:00
kevlo
ce202b136b Fix a typo (opencrypto -> crypto) and remove useless comment. 2015-12-15 06:01:02 +00:00
cem
ae2a2410df ioat(4): Add support for interrupt coalescing
In I/OAT, this is done through the INTRDELAY register.  On supported
platforms, this register can coalesce interrupts in a set period to
avoid excessive interrupt load for small descriptor workflows.  The
period is configurable anywhere from 1 microsecond to 16.38
milliseconds, in microsecond granularity.

Sponsored by:	EMC / Isilon Storage Division
2015-12-14 22:01:52 +00:00
brueffer
9009bdb748 Clean up issues reported by mandoc -Tlint 2015-12-14 13:01:36 +00:00
brueffer
cc6522ee84 Non-exhaustive mdoc/spelling/style cleanup.
PR:		202716, 204301 (both spelling)
Submitted by:	Richard Farr, madpilot
2015-12-14 12:37:06 +00:00
kevlo
7795028e9f Add the cryptodev device. 2015-12-14 07:08:17 +00:00
rpokala
7fd0d14bf5 [PR 195033] Document mps.enable_ssu
mps(4) sends StartStopUnit to SATA direct-access devices during shutdown.
Document the tunables which control that behavior.

PR:		195033
Reviewed by:	scottl
Approved by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4456
2015-12-11 21:50:59 +00:00
mav
8f18910d2b Update list of card names. 2015-12-10 01:41:05 +00:00