Commit Graph

5414 Commits

Author SHA1 Message Date
Bryan Drewery
044fd54366 FILEMON_SET_FD: Disallow changing the fd.
MFC after:	1 week
Suggested by:	mjg
Sponsored by:	EMC / Isilon Storage Division
2016-03-09 19:50:35 +00:00
Bryan Drewery
4039c53163 Require kldunload -f to unload.
Code may still be executing from the wrappers at unload time and thus is
not generally safe to unload.  Converting the wrappers to use
EVENTHANDLER(9) will allow this to safely drain on active threads in
hooks.  More work on EVENTHANDLER(9) is needed first.

MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2016-03-07 21:39:29 +00:00
John Baldwin
f3215338ef Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subystem now exports library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order.  Socket requests will not block indefinitely
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels.  The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon.  This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system.  To protect against this, AIO
requests are only permitted for known "safe" files by default.  AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_usafe sysctl to a non-zero value.  The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default.  aio_mlock() is also enabled by default.

Reviewed by:	cem, jilles
Discussed with:	kib (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5289
2016-03-01 18:12:14 +00:00
Edward Tomasz Napierala
2017d1b01b ioctl(8) -> ioctl(2)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-02-29 17:37:35 +00:00
Edward Tomasz Napierala
345c0478ad kbdmux(8) -> kbdmux(4)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-02-29 17:36:00 +00:00
Edward Tomasz Napierala
e81210c1d0 The tcpdump is section 1, not 8.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-02-29 17:16:06 +00:00
Justin Hibbits
0aeed3e993 Add support for the Freescale dTSEC DPAA-based ethernet controller.
Freescale's QorIQ line includes a new ethernet controller, based on their
Datapath Acceleration Architecture (DPAA).  This uses a combination of a Frame
manager, Buffer manager, and Queue manager to improve performance across all
interfaces by being able to pass data directly between hardware acceleration
interfaces.

As part of this import, Freescale's Netcomm Software (ncsw) driver is imported.
This was an attempt by Freescale to create an OS-agnostic sub-driver for
managing the hardware, using shims to interface to the OS-specific APIs.  This
work was abandoned, and Freescale's primary work is in the Linux driver (dual
BSD/GPL license).  Hence, this was imported directly to sys/contrib, rather than
going through the vendor area.  Going forward, FreeBSD-specific changes may be
made to the ncsw code, diverging from the upstream in potentially incompatible
ways.  An alternative could be to import the Linux driver itself, using the
linuxKPI layer, as that would maintain parity with the vendor-maintained driver.
However, the Linux driver has not been evaluated for reliability yet, and may
have issues with the import, whereas the ncsw-based driver in this commit was
completed by Semihalf 4 years ago, and is very stable.

Other SoC modules based on DPAA, which could be added in the future:
* Security and Encryption engine (SEC4.x, SEC5.x)
* RAID engine

Additional work to be done:
* Implement polling mode
* Test vlan support
* Add support for the Pattern Matching Engine, which can do regular expression
  matching on packets.

This driver has been tested on the P5020 QorIQ SoC.  Others listed in the
dtsec(4) manual page are expected to work as the same DPAA engine is included in
all.

Obtained from:	Semihalf
Relnotes:	Yes
Sponsored by:	Alex Perez/Inertial Computing
2016-02-29 03:38:00 +00:00
Mariusz Zaborski
c501d73c7e Convert casperd(8) daemon to the libcasper.
After calling the cap_init(3) function Casper will fork from it's original
process, using pdfork(2). Forking from a process has a lot of advantages:
1. We have the same cwd as the original process.
2. The same uid, gid and groups.
3. The same MAC labels.
4. The same descriptor table.
5. The same routing table.
6. The same umask.
7. The same cpuset(1).
From now services are also in form of libraries.
We also removed libcapsicum at all and converts existing program using Casper
to new architecture.

Discussed with:		pjd, jonathan, ed, drysdale@google.com, emaste
Partially reviewed by:	drysdale@google.com, bdrewery
Approved by:		pjd (mentor)
Differential Revision:	https://reviews.freebsd.org/D4277
2016-02-25 18:23:40 +00:00
Maxim Sobolev
3c3cbe9cf4 Kill few remaininng instances of GEOM_UNCOMPRESS. 2016-02-24 05:16:24 +00:00
Maxim Sobolev
5497acc527 Obsolete mkulzma(8) and geom_uncompress(4), their functionality
is now provided by mkuzip(8) and geom_uzip(4) respectively.

MFC after:	1 month
2016-02-24 00:39:36 +00:00
Maxim Sobolev
8f8cb840b0 Improve mkuzip(8) and geom_uzip(4), merge in LZMA support from mkulzma(8)
and geom_uncompress(4):

1. mkuzip(8):

 - Proper support for eliminating all-zero blocks when compressing an
   image. This feature is already supported by the geom_uzip(4) module
   and CLOOP format in general, so it's just a matter of making mkuzip(8)
   match. It should be noted, however that this feature while it sounds
   great, results in very slight improvement in the overall compression
   ratio, since compressing default 16k all-zero block produces only 39
   bytes compressed output block, which is 99.8% compression ratio. With
   typical average compression ratio of amd64 binaries and data being
   around 60-70% the difference between 99.8% and 100.0% is not that
   great further diluted by the ratio of number of zero blocks in the
   uncompressed image to the overall number of blocks being less than
   0.5 (typically). However, this may be important from performance
   standpoint, so that kernel are not spinning its wheels decompressing
   those empty blocks every time this zero region is read. It could also
   be important when you create huge image mostly filled with zero
   blocks for testing purposes.

 - New feature allowing to de-duplicate output image. It turns out that
   if you twist CLOOP format a bit you can do that as well. And unlike
   zero-blocks elimination, this gives a noticeable improvement in the
   overall compression ratio, reducing output image by something like
   3-4% on my test UFS2 3GB image consisting of full FreeBSD base system
   plus some of the packages (openjdk, apache etc), about 2.3GB worth of
   file data (800+MB compressed). The only caveat is that images created
   with this feature "on" would not work on older versions of FeeBSDxi
   kernel, hence it's turned off by default.

 - provide options to control both features and document them in manual
   page.

 - merge in all relevant LZMA compression support from the mkulzma(8),
   add new option to select between both.

 - switch license from ad-hoc beerware into standard 2-clause BSD.

2. geom_uzip(4):

 - implement support for de-duplicated images;

 - optimize some code paths to handle "all-zero" blocks without reading
   any compressed data;

 - beef up manual page to explain that geom_uzip(4) is not limited only
   to md(4) images. The compressed data can be written to the block
   device and accessed directly via magic of GEOM(4) and devfs(4),
   including to mount root fs from a compressed drive.

 - convert debug log code from being compiled in conditionally into
   being present all the time and provide two sysctls to turn it on or
   off. Due to intended use of the module, it can be used in
   environments where there may not be a luxury to put new kernel with
   debug code enabled. Having those options handy allows debug issues
   without as much problem by just having access to serial console or
   network shell access to a box/appliance. The resulting additional
   CPU cycles are just few int comparisons and branches, and those are
   minuscule when compared to data decompression which is the main
   feature of the module.

 - hopefully improve robustness and resiliency of the geom_uzip(4) by
   performing some of the data validation / range checking on the TOC
   entries and rejecting to attach to an image if those checks fail.

 - merge in all relevant LZMA decompression support from the
   geom_uncompress(4), enable automatically when appropriate format is
   indicated in the header.

 - move compilation work into its own worker thread so that it does not
   clog g_up. This allows multiple instances work in parallel utilizing
   smp cores.

 - document new knobs in the manual page.

Reviewed by:		adrian
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D5333
2016-02-23 23:59:08 +00:00
Maxim Sobolev
b2db562452 Fix section number of .Xr geom_uzip in r295782.
MFC after:	1 months
		(together with r295782)
2016-02-19 01:06:45 +00:00
Maxim Sobolev
5c74f47c96 Clear up confision as to who the original historical authors of code
and manual page were.

For whatever reason it listed myself as a primary author, which is
just not true.

Also, majority of the manpage is copied verbatim from the geom_uzip(4),
contributed by ceri, with only minor adjustments from loos, so put ceri
back into the copyright secrion where he belongs and reflect that in the
AUTHORS section.

For what it's worth, I think this one should be deleted and LZMA
support just folded back into geom_uzip(4) / mkuzip(4) whete it really
belongs.

MFC after:	1 month
2016-02-19 01:00:48 +00:00
Benjamin Kaduk
3c56cded4e Update .Dd for r295565 2016-02-12 17:03:24 +00:00
Ian Lepore
0c0a157c15 Clarify the difference between 7- and 8-bit i2c addresses, used in FDT
versus hints-based configuration, respectively.

Reported by: Jukka Ukkonen <jau789@gmail.com>
2016-02-12 16:59:42 +00:00
Devin Teske
df81f97740 Add missing comma 2016-02-07 13:33:18 +00:00
George V. Neville-Neil
24e61dbfdd Summary: Update the date 2016-02-04 21:46:37 +00:00
George V. Neville-Neil
208ccc1284 Summary: Remove discussion of fastforwarding. 2016-02-04 21:39:58 +00:00
Bryan Drewery
0c370c1a96 Note the double fork behavior with filemon.
X-MFC-With:	r295029
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-01-29 01:09:04 +00:00
Bryan Drewery
22bcf8a634 Document the purpose and non-purpose of filemon(4).
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2016-01-29 01:00:12 +00:00
Jim Harris
aeae6079b4 nvd: add hw.nvd.delete_max tunable
The NVMe specification does not define a maximum or optimal delete
size, so technically max delete size is min(full size of namespace,
2^32 - 1 LBAs).  A single delete operation for a multi-TB NVMe
namespace though may take much longer to complete than the nvme(4)
I/O timeout period.  So choose a sensible default here that is still
suitably large to minimize the number of overall delete operations.

This also fixes possible uint32_t overflow on initial TRIM operation
for zpool create operations for NVMe namespaces with >4G LBAs.

MFC after:	3 days
Sponsored by:	Intel
2016-01-28 23:15:14 +00:00
Marcelo Araujo
d62edc5eb5 Add an IOCTL rr_limit to let users fine tuning the number of packets to be
sent using roundrobin protocol and set a better granularity and distribution
among the interfaces. Tuning the number of packages sent by interface can
increase throughput and reduce unordered packets as well as reduce SACK.

Example of usage:
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
	192.168.1.1 netmask 255.255.255.0
# ifconfig lagg0 rr_limit 500

Reviewed by:	thompsa, glebius, adrian (old patch)
Approved by:	bapt (mentor)
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D540
2016-01-23 04:18:44 +00:00
Gleb Smirnoff
d519cedbad Provide new socket option TCP_CCALGOOPT, which stands for TCP congestion
control algorithm options.  The argument is variable length and is opaque
to TCP, forwarded directly to the algorithm's ctl_output method.

Provide new includes directory netinet/cc, where algorithm specific
headers can be installed.

The new API doesn't yet have any in tree consumers.

The original code written by lstewart.
Reviewed by:	rrs, emax
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D711
2016-01-22 02:07:48 +00:00
Brooks Davis
159da0bcfd Add a simple manpage for the cfi(4) and associated cfid(4) drivers.
MFC after:	1 week
Sponsored by:	DARPA, AFRL
2016-01-20 18:47:33 +00:00
Enji Cooper
01bab39427 Bump .Dd for the content changes 2016-01-16 05:35:42 +00:00
Warner Losh
dce512b671 trim-time? What was I thinking. run-time.
Noticed by: Allan Jude
2016-01-16 01:30:55 +00:00
Warner Losh
21e079b85b Add some clarifications. 2016-01-16 01:13:27 +00:00
Warner Losh
195a8c0316 Improve the sentence flow as well which has the happy benefit of
making read-only modify a noun, a case where it unquestionably should
be hyphenated.
2016-01-16 00:45:48 +00:00
Warner Losh
8202ceccea Although not directly modifying a noun, read-only should be hyphenated
in this context (or in any, really).
2016-01-16 00:43:10 +00:00
Warner Losh
62cb31dc18 Read-only is hyphenated when it modifies a noun. 2016-01-16 00:37:27 +00:00
Andrew Rybchenko
a45a0da19c sfxge: support FATSOv2
Reviewed by:    gnn
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
Differential Revision: https://reviews.freebsd.org/D4934
2016-01-15 06:25:26 +00:00
Conrad Meyer
6ca07079af ioat(4): Add support for 'fence' bit with DMA_FENCE flag
Some classes of IOAT hardware prefetch reads.  DMA operations that
depend on the result of prior DMA operations must use the DMA_FENCE flag
to prevent stale reads.

(E.g., I've hit this personally on Broadwell-EP.  The Broadwell-DE has a
different IOAT unit that is documented to not pipeline DMA operations.)

Sponsored by:	EMC / Isilon Storage Division
2016-01-15 01:34:43 +00:00
Enji Cooper
69c0fce6ba Fix spelling of IPMI
Sponsored by: EMC / Isilon Storage Division
2016-01-14 18:04:49 +00:00
Benjamin Kaduk
e2b10854e4 Update .Dd, missed in r294011 2016-01-14 17:16:47 +00:00
Warner Losh
5147131ae4 Document how to enter the debugger here. I'm sure there's some better
canonical place, and the nit-pickers are welcome to move this
information there with a cross reference.

Differential Review: https://reviews.freebsd.org/D4860
2016-01-14 16:23:07 +00:00
Ian Lepore
fdfbb3f5b1 Restore uart PPS signal capture polarity to its historical norm, and add an
option to invert the polarity in software. Also add an option to capture
very narrow pulses by using the hardware's MSR delta-bit capability of
latching line state changes.

This effectively reverts the mistake I made in r286595 which was based on
empirical measurements made on hardware using TTL-level signaling, in which
the logic levels are inverted from RS-232. Thus, this re-syncs the polarity
with the requirements of RFC 2783, which is writen in terms of RS-232
signaling.

Narrow-pulse mode uses the ability of most ns8250 and similar chips to
provide a delta indication in the modem status register. The hardware is
able to notice and latch the change when the pulse width is shorter than
interrupt latency, which results in the signal no longer being asserted by
time the interrupt service code runs. When running in this mode we get
notified only that "a pulse happened" so the driver synthesizes both an
ASSERT and a CLEAR event (with the same timestamp for each). When the pulse
width is about equal to the interrupt latency the driver may intermittantly
see both edges of the pulse. To prevent generating spurious events, the
driver implements a half-second lockout period after generating an event
before it will generate another.

Differential Revision:	https://reviews.freebsd.org/D4477
2016-01-12 18:42:00 +00:00
Jim Harris
adcd1d80b8 Update ismt(4) man page to reflect inclusion in upcoming 10.3 release.
MFC after:	3 days
Sponsored by:	Intel
2016-01-11 17:57:49 +00:00
Conrad Meyer
1502e36346 ioat(4): Add ioat_acquire_reserve() KPI
ioat_acquire_reserve() is an extended version of ioat_acquire().  It
allows users to reserve space in the channel for some number of
descriptors.  If this succeeds, it guarantees that at least submission
of N valid descriptors will succeed.

Sponsored by:	EMC / Isilon Storage Division
2016-01-07 23:02:15 +00:00
Jim Harris
50dea2da12 nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.

This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue.  This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues.  Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.

While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.

Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel
2016-01-07 20:32:04 +00:00
Conrad Meyer
bd81fe68ee ioat(4): Add ioat_get_max_io_size() KPI
Consumers need to know the permitted IO size to send maximally sized
chunks to the hardware.

Sponsored by:	EMC / Isilon Storage Division
2016-01-05 20:42:19 +00:00
Dag-Erling Smørgrav
a50b01d224 17 years and change after I wrote warp_saver, here's a simple plasma effect
(currently only three circular patterns) which requires quite a bit of
fixed-point arithmetic, including sqrt() and cos().  Happy New Year!
2016-01-01 04:04:40 +00:00
Adrian Chadd
2600131bd6 [rtwn] Add initial manpages for the rtwn driver. 2015-12-31 22:34:16 +00:00
Adrian Chadd
71e8eac4fd [mdio] migrate mdiobus out of etherswitch and into a top-level device of its own.
The mdio driver interface is generally useful for devices that require
MDIO without the full MII bus interface. This lifts the driver/interface
out of etherswitch(4), and adds a mdio(4) man page.

Submitted by:	Landon Fuller <landon@landonf.org>
Differential Revision:	https://reviews.freebsd.org/D4606
2015-12-26 02:31:39 +00:00
Conrad Meyer
31bf2875ea ioat(4): Add an API to get HW revision
Different revisions support different operations.  Refer to Intel
External Design Specifications to figure out what your hardware
supports.

Sponsored by:	EMC / Isilon Storage Division
2015-12-17 23:21:37 +00:00
Christian Brueffer
af1e9f5a46 Fix example code rendering, \n needs escaping to show up.
PR:		203536
Submitted by:	Fabian Keil
2015-12-15 13:29:05 +00:00
Kevin Lo
71d96c11f0 Fix a typo (opencrypto -> crypto) and remove useless comment. 2015-12-15 06:01:02 +00:00
Conrad Meyer
5ca9fc2a8d ioat(4): Add support for interrupt coalescing
In I/OAT, this is done through the INTRDELAY register.  On supported
platforms, this register can coalesce interrupts in a set period to
avoid excessive interrupt load for small descriptor workflows.  The
period is configurable anywhere from 1 microsecond to 16.38
milliseconds, in microsecond granularity.

Sponsored by:	EMC / Isilon Storage Division
2015-12-14 22:01:52 +00:00
Christian Brueffer
280186c773 Clean up issues reported by mandoc -Tlint 2015-12-14 13:01:36 +00:00
Christian Brueffer
e91d04f7f7 Non-exhaustive mdoc/spelling/style cleanup.
PR:		202716, 204301 (both spelling)
Submitted by:	Richard Farr, madpilot
2015-12-14 12:37:06 +00:00
Kevin Lo
695be8b931 Add the cryptodev device. 2015-12-14 07:08:17 +00:00