Summary:
Many instances of bus_alloc_resource() simply use 0 and ~0 for start and end to
denote 'anywhere' with a given count. To clean this up, introduce a
bus_alloc_resource_anywhere() convenience function.
Bump __FreeBSD_version for the new API.
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D5370
This allows 'make analyze' or 'make OBJ.clang-analyzer' to run the
Clang static analyzer and present results on stdout.
Obtained from: NetBSD (CVS Rev. 1.3)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5449
After calling the cap_init(3) function Casper will fork from it's original
process, using pdfork(2). Forking from a process has a lot of advantages:
1. We have the same cwd as the original process.
2. The same uid, gid and groups.
3. The same MAC labels.
4. The same descriptor table.
5. The same routing table.
6. The same umask.
7. The same cpuset(1).
From now services are also in form of libraries.
We also removed libcapsicum at all and converts existing program using Casper
to new architecture.
Discussed with: pjd, jonathan, ed, drysdale@google.com, emaste
Partially reviewed by: drysdale@google.com, bdrewery
Approved by: pjd (mentor)
Differential Revision: https://reviews.freebsd.org/D4277
is exhausted.
How to use:
Basically we need to add on rc.conf an another option like:
If we want to protect only the main processes.
syslogd_oomprotect="YES"
If we want to protect all future children of the specified processes.
syslogd_oomprotect="ALL"
PR: 204741 (based on)
Submitted by: eugen@grosbein.net
Reviewed by: jhb, allanjude, rpokala and bapt
MFC after: 4 weeks
Relnotes: Yes
Sponsored by: gandi.net
Differential Revision: https://reviews.freebsd.org/D5176
and geom_uncompress(4):
1. mkuzip(8):
- Proper support for eliminating all-zero blocks when compressing an
image. This feature is already supported by the geom_uzip(4) module
and CLOOP format in general, so it's just a matter of making mkuzip(8)
match. It should be noted, however that this feature while it sounds
great, results in very slight improvement in the overall compression
ratio, since compressing default 16k all-zero block produces only 39
bytes compressed output block, which is 99.8% compression ratio. With
typical average compression ratio of amd64 binaries and data being
around 60-70% the difference between 99.8% and 100.0% is not that
great further diluted by the ratio of number of zero blocks in the
uncompressed image to the overall number of blocks being less than
0.5 (typically). However, this may be important from performance
standpoint, so that kernel are not spinning its wheels decompressing
those empty blocks every time this zero region is read. It could also
be important when you create huge image mostly filled with zero
blocks for testing purposes.
- New feature allowing to de-duplicate output image. It turns out that
if you twist CLOOP format a bit you can do that as well. And unlike
zero-blocks elimination, this gives a noticeable improvement in the
overall compression ratio, reducing output image by something like
3-4% on my test UFS2 3GB image consisting of full FreeBSD base system
plus some of the packages (openjdk, apache etc), about 2.3GB worth of
file data (800+MB compressed). The only caveat is that images created
with this feature "on" would not work on older versions of FeeBSDxi
kernel, hence it's turned off by default.
- provide options to control both features and document them in manual
page.
- merge in all relevant LZMA compression support from the mkulzma(8),
add new option to select between both.
- switch license from ad-hoc beerware into standard 2-clause BSD.
2. geom_uzip(4):
- implement support for de-duplicated images;
- optimize some code paths to handle "all-zero" blocks without reading
any compressed data;
- beef up manual page to explain that geom_uzip(4) is not limited only
to md(4) images. The compressed data can be written to the block
device and accessed directly via magic of GEOM(4) and devfs(4),
including to mount root fs from a compressed drive.
- convert debug log code from being compiled in conditionally into
being present all the time and provide two sysctls to turn it on or
off. Due to intended use of the module, it can be used in
environments where there may not be a luxury to put new kernel with
debug code enabled. Having those options handy allows debug issues
without as much problem by just having access to serial console or
network shell access to a box/appliance. The resulting additional
CPU cycles are just few int comparisons and branches, and those are
minuscule when compared to data decompression which is the main
feature of the module.
- hopefully improve robustness and resiliency of the geom_uzip(4) by
performing some of the data validation / range checking on the TOC
entries and rejecting to attach to an image if those checks fail.
- merge in all relevant LZMA decompression support from the
geom_uncompress(4), enable automatically when appropriate format is
indicated in the header.
- move compilation work into its own worker thread so that it does not
clog g_up. This allows multiple instances work in parallel utilizing
smp cores.
- document new knobs in the manual page.
Reviewed by: adrian
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D5333
and manual page were.
For whatever reason it listed myself as a primary author, which is
just not true.
Also, majority of the manpage is copied verbatim from the geom_uzip(4),
contributed by ceri, with only minor adjustments from loos, so put ceri
back into the copyright secrion where he belongs and reflect that in the
AUTHORS section.
For what it's worth, I think this one should be deleted and LZMA
support just folded back into geom_uzip(4) / mkuzip(4) whete it really
belongs.
MFC after: 1 month
While here clean up the documentation for jail_list
PR: 196152
Approved by: jamie, wblock
MFC after: 1 week, with r295471
Differential Revision: https://reviews.freebsd.org/D5243
As of r294068 boot1.efi can load loader.efi from ZFS.
As of r295320 boot1.efi prefers to load loader.efi from the same device
it was loaded from.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Replace `make regress` (legacy test make target) and `make test` (incomplete
test make target added with the FreeBSD test suite) with make check as it's
consistent with other open source projects.
`make check` defaults to running tests from `.OBJDIR`, but can be overridden
with the `CHECKDIR` variable.
Add `make checkworld` target to simplify running the FreeBSD test suite from
`TESTSBASE` (i.e. the top-level tests directory), similar to buildworld.
Document `make check` and `make checkworld` in build(7).
Other minor changes:
- Rename intermediate file (`Kyuafile.auto`) to `Kyuafile` to simplify
`make check`.
- Remove terse warnings attached to `beforetest`/`aftertest`.
- Add kyua binary check to check target in suite.test.mk; error out if it's
not found
The MFC is [partly] contingent on other build related changes being MFCed.
Differential Revision: https://reviews.freebsd.org/D4406
MFC after: 2 months
X-MFC to: stable/10
Relnotes: yes
Reviewed by: bdrewery, Evan Cramer <eccramer@gmail.com>
Sponsored by: EMC / Isilon Storage Division
The NVMe specification does not define a maximum or optimal delete
size, so technically max delete size is min(full size of namespace,
2^32 - 1 LBAs). A single delete operation for a multi-TB NVMe
namespace though may take much longer to complete than the nvme(4)
I/O timeout period. So choose a sensible default here that is still
suitably large to minimize the number of overall delete operations.
This also fixes possible uint32_t overflow on initial TRIM operation
for zpool create operations for NVMe namespaces with >4G LBAs.
MFC after: 3 days
Sponsored by: Intel
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources. For
now, this is still compatible with u_long.
This is step one in migrating rman to use uintmax_t for resources instead of
u_long.
Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.
This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.
Reviewed By: jhb
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
sent using roundrobin protocol and set a better granularity and distribution
among the interfaces. Tuning the number of packages sent by interface can
increase throughput and reduce unordered packets as well as reduce SACK.
Example of usage:
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
192.168.1.1 netmask 255.255.255.0
# ifconfig lagg0 rr_limit 500
Reviewed by: thompsa, glebius, adrian (old patch)
Approved by: bapt (mentor)
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D540
control algorithm options. The argument is variable length and is opaque
to TCP, forwarded directly to the algorithm's ctl_output method.
Provide new includes directory netinet/cc, where algorithm specific
headers can be installed.
The new API doesn't yet have any in tree consumers.
The original code written by lstewart.
Reviewed by: rrs, emax
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D711
This API has no in-tree consumers at the moment but is useful to at least
one out-of-tree consumer, and naturally complements existing vnode refcount
functions (vholdl(9), vdropl(9)).
Obtained from: kib (sys/ portion)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4947
Differential Revision: https://reviews.freebsd.org/D4953
Some classes of IOAT hardware prefetch reads. DMA operations that
depend on the result of prior DMA operations must use the DMA_FENCE flag
to prevent stale reads.
(E.g., I've hit this personally on Broadwell-EP. The Broadwell-DE has a
different IOAT unit that is documented to not pipeline DMA operations.)
Sponsored by: EMC / Isilon Storage Division
canonical place, and the nit-pickers are welcome to move this
information there with a cross reference.
Differential Review: https://reviews.freebsd.org/D4860
option to invert the polarity in software. Also add an option to capture
very narrow pulses by using the hardware's MSR delta-bit capability of
latching line state changes.
This effectively reverts the mistake I made in r286595 which was based on
empirical measurements made on hardware using TTL-level signaling, in which
the logic levels are inverted from RS-232. Thus, this re-syncs the polarity
with the requirements of RFC 2783, which is writen in terms of RS-232
signaling.
Narrow-pulse mode uses the ability of most ns8250 and similar chips to
provide a delta indication in the modem status register. The hardware is
able to notice and latch the change when the pulse width is shorter than
interrupt latency, which results in the signal no longer being asserted by
time the interrupt service code runs. When running in this mode we get
notified only that "a pulse happened" so the driver synthesizes both an
ASSERT and a CLEAR event (with the same timestamp for each). When the pulse
width is about equal to the interrupt latency the driver may intermittantly
see both edges of the pulse. To prevent generating spurious events, the
driver implements a half-second lockout period after generating an event
before it will generate another.
Differential Revision: https://reviews.freebsd.org/D4477
ioat_acquire_reserve() is an extended version of ioat_acquire(). It
allows users to reserve space in the channel for some number of
descriptors. If this succeeds, it guarantees that at least submission
of N valid descriptors will succeed.
Sponsored by: EMC / Isilon Storage Division
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.
This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.
While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
Reviewed by: gallatin
MFC after: 3 days
Sponsored by: Intel
Immediate problem fixed by the new KPI is the long-standing race
between device creation and assignments to cdev->si_drv1 and
cdev->si_drv2, which allows the window where cdevsw methods might be
called with si_drv1,2 fields not yet set. Devices typically checked
for NULL and returned spurious errors to usermode, and often left some
methods unchecked.
The new function interface is designed to be extensible, which should
allow to add more features to make_dev_s(9) without inventing yet
another name for function to create devices, while maintaining KPI and
even KBI backward-compatibility.
Reviewed by: hps, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D4746
The mdio driver interface is generally useful for devices that require
MDIO without the full MII bus interface. This lifts the driver/interface
out of etherswitch(4), and adds a mdio(4) man page.
Submitted by: Landon Fuller <landon@landonf.org>
Differential Revision: https://reviews.freebsd.org/D4606
While here, explicitly note the requirement that the BAR(s) must be
allocated prior to calling pci_alloc_msix().
Reviewed by: andrew, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D4688
exhausted.
It is possible for a bug in the code (or, theoretically, even unusual
network conditions) to exhaust all possible mbufs or mbuf clusters.
When this occurs, things can grind to a halt fairly quickly. However,
we currently do not call mb_reclaim() unless the entire system is
experiencing a low-memory condition.
While it is best to try to prevent exhaustion of one of the mbuf zones,
it would also be useful to have a mechanism to attempt to recover from
these situations by freeing "expendable" mbufs.
This patch makes two changes:
a) The patch adds a generic API to the UMA zone allocator to set a
function that should be called when an allocation fails because the
zone limit has been reached. Because of the way this function can be
called, it really should do minimal work.
b) The patch uses this API to try to free mbufs when an allocation
fails from one of the mbuf zones because the zone limit has been
reached. The function schedules a callout to run mb_reclaim().
Differential Revision: https://reviews.freebsd.org/D3864
Reviewed by: gnn
Comments by: rrs, glebius
MFC after: 2 weeks
Sponsored by: Juniper Networks
Different revisions support different operations. Refer to Intel
External Design Specifications to figure out what your hardware
supports.
Sponsored by: EMC / Isilon Storage Division
o With new KPI consumers can request contiguous ranges of pages, and
unlike before, all pages will be kept busied on return, like it was
done before with the 'reqpage' only. Now the reqpage goes away. With
new interface it is easier to implement code protected from race
conditions.
Such arrayed requests for now should be preceeded by a call to
vm_pager_haspage() to make sure that request is possible. This
could be improved later, making vm_pager_haspage() obsolete.
Strenghtening the promises on the business of the array of pages
allows us to remove such hacks as swp_pager_free_nrpage() and
vm_pager_free_nonreq().
o New KPI accepts two integer pointers that may optionally point at
values for read ahead and read behind, that a pager may do, if it
can. These pages are completely owned by pager, and not controlled
by the caller.
This shifts the UFS-specific readahead logic from vm_fault.c, which
should be file system agnostic, into vnode_pager.c. It also removes
one VOP_BMAP() request per hard fault.
Discussed with: kib, alc, jeff, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
In I/OAT, this is done through the INTRDELAY register. On supported
platforms, this register can coalesce interrupts in a set period to
avoid excessive interrupt load for small descriptor workflows. The
period is configurable anywhere from 1 microsecond to 16.38
milliseconds, in microsecond granularity.
Sponsored by: EMC / Isilon Storage Division
mps(4) sends StartStopUnit to SATA direct-access devices during shutdown.
Document the tunables which control that behavior.
PR: 195033
Reviewed by: scottl
Approved by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4456
The hardware supports descriptors with two non-contiguous pages. This
allows issuing one descriptor for an 8k copy from/to non-contiguous but
otherwise page-aligned memory.
Sponsored by: EMC / Isilon Storage Division
These helper functions can be used to read in or write a buffer from or to
an arbitrary process' address space. Without them, this can only be done
using proc_rwmem(), which requires the caller to fill out a uio. This is
onerous and results in code duplication; the new functions provide a simpler
interface which is sufficient for most existing callers of proc_rwmem().
This change also adds a manual page for proc_rwmem() and the new functions.
Reviewed by: jhb, kib
Differential Revision: https://reviews.freebsd.org/D4245
It was allowed before, but make it very explicit it is allowed now. And
prefer 'bool' to older types that were used for the same purpose -- int and
boolean_t.
Like with the C99 fixed-width types, use common sense when changing old
code.
No igor regressions.
Suggested by: bde <20151205031713.T3286@besplex.bde.org>
Reviewed by: glebius, davide, bapt (earlier versions)
Reviewed by: imp
Feedback from: julian
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4384
Submitted by: Artem V. Andreev <Artem.Andreev at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4355
camdd(8) utility.
CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl. User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.
While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists. This allows user applications to have more
flexibility in their data handling operations.
Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out. This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.
The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS. The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.
There are some things things would be good to add:
1. Come up with a way to do unmapped I/O on multiple buffers.
Currently the unmapped I/O interface operates on a struct bio,
which includes only one address and length. It would be nice
to be able to send an unmapped scatter/gather list down to
busdma. This would allow eliminating the copy we currently do
for data.
2. Add an ioctl to list currently outstanding CCBs in the various
queues.
3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
that.
4. Test physical address support. Virtual pointers and scatter
gather lists have been tested, but I have not yet tested
physical addresses or scatter/gather lists.
5. Investigate multiple queue support. At the moment there is one
queue of commands per pass(4) device. If multiple processes
open the device, they will submit I/O into the same queue and
get events for the same completions. This is probably the right
model for most applications, but it is something that could be
changed later on.
Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.
This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.
It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and ouptut devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.
It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout. It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.
The I/O is done by two threads, one for the reader and one for the
writer. The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order. That could be modified later on for random I/O patterns
or slightly out of order I/O.
camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.
For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side. In addition to testing both
interfaces, this makes any potential reblocking of I/O easier. No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.
For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.
Things that would be nice to do for camdd(8) eventually:
1. Add support for I/O pattern generation. Patterns like all
zeros, all ones, LBA-based patterns, random patterns, etc. Right
Now you can always use /dev/zero, /dev/random, etc.
2. Add support for a "sink" mode, so we do only reads with no
writes. Right now, you can use /dev/null.
3. Add support for automatic queue depth probing, so that we can
figure out the right queue depth on the input and output side
for maximum throughput. At the moment it defaults to 6.
4. Add support for SATA device passthrough I/O.
5. Add support for random LBAs and/or lengths on the input and
output sides.
6. Track average per-I/O latency and busy time. The busy time
and latency could also feed in to the automatic queue depth
determination.
sys/cam/scsi/scsi_pass.h:
Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
and fetch asynchronous CAM CCBs respectively.
Although these ioctls do not have a declared argument, they
both take a union ccb pointer. If we declare a size here,
the ioctl code in sys/kern/sys_generic.c will malloc and free
a buffer for either the CCB or the CCB pointer (depending on
how it is declared). Since we have to keep a copy of the
CCB (which is fairly large) anyway, having the ioctl malloc
and free a CCB for each call is wasteful.
sys/cam/scsi/scsi_pass.c:
Add asynchronous CCB support.
Add two new ioctls, CAMIOQUEUE and CAMIOGET.
CAMIOQUEUE adds a CCB to the incoming queue. The CCB is
executed immediately (and moved to the active queue) if it
is an immediate CCB, but otherwise it will be executed
in passstart() when a CCB is available from the transport layer.
When CCBs are completed (because they are immediate or
passdone() if they are queued), they are put on the done
queue.
If we get the final close on the device before all pending
I/O is complete, all active I/O is moved to the abandoned
queue and we increment the peripheral reference count so
that the peripheral driver instance doesn't go away before
all pending I/O is done.
The new passcreatezone() function is called on the first
call to the CAMIOQUEUE ioctl on a given device to allocate
the UMA zones for I/O requests and S/G list buffers. This
may be good to move off to a taskqueue at some point.
The new passmemsetup() function allocates memory and
scatter/gather lists to hold the user's data, and copies
in any data that needs to be written. For virtual pointers
(CAM_DATA_VADDR), the kernel buffer is malloced from the
new pass(4) driver malloc bucket. For virtual
scatter/gather lists (CAM_DATA_SG), buffers are allocated
from a new per-pass(9) UMA zone in MAXPHYS-sized chunks.
Physical pointers are passed in unchanged. We have support
for up to 16 scatter/gather segments (for the user and
kernel S/G lists) in the default struct pass_io_req, so
requests with longer S/G lists require an extra kernel malloc.
The new passcopysglist() function copies a user scatter/gather
list to a kernel scatter/gather list. The number of elements
in each list may be different, but (obviously) the amount of data
stored has to be identical.
The new passmemdone() function copies data out for the
CAM_DATA_VADDR and CAM_DATA_SG cases.
The new passiocleanup() function restores data pointers in
user CCBs and frees memory.
Add new functions to support kqueue(2)/kevent(2):
passreadfilt() tells kevent whether or not the done
queue is empty.
passkqfilter() adds a knote to our list.
passreadfiltdetach() removes a knote from our list.
Add a new function, passpoll(), for poll(2)/select(2)
to use.
Add devstat(9) support for the queued CCB path.
sys/cam/ata/ata_da.c:
Add support for the BIO_VLIST bio type.
sys/cam/cam_ccb.h:
Add a new enumeration for the xflags field in the CCB header.
(This doesn't change the CCB header, just adds an enumeration to
use.)
sys/cam/cam_xpt.c:
Add a new function, xpt_setup_ccb_flags(), that allows specifying
CCB flags.
sys/cam/cam_xpt.h:
Add a prototype for xpt_setup_ccb_flags().
sys/cam/scsi/scsi_da.c:
Add support for BIO_VLIST.
sys/dev/md/md.c:
Add BIO_VLIST support to md(4).
sys/geom/geom_disk.c:
Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size
limiting code in g_disk_start() a bit.
sys/kern/subr_bus_dma.c:
Change _bus_dmamap_load_vlist() to take a starting offset and
length.
Add a new function, _bus_dmamap_load_pages(), that will load a list
of physical pages starting at an offset.
Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
Allow unmapped I/O to start at an offset.
sys/kern/subr_uio.c:
Add two new functions, physcopyin_vlist() and physcopyout_vlist().
sys/pc98/include/bus.h:
Guard kernel-only parts of the pc98 machine/bus.h header with
#ifdef _KERNEL.
This allows userland programs to include <machine/bus.h> to get the
definition of bus_addr_t and bus_size_t.
sys/sys/bio.h:
Add a new bio flag, BIO_VLIST.
sys/sys/uio.h:
Add prototypes for physcopyin_vlist() and physcopyout_vlist().
share/man/man4/pass.4:
Document the CAMIOQUEUE and CAMIOGET ioctls.
usr.sbin/Makefile:
Add camdd.
usr.sbin/camdd/Makefile:
Add a makefile for camdd(8).
usr.sbin/camdd/camdd.8:
Man page for camdd(8).
usr.sbin/camdd/camdd.c:
The new camdd(8) utility.
Sponsored by: Spectra Logic
MFC after: 1 week
Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port. This change allows additional non-netmap
interfaces to be configured on each port. Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.
Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).
T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).
One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are not accounted to the netmap interface.
The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.
The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats. There is currently no way to
clear the stats of an individual VI.
Reviewed by: np
MFC after: 1 month
Sponsored by: Chelsio
like the various d_*_t typedefs since it declared a function pointer rather
than a function. Add a new d_priv_dtor_t typedef that declares the function
and can be used as a function prototype. The previous typedef wasn't
useful outside of the cdevpriv implementation, so retire it.
The name d_priv_dtor_t was chosen to be more consistent with cdev methods
since it is commonly used in place of d_close_t even though it is not a
direct pointer in struct cdevsw.
Reviewed by: kib, imp
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D4340
IPv4/IPv6 checksum offloading and VLAN tag insertion/stripping.
Since uether doesn't provide a way to announce driver specific offload
capabilities to upper stack, checksum offloading support needs more work
and will be done in the future.
Special thanks to Hayes Wang from RealTek who gave input.
The mlx5* driver(s) are built [*]/installed separate from the OFED stack thanks
to recent refactoring done in the linuxkpi(4) module.
Always install the manpages instead of conditionally installing them if
MK_OFED != no
* Further refactoring of sys/ofed and linuxkpi(4) is pending to fully divorce
mlx5* from ofed headers
MFC after: never
Requested by: hps
Hacks to enable target mode there complicated code, while didn't really
work. And for outdated hardware fixing it is not really interesting.
Initiator mode tested with Qlogic 1080 adapter is still working fine.
critical section.
uma_zalloc_arg()/uma_zalloc_free() may acquire a sleepable lock on the
zone. The malloc() family of functions may call uma_zalloc_arg() or
uma_zalloc_free().
The malloc(9) man page currently claims that free() will never sleep.
It also implies that the malloc() family of functions will not sleep
when called with M_NOWAIT. However, it is more correct to say that
these functions will not sleep indefinitely. Indeed, they may acquire
a sleepable lock. However, a developer may overlook this restriction
because the WITNESS check that catches attempts to call the malloc()
family of functions within a critical section is inconsistenly
applied.
This change clarifies the language of the malloc(9) man page to clarify
the restriction against calling the malloc() family of functions
while in a critical section or holding a spin lock. It also adds
KASSERTs at appropriate points to make the enforcement of this
restriction more consistent.
PR: 204633
Differential Revision: https://reviews.freebsd.org/D4197
Reviewed by: markj
Approved by: gnn (mentor)
Sponsored by: Juniper Networks
default and add a manual page for mlx5en. The mlx5 module contains
shared code for both infiniband and ethernet. The mlx5en module
contains specific code for ethernet functionality only. A mlx5ib
module is in the works for infiniband support.
Supported hardware:
- ConnectX-4: 10/20/25/40/50/56/100Gb/s speeds.
- ConnectX-4 LX: 10/25/40/50Gb/s speeds (low power consumption)
Refer to the mlx5en(4) manual page for a comprehensive list.
The team porting the mlx5 driver(s) to FreeBSD:
- Hans Petter Selasky <hselasky@freebsd.org>
- Oded Shanoon <odeds@mellanox.com>
- Meny Yossefi <menyy@mellanox.com>
- Shany Michaely <shanim@mellanox.com>
- Shahar Klein <shahark@mellanox.com>
- Daria Genzel <dariaz@mellanox.com>
- Mark Bloch <markb@mellanox.com>
Differential Revision: https://reviews.freebsd.org/D4163
Submitted by: Mark Block <markb@mellanox.com>
Sponsored by: Mellanox Technologies
Reviewed by: gnn @
MFC after: 3 days
This is useful in environments where system configuration is performed by
automated interaction with the system console, since unexpected witness
output makes such automation difficult. With this change, the new
debug.witness.output_channel sysctl allows one to specify that witness
output is to be printed to the kernel log (using log(9)) rather than the
console.
Reviewed by: cem, jhb
MFC after: 2 weeks
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4183
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped. Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.
MFC after: 1 Month with associated callout change.
Igor has many less complaints now. I think the two remaining are bogus, but I
am also not sure why Igor is producing them.
The page still needs more work.
Sponsored by: EMC / Isilon Storage Division
should be used by TCP for sure in its cleanup of the IN-PCB (will be coming shortly).
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D4076
- Move all section 5 bluetooth manpages under MK_BLUETOOTH != no
MFC after: 3 days
PR: 193260
Reported by: Philippe Michel <philippe.michel7@sfr.fr>
Sponsored by: EMC / Isilon Storage Division
Add S8, S16, S32, and U32 types; add SYSCTL*() macros for them, as well
as for the existing 64-bit types. (While SYSCTL*QUAD and UQUAD macros
already exist, they do not take the same sort of 'val' parameter that
the other macros do.)
Clean up the documented "types" in the sysctl.9 document. (These are
macros and thus not real types, but the manual page documents intent.)
The sysctl_add_oid(9) arg2 has been bumped from intptr_t to intmax_t to
accommodate 64-bit types on 32-bit pointer architectures.
This is just the kernel support piece; the userspace sysctl(1) support
will follow in a later patch.
Submitted by: Ravi Pokala <rpokala@panasas.com>
Reviewed by: cem
Relnotes: no
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D4091
Add net.link.lagg.lacp.default_strict_mode which defines
the default value for LACP strict compliance for created
lagg devices.
Also:
* Add lacp_strict option to ifconfig(8).
* Fix lagg(4) creation examples.
* Minor style(9) fix.
MFC after: 1 week
PCI-Express capability registers (that is, PCI config registers in the
standard PCI config space belonging to the PCI-Express capability
register set).
Note that all of the current PCI-e registers are either 16 or 32-bits,
so only widths of 2 or 4 bytes are supported.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D4088
MK_USB != no
Add the manpages to OptionalObsoleteFiles.inc
As a side-effect, this also fixes installworld with MK_USB == no
X-MFC with: r290128
Sponsored by: EMC / Isilon Storage Division
Certain invalid operations trigger hardware error conditions. Error
conditions that only halt one channel can be detected and recovered by
resetting the channel. Error conditions that halt the whole device are
generally not recoverable.
Add a sysctl to inject channel-fatal HW errors,
'dev.ioat.<N>.force_hw_error=1'.
When a halt due to a channel error is detected, ioat(4) blocks new
operations from being queued on the channel, completes any outstanding
operations with an error status, and resets the channel before allowing
new operations to be queued again.
Update ioat.4 to document error recovery; document blockfill introduced
in r290021 while we are here; document ioat_put_dmaengine() added in
r289907; document DMA_NO_WAIT added in r289982.
Sponsored by: EMC / Isilon Storage Division
window in number of segments on fly. It is set to 10 segments by default.
Remove net.inet.tcp.experimental.initcwnd10 which is now redundant. Also remove
the parent node net.inet.tcp.experimental as it's not needed anymore and also
because it was not well thought out.
Differential Revision: https://reviews.freebsd.org/D3858
In collaboration with: lstewart
Reviewed by: gnn (prev version), rwatson, allanjude, wblock (man page)
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Limelight Networks
It turns out that it is pretty easy to make CloudABI work on ARM64. We
essentially only need to copy over the sysvec from AMD64 and ensure that
we use ARM64 specific registers.
As there is an overlap between function argument and return registers,
we do need to extend cloudabi64_schedtail() to only set its values if
we're actually forking. Not when we're creating a new thread.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3917
Now it can be used to effectively "build in a subdir". It will use the
'cross-tools', 'libraries', and 'includes' phases of 'buildworld' to properly
setup a WORLDTMP to use. Then it will build 'everything' only in the
listed SUBDIR_OVERRIDE directories. It is still required to list custom
library directories in LOCAL_LIB_DIRS if SUBDIR_OVERRIDE is something
that contains libraries outside of the normal area (such as
SUBDIR_OVERRIDE=contrib/ofed needing LOCAL_LIB_DIRS=contrib/ofed/usr.lib)
Without these changes, SUBDIR_OVERRIDE with buildworld was broken or hit
obscure failures due to missing libraries, includes, or cross compiler.
SUBDIR_OVERRIDE with 'make <target that is not buildworld>' will continue to
work as it did before although its usefulness is questionable.
With a fully populated WORLDTMP, building with a SUBDIR_OVERRIDE with
-DNO_CLEAN only takes a few minutes to start building the target
directories. This is still much better than building unneeded things via
'everything' when testing small subset changes. A BUILDFAST or
SKIPWORLDTMP might make sense for this as well.
- Add in '_worldtmp' as we still need to create WORLDTMP as later targets,
such as '_libraries' and '_includes' use it. This probably was avoiding
calling '_worldtmp' to not remove WORLDTMP for debugging purposes, but
-DNO_CLEAN can be used for that.
- '_legacy' must be included since '_build-tools' uses -legacy.
The SUBDIR_OVERRIDE change came in r95509, while -legacy being part
of build-tools came in r113136.
- 'bootstrap-tools' is still skipped as this feature is not for
upgrades.
- Fix buildworld combined with SUBDIR_OVERRIDE not installing all includes.
The original change for SUBDIR_OVERRIDE in r95509 kept '_includes'
and '_libraries' as building everything possible as the SUBDIR_OVERRIDE
could need anything from them. However in r96462 the real 'includes'
target was changed from manual sub-makes to just recursing 'includes'
on SUBDIR, thus not all includes have been installed into WORLDTMP since then
when combined with 'buildworld'.
This is not done unless calling 'make buildworld' as it would be
unexpected to have it go into all directories when doing 'make
SUBDIR_OVERRIDE=mydir includes'.
- Also need to build the cross-compiler so it is used with --sysroot.
If this is burdensome then telling the build to use the local compiler
as an external compiler (thus using a proper --sysroot to WORLDTMP) is
possible by setting CC=/usr/bin/cc, CXX=/usr/bin/c++, etc.
- Don't build the lib32 distribution with SUBDIR_OVERRIDE in buildworld
since it won't contain anything related to SUBDIR_OVERRIDE. Testing
of the lib32 build can be done with 'make build32'.
- Document these changes in build.7
Sponsored by: EMC / Isilon Storage Division
MFC after: 2 weeks
bug of installing 'realtek' and 'intel_iwn' as files rather then as
a 'LICENSE' file in their directories.
Also add obsolete entries for the older names and names that existed in head
for a period of time.
Suggested by: jmg
X-MFC-With: r289391
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
On each resolver query, use stat(2) to see if the modification time
of /etc/resolv.conf has changed. If so, reload the file and reinitialize
the resolver library. However, only call stat(2) if at least two seconds
have passed since the last call to stat(2), since calling it on every
query could kill performance.
This new behavior is enabled by default. Add a "reload-period" option
to disable it or change the period of the test.
Document this behavior and option in resolv.conf(5).
Polish the man page just enough to appease igor.
https://lists.freebsd.org/pipermail/freebsd-arch/2015-October/017342.html
Reviewed by: kp, wblock
Discussed with: jilles, imp, alfred
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3867
This allows to set delete method via tunable, before device capabilities
are known. Also allow ZERO method for devices not reporting LBP, if user
explicitly requests it -- it may be useful if storage supports compression
and WRITE SAME, but does not support UNMAP.
MFC after: 2 weeks
This fix is spiritually similar to r287442 and was discovered thanks to
the KASSERT added in that revision.
NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' vm map
via vn_fullpath. As vnodes may move during coredump, this is racy.
We do not remove the race, only prevent it from causing coredump
corruption.
- Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable
kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption
and truncation, even if names change, at the cost of up to PATH_MAX
bytes per mapped object. The new sysctl is documented in core.5.
- Fix note_procstat_vmmap to self-limit in the second pass. This
addresses corruption, at the cost of sometimes producing a truncated
result.
- Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste)
to grok the new zero padding.
Reported by: pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3824
Refer to the usb_quirk(4) manual page for more details on how to use
this new feature.
Submitted by: Maxime Soule <btik-fbsd@scoubidou.com>
PR: 203249
MFC after: 2 weeks
This avoids needing a large boot partition / file system in order to
accommodate multiple kernels, and provides consistency with userland
debug. This also simplifies the process of moving kernel debug files
to a separate package and installing them on demand.
In addition, change kernel debug file extension to .debug, to match
userland debug files.
When using the supported kernel installation method the
/usr/lib/debug/boot/kernel directory will be renamed (to kernel.old)
as is done with /boot/kernel.
Developers wishing to maintain the historical behavior of installing
debug files in /boot/kernel/ can set KERN_DEBUGDIR="" in src.conf(5).
Reviewed by: bdrewery, brooks, imp, markj
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1006
branch.
This function is used to drain a callout via a callback instead of
blocking the caller until the drain is complete. Refer to the
callout_drain_async() manual page for a detailed description.
Limitation: If a lock is used with the callout, the callout can only
be drained asynchronously one time unless the callout_init_mtx()
function is called again. This limitation is not present in
projects/hps_head and will require more invasive changes to the
timeout code, which was not in the scope of this patch.
Differential Revision: https://reviews.freebsd.org/D3521
Reviewed by: wblock
MFC after: 1 month