This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:
* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4
Performance enhancements include:
* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting
PR: 199934 216391 233783 234581 235773 235774 235775
PR: 236226 236231 236236 236291 236329 236381 236405
PR: 236327 236466 236472 236473 236474 236530 236557
PR: 236560 236844 237052 237181 237588 238565
Reviewed by: bcr (man pages)
Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit
review on project branch)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Pull Request: https://reviews.freebsd.org/D21110
I merged passthru.c from the wrong branch (it was a branch that went further in
a direction I wound up not taking). Fix the mismerge and turn passthru on.
NVMe reservations are quite alike to SCSI persistent reservations and
can be used in clustered setups with shared multiport storage.
MFC after: 10 days
Relnotes: yes
Sponsored by: iXsystems, Inc.
r348215 changed jail_getid(3) to validate passed-in jids as active jails
(as the function is documented to return -1 if the jail does not exist).
This broke the jail option (in some cases?) as the jail historically hasn't
needed to exist at the time of rule parsing; jids will get stored and later
applied.
Fix this caller to attempt to parse *av as a number first and just use it
as-is to match historical behavior. jail_getid(3) must still be used in
order for name arguments to work, but it's strictly a fallback in case we
weren't given a number.
Reported and tested by: Ari Suutari <ari stonepile fi>
Reviewed by: ae
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21128
It allows to delete all user data from NVM subsystem in one of 3 methods.
It is a close equivalent of SCSI SANITIZE command of `camcontrol sanitize`,
so I tried to keep arguments as close as possible.
While there, fix supported sanitize methods reporting in `identify`.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
Add capsicum support to ping6, mostly copying the strategy used for ping.
Submitted by: Ján Sučan <jansucan@gmail.com>
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Google, inc. (Google Summer of Code 2019)
Differential Revision: https://reviews.freebsd.org/D21050
In particular: Changed Namespace List, Commands Supported and Effects,
Reservation Notification, Sanitize Status.
Add few new arguments to `nvmecontrol log` subcommand.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
While very useful by itself, it also makes `nvmecontrol` not depend on
hardcoded device names parsing, that in its turn makes simple to take
nvdX (and potentially any other) device names as arguments.
Also added IOCTL bypass from nvdX to respective nvmeYnsZ makes them
interchangeable for management purposes.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This adds several previously missed but important subcommands to list
namespaces and controllers. It also fixes few previously added but
just found with real testing to be broken subcommands.
Also while there, add possibility to explicitly specify nsid for
`nvmecontrol identify` subcommand. It may be useful to specify nsids
not having own devices, for example 0xffffffff, or just newly created
ones.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
While old devices may not support 10 byte MODE SENSE/MODE SELECT commands,
new ones may not be able to report all mode pages with 6 byte commands.
This patch makes camcontrol by default start with 10 byte commands and
fall back to 6 byte on ILLEGAL REQUEST error, or 6 byte can be forced.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
ATA sanitize is functionally identical to SCSI, just uses different
initiation commands and status reporting mechanism.
While there, make kernel better handle sanitize commands and statuses.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This option was imported as part of the KAME project in r62627 (in 2000).
It was turned on unconditionally in r121472 (in 2003) and has been on ever
since. The old alternative code has bitrotted. Reap the dead code.
Reported by: Ján Sučan <jansucan@gmail.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20938
This makes `camcontrol debug` also allow peripheral device specification.
While there, make BTL parser more strict and switch from strtok() to
strsep().
MFC after: 2 weeks
having their check hashes recomputed which resulted in spurious inode
check-hash errors when the system came back up after a crash.
Reported by: Alan Somers
Sponsored by: Netflix
trimming so that a geli device isn't detached before swapon is
invoked.
Submitted by: sigsys_gmail.com
Discussed with: alc
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D21006
AMA replaced HPA in ACS-3 specification. It allows to limit size of the
disk alike to HPA, but declares inaccessible data as indeterminate. One
of its practical use cases is to under-provision SATA SSDs for better
reliability and performance.
While there, fix HPA Security detection/reporting.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
FUSE file systems can optionally support interrupting outstanding
operations. However, the file system does not identify to the kernel at
mount time whether it's capable of doing that. Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives. That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts. The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.
Fix the signal delivery logic by making interruptibility an opt-in mount
option. This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.
Bump __FreeBSD_version due to the new mount option.
Sponsored by: The FreeBSD Foundation
These are mostly compatible with Linux, with three exceptions.
1. We don't do metadata segment stuff. Our passthrough interface
doesn't cope. The code is there, but generates an error.
2. Linux lets you specify a namespace ID for the command. We current
do not: we get ours from the namespace device, or pass in a generic
one. Generally, this will lead to the same command, but FreeBSD's
is safer since you can't specify the wrong id.
3. --show-command outputs to stderr instead of stdout so you can both
see your command, and capture its output with a simple redirect.
Differential Revision: https://reviews.freebsd.org/D19296
Create a set of routines and structures to hold the data for the args
for a command. Use them to generate help and to parse args. Convert
all the current commands over to the new format. "comnd" is a hat-tip
to the TOPS-20 %COMND JSYS that (very) loosely inspired much of the
subsequent command line notions in the industry, but this is far
simpler (the %COMND man page is longer than this code) and not in the
kernel... Also, it implements today's de-facto
command [verb]+ [opts]* [args]*
format rather than the old, archaic TOPS-20 command format :)
This is a snapshot of a work in progress to get the nvme passthru
stuff committed. In time it will become a private library and used
by some other programs in the tree that conform to the above pattern.
Differential Revision: https://reviews.freebsd.org/D19296
gcc hates dt < CC_DT_NONE since it can never be true when dt is an unsigned
type. Since that's a compiler choice and may be affected by weird stuff, instead
use (unsigned)dt > CC_DT_UNKNOWN to test for bounds error since that will work
regardless of the signedness of dt.
List the device's protocol. The returned value is one of the following:
ata direct attach ATA or SATA device
satl a SATA device attached via SAS
scsi A parallel SCSI or SAS
nvme A direct attached NVMe device
mmcsd A MMC or SD attached device
Reviewed by: scottl@, rpokala@
Differential Revision: https://reviews.freebsd.org/D20950
Most people know SAS attached SATA devices by the name SAT or SATL
(with the latter being a little more common). Change the device type
ATA_BEHIND_SCSI to SATL since it's more specific and meaningful.
Suggested by: scottl@
We remove IPSEC only in parts of the tree, and not others. RELEASE_CRUNCH to
disable it has not kept up with all its uses. Remove it. Should there be a real
need to disable IPSEC, one that hasn't shown up in the base system to date,
it can be re-added behind a WITHOUT_IPSEC build option.
We've not used this in years since we retired sysinstall, and it
hasn't compiled in at least a year. A full camcontrol is only 180k, so
making it smaller is not as important as it once was.
OK'd by: ken@, scottl@
is to notify the kernel that the file system is untrusted and it
should use more extensive checks on the file-system's metadata
before using it. This option is intended to be used when mounting
file systems from untrusted media such as USB memory sticks or other
externally-provided media.
It will initially be used by the UFS/FFS file system, but should
likely be expanded to be used by other file systems that may appear
on external media like msdosfs, exfat, and ext2fs.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20786
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages. It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.
For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer. This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers). It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused. To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.
Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.
NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability. This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.
If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output. If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.
Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Discussed with: ae, kp (firewalls)
Relnotes: yes
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616