Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.
truss support is included; audit support will come later.
This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: jilles (earlier revision)
Reviewed by: brueffer (manpages, earlier revision)
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21423
memfd_create is effectively a SHM_ANON shm_open(2) mapping with optional
CLOEXEC and file sealing support. This is used by some mesa parts, some
linux libs, and qemu can also take advantage of it and uses the sealing to
prevent resizing the region.
This reimplements shm_open in terms of shm_open2(2) at the same time.
shm_open(2) will be moved to COMPAT12 shortly.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D21393
It is useful to have some tests for page fault signals.
More tests would be useful but creating the conditions (such as various
kinds of running out of memory and I/O errors) is more complicated.
The tests page_fault_signal__bus_objerr_1 and
page_fault_signal__bus_objerr_2 depend on https://reviews.freebsd.org/D21566
before they can pass.
PR: 211924
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D21624
Coverity complained that I wasn't initializing some class members until the
SetUp method. Do it in the constructor instead.
Reported by: Coverity
Coverity CIDs: 1404352, 1404378
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Where open(2) is expected to fail, the tests should assert or expect that
its return value is -1. These tests all accepted too much but happened to
pass anyway.
Reported by: Coverity
Coverity CID: 1404512, 1404378, 1404504, 1404483
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
to the traditional tree(3) RB trees, but using an array (preallocated,
linear chunk of memory) to store the tree.
This avoids allocation overhead, improves memory locality,
and makes it trivially easy to share/transfer/copy the entire tree
without the need for marshalling. The downside is that the size
is fixed at initialization time; there is no mechanism to resize
it.
This is one of the dependencies for the new stats(3) framework
(https://reviews.freebsd.org/D20477).
Reviewed by: bcr (man pages), markj
Discussed with: cem
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20324
When communicating with a FUSE server that implements version 7.8 (or older)
of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter
than normal. The protocol version check wasn't applied universally, leading
to an extra 16 bytes being sent to such servers. The extra bytes were
allocated and bzero()d, so there was no information disclosure.
Reviewed by: emaste
MFC after: 3 days
MFC-With: r350665
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21557
The fusefs tests deliberately leak file descriptors. To do otherwise would
add extra complications to the tests' mock FUSE server. This annotation
should hopefully convince Coverity to shut up about the leaks.
Reviewed by: uqs
MFC after: 4 days
Sponsored by: The FreeBSD Foundation
Address the following defects reported by Coverity:
* Structurally dead code (CID 1404366): set m_quit before FAIL, not after
* Unchecked return value of sysctlbyname (CID 1404321)
* Unchecked return value of stat(2) (CID 1404471)
* Unchecked return value of open(2) (CID 1404402, 1404529)
* Unchecked return value of dup(2) (CID 1404478)
* Buffer overflows. These are all false positives caused by the fact that
Coverity thinks I'm using a buffer to store strings, when in fact I'm
really just using it to store a byte array that happens to be initialized
with a string. I'm changing the type from char to uint8_t in the hopes
that it will placate Coverity. (CID 1404338, 1404350, 1404367, 1404376,
1404379, 1404381, 1404388, 1404403, 1404425, 1404433, 1404434, 1404474,
1404480, 1404484, 1404503, 1404505)
* False positive file descriptor leak. I'm going to try to fix this with
Coverity modeling, but I'll also change an EXPECT to ASSERT so we don't
perform meaningless assertions after the failure. (CID 1404320, 1404324,
1404440, 1404445).
* Unannotated file descriptor leak. This will be followed up by a Coverity
modeling change. (CID 1404326, 1404334, 1404336, 1404357, 1404361,
1404372, 1404391, 1404395, 1404409, 1404430, 1404448, 1404451, 1404455,
1404457, 1404458, 1404460)
* Uninitialized variables in C++ constructors (CID 1404327, 1404346). In the
case of m_maxphys, this actually led to part of the FUSE_INIT's response
being set to stack garbage during the WriteCluster::clustering test.
* Uninitialized sun_len field in struct sockaddr_un (CID 1404330, 1404371,
1404429).
Reported by: Coverity
Reviewed by: emaste
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21457
r339782 re-enabled acl test 00 and 02, which were disabled in r336617
due to PR 229930.
When the tests were disabled the code to set their required programs was
disabled as well, but this was not reinstated when r339782 re-enabled
them.
Do so now.
Sponsored by: Axiado
* A small error in r338152 let to the returned size always being exactly
eight bytes too large.
* The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the
caller does not provide enough space, then the server should return ERANGE
rather than return a truncated list. That's true even though in FUSE's
case the kernel doesn't provide space to the client at all; it simply
requests a maximum size for the list. We previously weren't handling the
case where the server returns ERANGE even though the kernel requested as
much size as the server had told us it needs; that can happen due to a
race.
* We also need to ensure that a pathological server that always returns
ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an
infinite loop in the kernel. As of this commit, it will instead cause an
infinite loop that exits and enters the kernel on each iteration, allowing
signals to be processed.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21287
This makes it possible to perform mathematical operations on
fractional values without using floating point. It operates on Q
numbers, which are integer-sized, opaque structures initialized
to hold a chosen number of integer and fractional bits.
For a general description of the Q number system, see the "Fixed Point
Representation & Fractional Math" whitepaper[1]; for the actual
API see the qmath(3) man page.
This is one of dependencies for the upcoming stats(3) framework[2]
that will be applied to the TCP stack in a later commit.
1. https://www.superkits.net/whitepapers/Fixed%20Point%20Representation%20&%20Fractional%20Math.pdf
2. https://reviews.freebsd.org/D20477
Reviewed by: bcr (man pages, earlier version), sef (earlier version)
Discussed with: cem, dteske, imp, lstewart
Sponsored By: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20116
Motivated by the fact that I'm messing around near the implementation and
wanting to ensure this doesn't get messed up in the process.
MFC after: 1 week
Failure test cases:
sys.netpfil.common.pass_block.pf_v6
sys.netpfil.pf.pass_block.noalias
sys.netpfil.pf.pass_block.v6
Sponsored by: The FreeBSD Foundation
machine/regnum.h ends up being included by sys/procfs.h and sys/ptrace.h via
machine/reg.h. Many of the regnum definitions are too short and too generic
to be exposing to any userland application including one of these two
headers. Moreover, these actively cause build failures in googletest
(template <typename T1 ...> expanding to template <typename 9 ...>).
Hide the definitions behind _KERNEL or _WANT_MIPS_REGNUM, and patch all of
the userland consumers to define as needed.
Discussed with: imp, jhb
Reviewed by: imp, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21330
Attempting to build the fusefs tests WITHOUT_GOOGLETEST will result in an
error if the host system or sysroot doesn't already have googletest headers
in /usr/include/private (e.g. host built/installed WITHOUT_GOOGLETEST, clean
cross-buildworld WITHOUT_GOOGLETEST).
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D21367
Add test for checking that the packets are dropped if it is fragmented into
more than the defined value.
Submitted by: Ahsan Barkati
Reviewed by: kp
Sponsored by: Google, Inc. (GSoC 2019)
Differential Revision: https://reviews.freebsd.org/D21307
This test tests the following:
- The firewall is able to set the tos bits
- The firewall is able to set the DSCP bits when EN bits is already set and
the EN bits remains unchanged.
- The firewall is able to drop the packets based on ToS value
Submitted by: Ahsan Barkati
Reviewed by: kp
Sponsored by: Google, Inc. (GSoC 2019)
Differential Revision: https://reviews.freebsd.org/D21305
The pft_ping.py and sniffer.py tool is moved from tests/sys/netpfil/pf to
tests/sys/netpfil/common directory because these tools are to be used in
common for all the firewalls.
Submitted by: Ahsan Barkati
Reviewed by: kp, thj
Sponsored by: Google, Inc. (GSoC 2019)
Differential Revision: https://reviews.freebsd.org/D21276
Inform D that C executed procctl(PROC_PDEATHSIG_CTL). Otherwise D
might allow B to exit before C is set up to receive a signal on the
parent exit. In this case, C waits forever for the signal and test
hangs.
PR: 237657
Reported and tested by: lwhsu
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
ptrace(PT_DETACH) requires stopped debuggee, otherwise it fails. When
the call fails, the C process is left as debuggee of the process D,
and might be killed too early if process D exits occurs fast enough.
Since pipes are not closed in the forked children, this resulted in
the test hanging, since no write occured from C to wake A.
PR: 237657
Reported and tested by: lwhsu
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The FUSE_LISTXATTR operation always returns the full list of a file's
extended attributes, in all namespaces. There's no way to filter the list
server-side. However, currently FreeBSD's fusefs driver sends a namespace
string with the FUSE_LISTXATTR request. That behavior was probably copied
from fuse_vnop_getextattr, which has an attribute name argument. It's
been there ever since extended attribute support was added in r324620. This
commit removes it.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21280
Some files got their contented duplicated in r345409. Some mistakes where
fixed in r345430. The only file that was left with a duplicated content was
CVE-2019-5598.py.
Reviewed by: kp
Approved by: src (kp)
Differential Revision: https://reviews.freebsd.org/D21267
The entirety of r351061 was a copy/paste error. I'm sorry I've been
comitting so hastily.
Reported by: rpokala
Reviewed by: rpokala
MFC after: 2 weeks
MFC-With: 351061
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21265
In FUSE protocol 7.9, the size of the FUSE_GETATTR request has increased.
However, the fusefs driver is currently not sending the additional fields.
In our implementation, the additional fields are always zero, so I there
haven't been any test failures until now. But fusefs-lkl requires the
request's length to be correct.
Fix this bug, and also enhance the test suite to catch similar bugs.
PR: 239830
MFC after: 2 weeks
MFC-With: 350665
Sponsored by: The FreeBSD Foundation
The test needs to expect a FUSE_FORGET operation. Most of the time the test
would pass anyway, because by chance FUSE_FORGET would arrive after the
unmount.
MFC after: 2 weeks
MFC-With: 350665
Sponsored by: The FreeBSD Foundation
This change makes required modifications in runtests to also only require the
aesni module on Intel (i386/amd64) platforms, as it is an Intel specific
module.
MFC after: 1 month
MFC to: ^/stable/12 (support not present on ^/stable/11)
Submitted by: Greg V <greg@unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D21018
This is a gcc 8.0+ warning which needed to be silenced on for the riscv
build. amd64-xtoolchain-gcc still uses gcc 6.4.0 and does not understand
this flag.
Reviewed by: asomers
Feedback from: imp
Differential Revision: https://reviews.freebsd.org/D21195
This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:
* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4
Performance enhancements include:
* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting
PR: 199934 216391 233783 234581 235773 235774 235775
PR: 236226 236231 236236 236291 236329 236381 236405
PR: 236327 236466 236472 236473 236474 236530 236557
PR: 236560 236844 237052 237181 237588 238565
Reviewed by: bcr (man pages)
Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit
review on project branch)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Pull Request: https://reviews.freebsd.org/D21110
The process is reparented to the debugger while it is attached.
B B
/ ----> |
A A D
Every time when the process is reparented, it is added to the orphan list
of the previous parent:
A->orphan = B
D->orphan = NULL
When the A process will close the process descriptor to the B process,
the B process will be reparented to the init process.
B B - init
| ---->
A D A D
A->orphan = B
D->orphan = B
In this scenario, the B process is in the orphan list of A and D.
When the last process descriptor is closed instead of reparenting
it to the reaper let it stay with the debugger process and set
our previews parent to the reaper.
Add test case for this situation.
Notice that without this patch the kernel will crash with this test case:
panic: orphan 0xfffff8000e990530 of 0xfffff8000e990000 has unexpected oppid 1
Reviewed by: markj, kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20361
Add a common test suite for the firewalls included in the base system. The test
suite allows common test infrastructure to test pf, ipfw and ipf firewalls from
test files containing the setup for all three firewalls.
Add the pass block test for pf, ipfw and ipf. The pass block test checks the
allow/deny functionality of the firewalls tested.
Submitted by: Ahsan Barkati
Sponsored by: Google, Inc. (GSoC 2019)
Reviewed by: kp
Approved by: bz (co-mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21065
When a fusefs file system is mounted using the writeback cache, the cache
may still be bypassed by opening a file with O_DIRECT. When writing with
O_DIRECT, the cache must be invalidated for the affected portion of the
file. Fix some panics caused by inadvertently invalidating too much.
Sponsored by: The FreeBSD Foundation
FUSE file systems can optionally support interrupting outstanding
operations. However, the file system does not identify to the kernel at
mount time whether it's capable of doing that. Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives. That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts. The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.
Fix the signal delivery logic by making interruptibility an opt-in mount
option. This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.
Bump __FreeBSD_version due to the new mount option.
Sponsored by: The FreeBSD Foundation
1) Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be
masked, anyway.
2) Fix an infinite loop bug. If a process received both a maskable signal
lower than 9 (like SIGINT) and then received SIGKILL,
fticket_wait_answer would spin. msleep would immediately return EINTR,
but cursig would return SIGINT, so the sleep would get retried. Fix it
by explicitly checking whether SIGKILL has been received.
3) Abandon the sig_isfatal optimization introduced by r346357. That
optimization would cause fticket_wait_answer to return immediately,
without waiting for a response from the server, if the process were going
to exit anyway. However, it's vulnerable to a race:
1) fatal signal is received while fticket_wait_answer is sleeping.
2) fticket_wait_answer sends the FUSE_INTERRUPT operation.
3) fticket_wait_answer determines that the signal was fatal and returns
without waiting for a response.
4) Another thread changes the signal to non-fatal.
5) The first thread returns to userspace. Instead of exiting, the
process continues.
6) The application receives EINTR, wrongly believes that the operation
was successfully interrupted, and restarts it. This could cause
problems for non-idempotent operations like FUSE_RENAME.
Reported by: kib (the race part)
Sponsored by: The FreeBSD Foundation
This ptrace operation returns a structure containing the error and
return values from the current system call. It is only valid when a
thread is stopped during a system call exit (PL_FLAG_SCX is set).
The sr_error member holds the error value from the system call. Note
that this error value is the native FreeBSD error value that has _not_
been translated to an ABI-specific error value similar to the values
logged to ktrace.
If sr_error is zero, then the return values of the system call will be
set in sr_retval[0] and sr_retval[1].
Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20901
* Fix the kernel build with gcc by removing a redundant extern declaration
* In the tests, fix a printf format specifier that assumed LP64
Sponsored by: The FreeBSD Foundation
Previously fusefs would never recycle vnodes. After VOP_INACTIVE, they'd
linger around until unmount or the vnlru reclaimed them. This commit
essentially actives and inlines the old reclaim_revoked sysctl, and fixes
some issues dealing with the attribute cache and multiply linked files.
Sponsored by: The FreeBSD Foundation
closing a file descriptor causes FUSE activity that is superfluous to the
purpose of most tests, but would nonetheless require matching expectations.
Rather than do that, most tests deliberately leak file descriptors instead.
This commit moves the leakage from each test into two trivial functions:
leak and leakdir. Hopefully Coverity will only complain about those
functions and not all of their callers.
Sponsored by: The FreeBSD Foundation
Now the io tests are run in all cache modes. The fusefs test suite can now
get adequate coverage without changing the value of
vfs.fusefs.data_cache_mode, which is only needed for legacy file systems
now.
Sponsored by: The FreeBSD Foundation
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis. If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache. If not, then
they'll get the writethrough cache. If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.
The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later. However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.
This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
bypassed.
Sponsored by: The FreeBSD Foundation
If a server supports a timestamp granularity other than 1ns, it can tell the
client this as of protocol 7.23. The client will use that granularity when
updating its cached timestamps during write. This way the timestamps won't
appear to change following flush.
Sponsored by: The FreeBSD Foundation
I originally thought that the kernel would be responsible for ctime in
protocol 7.23. But now I realize that's not the case. The server is
responsible for ctime. The kernel only sets it when there are dirty writes
cached, because that's when the server can't.
Sponsored by: The FreeBSD Foundation
As of r349396 the kernel will internally update the mtime and ctime of files
on write. It will also flush the mtime should a SETATTR happen before the
data cache gets flushed. Now it will flush the ctime too, if the server is
using protocol 7.23 or higher.
This is the only case in which the kernel will explicitly set a file's
ctime, since neither utimensat(2) nor any other user interfaces allow it.
Sponsored by: The FreeBSD Foundation
Writing should implicitly update a file's mtime and ctime. For fuse, the
server is supposed to do that. But the client needs to do it too, because
the FUSE_WRITE response does not include time attributes, and it's not
desirable to issue a GETATTR after every WRITE. When using the writeback
cache, there's another hitch: the kernel should ignore the mtime and ctime
fields in any GETATTR response for files with a dirty write cache.
Sponsored by: The FreeBSD Foundation
Writes that extend a file should update the file's size. r344185 restricted
that behavior for fusefs to only happen when the data cache was enabled.
That probably made sense at the time because the attribute cache wasn't
fully baked yet. Now that it is, we should always update the cached file
size during write.
Sponsored by: The FreeBSD Foundation
Use the standard facilities for getpages and putpages instead of bespoke
implementations that don't work well with the writeback cache. This has
several corollaries:
* Change the way we handle short reads _again_. vfs_bio_getpages doesn't
provide any way to handle unexpected short reads. Plus, I found some more
lock-order problems. So now when the short read is detected we'll just
clear the vnode's attribute cache, forcing the file size to be requeried
the next time it's needed. VOP_GETPAGES doesn't have any way to indicate
a short read to the "caller", so we just bzero the rest of the page
whenever a short read happens.
* Change the way we decide when to set the FUSE_WRITE_CACHE bit. We now set
it for clustered writes even when the writeback cache is not in use.
Sponsored by: The FreeBSD Foundation
* During TearDown, close the test file before the backing file. That way
the backing file artifact will have the correct contents after the test
completes. It doesn't matter when running in Kyua, but it may when
running the test manually.
* Add a closeopen operation that mimics what FSX does with the "-c" option.
* Skip mmap-related tests when vfs.fusefs.data_cache_mode == 0
Sponsored by: The FreeBSD Foundation
VOP_GETPAGES intentionally tries to read beyond EOF, so fuse_read_biobackend
can't rely on bp->b_resid > 0 indicating a short read. And adjusting
bp->b_count after a short read seems to cause some sort of resource leak.
Instead, store the shortfall in the bp->b_fsprivate1 field.
Sponsored by: The FreeBSD Foundation
A fuse server may return a short read for three reasons:
* The file is opened with FOPEN_DIRECT_IO. In this case, the short read
should be returned directly to userland. We already handled this case
correctly.
* The file was truncated server-side, and the read hit EOF. In this case,
the kernel should update the file size. Fixed in the case of VOP_READ.
Fixing this for VOP_GETPAGES is TODO.
* The file is opened in writeback mode, there are dirty buffers past what
the server thinks is the file's EOF, and the read hit what the server
thinks is the file's EOF. In this case, the client is trying to read a
hole, and should zero-fill it. We already handled this case, and I added
a test for it.
Sponsored by: The FreeBSD Foundation
None of the new features are implemented yet. This commit just adds the new
protocol definitions and adds backwards-compatibility code for pre 7.23
servers.
Sponsored by: The FreeBSD Foundation
r349260 removed some Linuxisms from the FUSE protocol header file in favor
of standard C99 types. This change follows suit in the tests.
Sponsored by: The FreeBSD Foundation
This protocol level adds two new features: the ability for the server to
store or retrieve data into/from the client's cache. But the messages
aren't defined soundly since they identify the file only by its inode,
without the generation number. So it's possible for them to modify the
wrong file's cache. Also, I don't know of any file systems in ports that
use these messages. So I'm not implementing them. I did add a (disabled)
test for the store message, however.
Sponsored by: The FreeBSD Foundation
If the fuse daemon supports FUSE_BMAP, then use that for the block mapping.
Otherwise, use the same technique used by vop_stdbmap. Report large values
for runp and runb in order to maximize read clustering and minimize upcalls,
even if we don't know the true layout.
The major result of this change is that sequential reads to FUSE files will
now usually happen 128KB at a time instead of 64KB.
Sponsored by: The FreeBSD Foundation
* Don't always write the last page synchronously. That's not actually
required. It was probably just masking another bug that I fixed later,
possibly in r349021.
* Enable the NotifyWriteback tests now that Writeback cache is working.
* Add a test to ensure that the write cache isn't flushed synchronously when
in writeback mode.
Sponsored by: The FreeBSD Foundation
fusefs will now use cluster_read. This allows readahead of more than one
cache block. However, it won't yet actually cluster the reads because that
requires VOP_BMAP, which fusefs does not yet implement.
Sponsored by: The FreeBSD Foundation
Add experimental feature to increase concurrency in Fortuna. As this
diverges slightly from canonical Fortuna, and due to the security
sensitivity of random(4), it is off by default. To enable it, set the
tunable kern.random.fortuna.concurrent_read="1". The rest of this commit
message describes the behavior when enabled.
Readers continue to update shared Fortuna state under global mutex, as they
do in the status quo implementation of the algorithm, but shift the actual
PRF generation out from under the global lock. This massively reduces the
CPU time readers spend holding the global lock, allowing for increased
concurrency on SMP systems and less bullying of the harvestq kthread.
It is somewhat of a deviation from FS&K. I think the primary difference is
that the specific sequence of AES keys will differ if READ_RANDOM_UIO is
accessed concurrently (as the 2nd thread to take the mutex will no longer
receive a key derived from rekeying the first thread). However, I believe
the goals of rekeying AES are maintained: trivially, we continue to rekey
every 1MB for the statistical property; and each consumer gets a
forward-secret, independent AES key for their PRF.
Since Chacha doesn't need to rekey for sequences of any length, this change
makes no difference to the sequence of Chacha keys and PRF generated when
Chacha is used in place of AES.
On a GENERIC 4-thread VM (so, INVARIANTS/WITNESS, numbers not necessarily
representative), 3x concurrent AES performance jumped from ~55 MiB/s per
thread to ~197 MB/s per thread. Concurrent Chacha20 at 3 threads went from
roughly ~113 MB/s per thread to ~430 MB/s per thread.
Prior to this change, the system was extremely unresponsive with 3-4
concurrent random readers; each thread had high variance in latency and
throughput, depending on who got lucky and won the lock. "rand_harvestq"
thread CPU use was high (double digits), seemingly due to spinning on the
global lock.
After the change, concurrent random readers and the system in general are
much more responsive, and rand_harvestq CPU use dropped to basically zero.
Tests are added to the devrandom suite to ensure the uint128_add64 primitive
utilized by unlocked read functions to specification.
Reviewed by: markm
Approved by: secteam(delphij)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20313
rename the source to gsb_crc32.c.
This is a prerequisite of unifying kernel zlib instances.
PR: 229763
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision: https://reviews.freebsd.org/D20193