Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.
It introduces net.inet.icmp.redirtimeout sysctl, allowing to set
an expiration time for redirected routes. It defaults to 10 minutes,
analogues with net.inet6.icmp6.redirtimeout.
Implementation uses separate file, route_temporal.c, as route.c is already
bloated with tons of different functions.
Internally, expiration is implemented as an per-rnh callout scheduled when
route with non-zero rt_expire time is added or rt_expire is changed.
It does not add any overhead when no temporal routes are present.
Callout traverses entire routing tree under wlock, scheduling expired routes
for deletion and calculating the next time it needs to be run. The rationale
for such implemention is the following: typically workloads requiring large
amount of routes have redirects turned off already, while the systems with
small amount of routes will not inhibit large overhead during tree traversal.
This changes also fixes netstat -rn display of route expiration time, which
has been broken since the conversion from kread() to sysctl.
Reviewed by: bz
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D23075
Linux expects to be able to use posix_fallocate(2) on a memfd. Other places
would use this with shm_open(2) to act as a smarter ftruncate(2).
Test has been added to go along with this.
Reviewed by: kib (earlier version)
Differential Revision: https://reviews.freebsd.org/D23042
This re-enables building the googletest suite by default on mips and instead
specifically doesn't build fusefs tests for mips+clang builds. clang will
easily spent >= 1.5 hours compiling a single file due to a bug in
optimization (see LLVM PR 43263), so turn these off for now while that's
hashed out.
GCC builds are unaffected and build the fusefs tests as-is. Clang builds
only happen by early adopters attempting to hash out the remaining issues.
The comment has been updated to reflect its new position and use less strong
wording about imposing on people.
Discussed with: ngie, asomers
Reviewed by: ngie
Add ATF tests for most gmultipath operations. Add some dtrace probes too,
primarily for configuration changes that happen in response to provider
errors.
PR: 178473
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D22235
The debugger like truss(1) depends on the wait(2) syscall. This syscall
waits for ALL children. When it is waiting for ALL child's the children
created by process descriptors are not returned. This behavior was
introduced because we want to implement libraries which may pdfork(1).
The behavior of process descriptor brakes truss(1) because it will
not be able to collect the status of processes with process descriptors.
To address this problem the status is returned to parent when the
child is traced. While the process is traced the debugger is the new parent.
In case the original parent and debugger are the same process it means the
debugger explicitly used pdfork() to create the child. In that case the debugger
should be using kqueue()/pdwait() instead of wait().
Add test case to verify that. The test case was implemented by markj@.
Reviewed by: kib, markj
Discussed with: jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20362
We have -Werror=strict-overflow so gcc complains:
In file included from /tmp/obj/workspace/src/amd64.amd64/tmp/usr/include/bitstring.h:36:0,
from /workspace/src/tests/sys/sys/bitstring_test.c:34:
/workspace/src/tests/sys/sys/bitstring_test.c: In function 'bit_ffc_at_test':
/workspace/src/sys/sys/bitstring.h:239:5: error: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Werror=strict-overflow]
if (_start >= _nbits) {
^
Disable assuming overflow of signed integer will never happen by specifying
-fno-strict-overflow
Sponsored by: The FreeBSD Foundation
should identify accurately which function exhibited the bug.
Reviewed by: asomers
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22519
r354991 removed variable-sized object initializing on defining. For the safe
reason, manually initialize the members to 0.
Sponsored by: The FreeBSD Foundation
Add bit_ffs_area_at and bit_ffc_area_at functions for searching a bit
string for a sequence of contiguous set or unset bits of at least the
specified size.
The bit_ffc_area function will be used by the Intel ice driver for
implementing resource assignment logic using a bitstring to represent
whether or not a given index has been assigned or is currently free.
The bit_ffs_area, bit_ffc_area_at and bit_ffs_area_at functions are
implemented for completeness.
I'd like to add further test cases for the new functions, but I'm not
really sure how to add them easily. The new functions depend on specific
sequences of bits being set, while the bitstring tests appear to run for
varying bit sizes.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: asomers@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D22400
bit_ffs_at and bit_ffc_at both take _start parameters which indicate to
start searching from _start onwards.
If the given _start index is past the size of the bit string, these
functions will calculate an address of the current bitstring which is
after the expected size. The function will also dereference the memory,
resulting in a read buffer overflow.
The output of the function remains correct, because the tests ensure to
stop the loop if the current bitstring chunk passes the stop bitstring
chunk, and because of a check to ensure the reported _value is never
past _nbits.
However, if <sys/bitstring.h> is ever used in code which is checked by
-fsanitize=undefined, or similar static analysis, it can produce
warnings about reading past the buffer size.
Because of the above mentioned checks, these buffer overflows do not
occur as long as _start is less than _nbits. Additionally, by definition
bit_ffs_at and bif_ffc_at should set _result to -1 in any case where the
_start is after the _nbits.
Check for this case at the start of the function and exit early if so,
preventing the buffer read overflow, and reducing the amount of
computation that occurs.
Note that it may seem odd to ever have code that could call bit_ffc_at
or bit_ffs_at with a _start value greater than _nbits. However, consider
a for-loop that used bit_ffs and bit_ffs_at to loop over a bit string
and perform some operation on each bit that was set. If the last bit of
the bit string was set, the simplest loop implementation would call
bit_ffs_at with a start of _nbits, and expect that to return -1. While
it does infact perform correctly, this is what ultimately triggers the
unexpected buffer read overflow.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: asomers@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D22398
After r354748 mld_input() can change the mbuf. The new pointer
is never returned to icmp6_input() and when passed to
icmp6_rip6_input() the mbuf may no longer valid leading to
a panic.
Pass a pointer to the mbuf to mld_input() so we can return an
updated version in the non-error case.
Add a test sending an MLD packet case which will trigger this bug.
Pointyhat to: bz
Reported by: gallatin, thj
MFC After: 2 weeks
X-MFC with: r354748
Sponsored by: Netflix
Co-mingling two things here:
* Addressing some feedback from Konstantin and Kyle re: jail,
capability mode, and a few other things
* Adding audit support as promised.
The audit support change includes a partial refresh of OpenBSM from
upstream, where the change to add shm_rename has already been
accepted. Matthew doesn't plan to work on refreshing anything else to
support audit for those new event types.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: kib
Relnotes: Yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22083
RFC 8200 says:
"If the fragment is a whole datagram (that is, both the Fragment
Offset field and the M flag are zero), then it does not need
any further reassembly and should be processed as a fully
reassembled packet (i.e., updating Next Header, adjust Payload
Length, removing the Fragment header, etc.). .."
That means we should remove the fragment header and make all the adjustments
rather than just skipping over the fragment header. The difference should
be noticeable in that a properly handled atomic fragment triggering an ICMPv6
message at an upper layer (e.g. dest unreach, unreachable port) will not
include the fragment header.
Update the test cases to also test for an unfragmentable part. That is
needed so that the next header is properly updated (not just lengths).
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22155
Remove mentions of fragmentation tests from extension header test.
Remove setting an MTU > IF_MAXMTU from the test cases to avoid warnings;
this was only possible in a local research tree.
MFC after: 2 weeks
Sponsored by: Netflix
Add a simple test case which can exercise some of the IPv6 extension
header code paths. At the moment only a small set of extension headers
is implemented and no options to the ones which take them.
Also implements a "bad" case to make sure that error handling works.
The tests were used to test m_pullup() changes to the code paths while
removing the KAME PULLDOWN_TEST cases and related macros.
MFC after: 3 weeks
Sponsored by: Netflix
There are times when we have to wait for reply packets. There are
either an ICMPv6 (error) reply or the expiration timeout.
In these cases synchonous ICMPv6 replies should arrive, always,
unless the packet is lost. Due to errors experienced with the
test software sending an invlaid request on at least i386 (*) these
packets are not generated. That means we are waiting for a long time
for the replies or even timeout the test case.
Manually set the "End" flag on these test cases as well, so they do
fail rather than timeout as the sniffer timeout happens. This improves
debugging options, reflects the error properly, and saves time on each
test suit run.
(*) The real cause for that is still to be found (see the referenced PRs)
PR: 241493, 239380
MFC after: 2 weeks
Sponsored by: Netflix
stderr:
Traceback (most recent call last):
File "/usr/tests/sys/netpfil/common/pft_ping.py", line 135, in <module>
main()
File "/usr/tests/sys/netpfil/common/pft_ping.py", line 124, in main
ping(args.sendif[0], args.to[0], args)
File "/usr/tests/sys/netpfil/common/pft_ping.py", line 74, in ping
raw = sp.raw(str(PAYLOAD_MAGIC))
File "/usr/local/lib/python3.6/site-packages/scapy/compat.py", line 52, in raw
return bytes(x)
TypeError: string argument without an encoding
MFC with: r354121
Sponsored by: The FreeBSD Foundation
In order to move python2 out of the test framework to avoid py2 vs. py3
confusions upgrade the remaining test cases using scapy to work with py3.
That means only one version of scapy needs to be installed in the CI system.
It also gives a path forward for testing i386 issues observed in the CI
system with some of these tests.
Fixes are:
- Use default python from environment (which is 3.x these days).
- properly ident some lines as common for the rest of the file to avoid
errors.
- cast the calculated offset to an int as the division result is considered
a float which is not accepted input.
- when comparing payload to a magic number make sure we always add the
payload properly to the packet and do not try to compare string in
the result but convert the data payload back into an integer.
- fix print formating.
Discussed with: lwhsu, kp (taking it off his todo :)
MFC after: 2 weeks
The change to conform to RFC 8200 for overlapping fragments now frees
the entire reassembly queue if the overlapping fragments are not an
exact match.
As a result we do see one less packet in the timeout statistics from
expiry. No other statistics change as the event is not counted.
It can be argued that we should improve the statistics counters in
that case.
This test case update should have been committed alongside the original
commit.
Pointyhat to: bz
MFC after: 3 weeks
X-MFC with: r354046
Sponsored by: Netflix
When we receive the packet with the first fragmented part (fragoff=0)
we remember the length of the unfragmentable part and the next header
(and should probably also remember ECN) as meta-data on the reassembly
queue.
Someone replying this packet so far could change these 2 (3) values.
While changing the next header seems more severe, for a full size
fragmented UDP packet, for example, adding an extension header to the
unfragmentable part would go unnoticed (as the framented part would be
considered an exact duplicate) but make reassembly fail.
So do not allow updating the meta-data after we have seen the first
fragmented part anymore.
The frag6_20 test case is added which failed before triggering an
ICMPv6 "param prob" due to the check for each queued fragment for
a max-size violation if a fragoff=0 packet was received.
MFC after: 3 weeks
Sponsored by: Netflix
When done with tests check that both the per-VNET and the global-fragmented-
packets-in-system counters are zero to make sure we do not leak counters or
queue entries.
This implies that for all test cases we either have to check for the ICMPv6
packet sent in case of TLL=0 expiry (if it is sent) or sleep at least long
enough for the TTL to expire for all packets (e.g., fragments where we do not
have the off=0 packet).
This also means that statistics are now updated to include all the expired
packets.
There are cases when we do not check for counters to be zero and this is
when testing VNET teardown to behave properly and not panic, when we are
intentionally leaving fragments in the system.
MFC after: 3 weeks
Sponsored by: Netflix
In order to ensure that changing the frag6 code does not change behaviour
or break code a set of test cases were implemented.
Like some other test cases these use Scapy to generate packets and possibly
wait for expected answers. In most cases we do check the global and
per interface (netstat) statistics output using the libxo output and grep
to validate fields and numbers. This is a bit hackish but we currently have
no better way to match a selected number of stats only (we have to ignore
some of the ND6 variables; otherwise we could use the entire list).
Test cases include atomic fragments, single fragments, multi-fragments,
and try to cover most error cases in the code currently.
In addition vnet teardown is tested to not panic.
A separate set (not in-tree currently) of probes were used in order to
make sure that the test cases actually test what they should.
The "sniffer" code was copied and adjusted from the netpfil version
as we sometimes will not get packets or have longer timeouts to deal with.
Sponsored by: Netflix
Set up two jails connected by an epair. Create VLAN interfaces in both
jails and check connectivity.
This is a very basic test, but exposed panics during the network stack
epoch work, so this is worth testing.
* Don't partition a disk if too few are available. Just rely on Kyua to
ensure that the tests aren't run with insufficient disks.
* Remove redundant cleanup steps
* In zpool_add_003_pos, store the temporary file in $PWD so Kyua will
automatically clean it up.
* Update zpool_add_005_pos to use dumpon instead of dumpadm. This test had
never been ported to FreeBSD.
* In zpool_add_005_pos, don't format the dump disk with UFS. That was
pointless.
MFC after: 2 weeks
Sponsored by: Axcient
The test is necessarily racy, because it depends on being able to complete a
"zpool add" before a previous resilver finishes. But it was racier than it
needed to be. Move the first "zpool add" to before the resilver starts.
MFC after: 2 weeks
Sponsored by: Axcient
The ZFS test suite was overriding the common $PWD variable with the path to
the pwd command, even though no test wanted to use it that way. Most tests
didn't notice, because ksh93 eventually restored it to its proper meaning.
MFC after: 2 weeks
Sponsored by: Axcient
* Don't create a UFS mountpoint just to store some temporary files. The
tests should always be executed with a sufficiently large TMPDIR.
Creating the UFS mountpoint is not only unneccessary, but it slowed
zpool_import_missing_002_pos greatly, because that test moves large files
between TMPDIR and the UFS mountpoint. This change also allows many of
the tests to be executed with just a single test disk, instead of two.
* Move zpool_import_missing_002_pos's backup device dir from / to $PWD to
prevent cross-device moves. On my system, these two changes improved that
test's speed by 39x. It should also prevent ENOSPC errors seen in CI.
* If insufficient disks are available, don't try to partition one of them.
Just rely on Kyua to skip the test. Users who care will configure Kyua
with sufficient disks.
MFC after: 2 weeks
Sponsored by: Axcient
It was trying to destroy the pool while zfsd was detaching the spare, and
"zpool destroy" failed. Fix by waiting until the spare has fully detached.
MFC after: 2 weeks
Sponsored by: Axcient
The test declared that it only needed 5 disks, but actually tried to use 6.
Fix it to use just 5, which is all it really needs.
MFC after: 2 weeks
Sponsored by: Axcient
ATF functions such as ATF_REQUIRE do not work correctly in child processes.
Use plain C functions to report errors instead.
In the parent, check for the untimely demise of children. Without this,
the test hung until the framework's timeout.
Raise the resource limit on the number of open files. If this was too low,
the test hit the two problems above.
Restore the kern.maxfiles sysctl OID in the cleanup function.
The body prematurely removed the symlink in which the old value was saved.
Make the test more robust by opening more files. In fact, due to the
integer division by 4, this was necessary to make the test valid with
some initial values of maxfiles. Thanks, asomers@.
wait() for children instead of sleeping.
Clean up a temporary file created by the test ("afile").
Reviewed by: asomers
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21900
* Fix force_sync_path, which ensures that a file is fully flushed to disk.
Apparently "zpool history"'s performance has improved, but exporting and
importing the pool still works.
* Fix file_dva by using undocumented zdb syntax to clarify that we're
interested in the pool's root file system, not the pool itself. This
should also fix the zpool_clear_001_pos test.
* Remove a redundant cleanup step
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D21901
These tests have never worked correctly
* Replace runwattr with sudo
* Fix a scoping bug with the "dtst" variable
* Cleanup user properties created during tests
* Eliminate the checks for refreservation and send support. They will always
be supported.
* Fix verify_fs_snapshot. It seemed to assume that permissions would not yet
be delegated, but that's not how it's actually used.
* Combine verify_fs_promote with verify_vol_promote
* Remove some useless sleeps
* Fix backwards condition in verify_vol_volsize
* Remove some redundant cleanup steps in the tests. cleanup.ksh will handle
everything.
* Disable some parts of the tests that FreeBSD doesn't support:
* Creating snapshots with mkdir
* devices
* shareisci
* sharenfs
* xattr
* zoned
The sharenfs parts could probably be reenabled with more work to remove the
Solarisms.
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D21898
ZFS has grown some additional properties that hadn't been added to the
config file yet. While I'm here, improve the error message, and remove a
superfluous command.
MFC after: 2 weeks
Sponsored by: Axcient
This test attempts to corrupt a file-backed vdev by deleting it and then
recreating it with truncate. But that doesn't work, because the pool
already has the vdev open, and it happily hangs on to the open-but-deleted
file. Fix by truncating the file without deleting it.
MFC after: 2 weeks
Sponsored by: Axcient
* Adapt zvol_misc_001_neg to use dumpon instead of Solaris's dumpadm
* Disable zvol_misc_003_neg, zvol_misc_005_neg, and zvol_misc_006_pos,
because they involve using a zvol as a dump device, which FreeBSD does not
yet support.
MFC after: 2 weeks
Sponsored by: Axcient
* Remove zpool_create_013_neg. FreeBSD doesn't have an equivalent of
Solaris's metadevices. GEOM would be the equivalent, but since all geoms
are the same from ZFS's perspective, this test would be redundant with
zpool_create_012_neg
* Remove zpool_create_014_neg. FreeBSD does not support swapping to regular
files.
* Remove zpool_create_016_pos. This test is redundant with literally every
other test that creates a disk-backed pool.
* s:/etc/vfstab:/etc/fstab in zpool_create_011_neg
* Delete the VTOC-related portion of zpool_create_008_pos. FreeBSD doesn't
use VTOC.
* Replace dumpadm with dumpon and swap with swapon in multiple tests.
* In zpool_create_015_neg, don't require "zpool create -n" to fail. It's
reasonable for that variant to succeed, because it doesn't actually open
the zvol.
* Greatly simplify zpool_create_012_neg. Make it safer, too, but not
interfering with the system's regular swap devices.
* Expect zpool_create_011_neg to fail (PR 241070)
* Delete some redundant cleanup steps in various tests
* Remove some unneeeded ATF timeout specifications. The default is fine.
PR: 241070
MFC after: 2 weeks
Sponsored by: Axcient
kern_shm_open2(), since conception, completely fails to pass the mode along
to kern_shm_open(). This breaks most uses of it.
Add tests alongside this that actually check the mode of the returned
files.
PR: 240934 [pulseaudio breakage]
Reported by: ler, Andrew Gierth [postgres breakage]
Diagnosed by: Andrew Gierth (great catch)
Tested by: ler, tmunro
Pointy hat to: kevans
Add an atomic shm rename operation, similar in spirit to a file
rename. Atomically unlink an shm from a source path and link it to a
destination path. If an existing shm is linked at the destination
path, unlink it as part of the same atomic operation. The caller needs
the same permissions as shm_unlink to the shm being renamed, and the
same permissions for the shm at the destination which is being
unlinked, if it exists. If those fail, EACCES is returned, as with the
other shm_* syscalls.
truss support is included; audit support will come later.
This commit includes only the implementation; the sysent-generated
bits will come in a follow-on commit.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: jilles (earlier revision)
Reviewed by: brueffer (manpages, earlier revision)
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21423
memfd_create is effectively a SHM_ANON shm_open(2) mapping with optional
CLOEXEC and file sealing support. This is used by some mesa parts, some
linux libs, and qemu can also take advantage of it and uses the sealing to
prevent resizing the region.
This reimplements shm_open in terms of shm_open2(2) at the same time.
shm_open(2) will be moved to COMPAT12 shortly.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D21393
It is useful to have some tests for page fault signals.
More tests would be useful but creating the conditions (such as various
kinds of running out of memory and I/O errors) is more complicated.
The tests page_fault_signal__bus_objerr_1 and
page_fault_signal__bus_objerr_2 depend on https://reviews.freebsd.org/D21566
before they can pass.
PR: 211924
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D21624
Coverity complained that I wasn't initializing some class members until the
SetUp method. Do it in the constructor instead.
Reported by: Coverity
Coverity CIDs: 1404352, 1404378
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Where open(2) is expected to fail, the tests should assert or expect that
its return value is -1. These tests all accepted too much but happened to
pass anyway.
Reported by: Coverity
Coverity CID: 1404512, 1404378, 1404504, 1404483
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
to the traditional tree(3) RB trees, but using an array (preallocated,
linear chunk of memory) to store the tree.
This avoids allocation overhead, improves memory locality,
and makes it trivially easy to share/transfer/copy the entire tree
without the need for marshalling. The downside is that the size
is fixed at initialization time; there is no mechanism to resize
it.
This is one of the dependencies for the new stats(3) framework
(https://reviews.freebsd.org/D20477).
Reviewed by: bcr (man pages), markj
Discussed with: cem
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20324
When communicating with a FUSE server that implements version 7.8 (or older)
of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter
than normal. The protocol version check wasn't applied universally, leading
to an extra 16 bytes being sent to such servers. The extra bytes were
allocated and bzero()d, so there was no information disclosure.
Reviewed by: emaste
MFC after: 3 days
MFC-With: r350665
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21557
The fusefs tests deliberately leak file descriptors. To do otherwise would
add extra complications to the tests' mock FUSE server. This annotation
should hopefully convince Coverity to shut up about the leaks.
Reviewed by: uqs
MFC after: 4 days
Sponsored by: The FreeBSD Foundation
Address the following defects reported by Coverity:
* Structurally dead code (CID 1404366): set m_quit before FAIL, not after
* Unchecked return value of sysctlbyname (CID 1404321)
* Unchecked return value of stat(2) (CID 1404471)
* Unchecked return value of open(2) (CID 1404402, 1404529)
* Unchecked return value of dup(2) (CID 1404478)
* Buffer overflows. These are all false positives caused by the fact that
Coverity thinks I'm using a buffer to store strings, when in fact I'm
really just using it to store a byte array that happens to be initialized
with a string. I'm changing the type from char to uint8_t in the hopes
that it will placate Coverity. (CID 1404338, 1404350, 1404367, 1404376,
1404379, 1404381, 1404388, 1404403, 1404425, 1404433, 1404434, 1404474,
1404480, 1404484, 1404503, 1404505)
* False positive file descriptor leak. I'm going to try to fix this with
Coverity modeling, but I'll also change an EXPECT to ASSERT so we don't
perform meaningless assertions after the failure. (CID 1404320, 1404324,
1404440, 1404445).
* Unannotated file descriptor leak. This will be followed up by a Coverity
modeling change. (CID 1404326, 1404334, 1404336, 1404357, 1404361,
1404372, 1404391, 1404395, 1404409, 1404430, 1404448, 1404451, 1404455,
1404457, 1404458, 1404460)
* Uninitialized variables in C++ constructors (CID 1404327, 1404346). In the
case of m_maxphys, this actually led to part of the FUSE_INIT's response
being set to stack garbage during the WriteCluster::clustering test.
* Uninitialized sun_len field in struct sockaddr_un (CID 1404330, 1404371,
1404429).
Reported by: Coverity
Reviewed by: emaste
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21457
r339782 re-enabled acl test 00 and 02, which were disabled in r336617
due to PR 229930.
When the tests were disabled the code to set their required programs was
disabled as well, but this was not reinstated when r339782 re-enabled
them.
Do so now.
Sponsored by: Axiado
* A small error in r338152 let to the returned size always being exactly
eight bytes too large.
* The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the
caller does not provide enough space, then the server should return ERANGE
rather than return a truncated list. That's true even though in FUSE's
case the kernel doesn't provide space to the client at all; it simply
requests a maximum size for the list. We previously weren't handling the
case where the server returns ERANGE even though the kernel requested as
much size as the server had told us it needs; that can happen due to a
race.
* We also need to ensure that a pathological server that always returns
ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an
infinite loop in the kernel. As of this commit, it will instead cause an
infinite loop that exits and enters the kernel on each iteration, allowing
signals to be processed.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21287
This makes it possible to perform mathematical operations on
fractional values without using floating point. It operates on Q
numbers, which are integer-sized, opaque structures initialized
to hold a chosen number of integer and fractional bits.
For a general description of the Q number system, see the "Fixed Point
Representation & Fractional Math" whitepaper[1]; for the actual
API see the qmath(3) man page.
This is one of dependencies for the upcoming stats(3) framework[2]
that will be applied to the TCP stack in a later commit.
1. https://www.superkits.net/whitepapers/Fixed%20Point%20Representation%20&%20Fractional%20Math.pdf
2. https://reviews.freebsd.org/D20477
Reviewed by: bcr (man pages, earlier version), sef (earlier version)
Discussed with: cem, dteske, imp, lstewart
Sponsored By: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20116
Motivated by the fact that I'm messing around near the implementation and
wanting to ensure this doesn't get messed up in the process.
MFC after: 1 week
Failure test cases:
sys.netpfil.common.pass_block.pf_v6
sys.netpfil.pf.pass_block.noalias
sys.netpfil.pf.pass_block.v6
Sponsored by: The FreeBSD Foundation
machine/regnum.h ends up being included by sys/procfs.h and sys/ptrace.h via
machine/reg.h. Many of the regnum definitions are too short and too generic
to be exposing to any userland application including one of these two
headers. Moreover, these actively cause build failures in googletest
(template <typename T1 ...> expanding to template <typename 9 ...>).
Hide the definitions behind _KERNEL or _WANT_MIPS_REGNUM, and patch all of
the userland consumers to define as needed.
Discussed with: imp, jhb
Reviewed by: imp, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21330
Attempting to build the fusefs tests WITHOUT_GOOGLETEST will result in an
error if the host system or sysroot doesn't already have googletest headers
in /usr/include/private (e.g. host built/installed WITHOUT_GOOGLETEST, clean
cross-buildworld WITHOUT_GOOGLETEST).
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D21367
Add test for checking that the packets are dropped if it is fragmented into
more than the defined value.
Submitted by: Ahsan Barkati
Reviewed by: kp
Sponsored by: Google, Inc. (GSoC 2019)
Differential Revision: https://reviews.freebsd.org/D21307
This test tests the following:
- The firewall is able to set the tos bits
- The firewall is able to set the DSCP bits when EN bits is already set and
the EN bits remains unchanged.
- The firewall is able to drop the packets based on ToS value
Submitted by: Ahsan Barkati
Reviewed by: kp
Sponsored by: Google, Inc. (GSoC 2019)
Differential Revision: https://reviews.freebsd.org/D21305
The pft_ping.py and sniffer.py tool is moved from tests/sys/netpfil/pf to
tests/sys/netpfil/common directory because these tools are to be used in
common for all the firewalls.
Submitted by: Ahsan Barkati
Reviewed by: kp, thj
Sponsored by: Google, Inc. (GSoC 2019)
Differential Revision: https://reviews.freebsd.org/D21276
Inform D that C executed procctl(PROC_PDEATHSIG_CTL). Otherwise D
might allow B to exit before C is set up to receive a signal on the
parent exit. In this case, C waits forever for the signal and test
hangs.
PR: 237657
Reported and tested by: lwhsu
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
ptrace(PT_DETACH) requires stopped debuggee, otherwise it fails. When
the call fails, the C process is left as debuggee of the process D,
and might be killed too early if process D exits occurs fast enough.
Since pipes are not closed in the forked children, this resulted in
the test hanging, since no write occured from C to wake A.
PR: 237657
Reported and tested by: lwhsu
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The FUSE_LISTXATTR operation always returns the full list of a file's
extended attributes, in all namespaces. There's no way to filter the list
server-side. However, currently FreeBSD's fusefs driver sends a namespace
string with the FUSE_LISTXATTR request. That behavior was probably copied
from fuse_vnop_getextattr, which has an attribute name argument. It's
been there ever since extended attribute support was added in r324620. This
commit removes it.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21280
Some files got their contented duplicated in r345409. Some mistakes where
fixed in r345430. The only file that was left with a duplicated content was
CVE-2019-5598.py.
Reviewed by: kp
Approved by: src (kp)
Differential Revision: https://reviews.freebsd.org/D21267
The entirety of r351061 was a copy/paste error. I'm sorry I've been
comitting so hastily.
Reported by: rpokala
Reviewed by: rpokala
MFC after: 2 weeks
MFC-With: 351061
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21265
In FUSE protocol 7.9, the size of the FUSE_GETATTR request has increased.
However, the fusefs driver is currently not sending the additional fields.
In our implementation, the additional fields are always zero, so I there
haven't been any test failures until now. But fusefs-lkl requires the
request's length to be correct.
Fix this bug, and also enhance the test suite to catch similar bugs.
PR: 239830
MFC after: 2 weeks
MFC-With: 350665
Sponsored by: The FreeBSD Foundation
The test needs to expect a FUSE_FORGET operation. Most of the time the test
would pass anyway, because by chance FUSE_FORGET would arrive after the
unmount.
MFC after: 2 weeks
MFC-With: 350665
Sponsored by: The FreeBSD Foundation
This change makes required modifications in runtests to also only require the
aesni module on Intel (i386/amd64) platforms, as it is an Intel specific
module.
MFC after: 1 month
MFC to: ^/stable/12 (support not present on ^/stable/11)
Submitted by: Greg V <greg@unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D21018
This is a gcc 8.0+ warning which needed to be silenced on for the riscv
build. amd64-xtoolchain-gcc still uses gcc 6.4.0 and does not understand
this flag.
Reviewed by: asomers
Feedback from: imp
Differential Revision: https://reviews.freebsd.org/D21195
This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:
* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4
Performance enhancements include:
* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting
PR: 199934 216391 233783 234581 235773 235774 235775
PR: 236226 236231 236236 236291 236329 236381 236405
PR: 236327 236466 236472 236473 236474 236530 236557
PR: 236560 236844 237052 237181 237588 238565
Reviewed by: bcr (man pages)
Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit
review on project branch)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Pull Request: https://reviews.freebsd.org/D21110
The process is reparented to the debugger while it is attached.
B B
/ ----> |
A A D
Every time when the process is reparented, it is added to the orphan list
of the previous parent:
A->orphan = B
D->orphan = NULL
When the A process will close the process descriptor to the B process,
the B process will be reparented to the init process.
B B - init
| ---->
A D A D
A->orphan = B
D->orphan = B
In this scenario, the B process is in the orphan list of A and D.
When the last process descriptor is closed instead of reparenting
it to the reaper let it stay with the debugger process and set
our previews parent to the reaper.
Add test case for this situation.
Notice that without this patch the kernel will crash with this test case:
panic: orphan 0xfffff8000e990530 of 0xfffff8000e990000 has unexpected oppid 1
Reviewed by: markj, kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20361
Add a common test suite for the firewalls included in the base system. The test
suite allows common test infrastructure to test pf, ipfw and ipf firewalls from
test files containing the setup for all three firewalls.
Add the pass block test for pf, ipfw and ipf. The pass block test checks the
allow/deny functionality of the firewalls tested.
Submitted by: Ahsan Barkati
Sponsored by: Google, Inc. (GSoC 2019)
Reviewed by: kp
Approved by: bz (co-mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21065
When a fusefs file system is mounted using the writeback cache, the cache
may still be bypassed by opening a file with O_DIRECT. When writing with
O_DIRECT, the cache must be invalidated for the affected portion of the
file. Fix some panics caused by inadvertently invalidating too much.
Sponsored by: The FreeBSD Foundation
FUSE file systems can optionally support interrupting outstanding
operations. However, the file system does not identify to the kernel at
mount time whether it's capable of doing that. Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives. That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts. The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.
Fix the signal delivery logic by making interruptibility an opt-in mount
option. This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.
Bump __FreeBSD_version due to the new mount option.
Sponsored by: The FreeBSD Foundation
1) Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be
masked, anyway.
2) Fix an infinite loop bug. If a process received both a maskable signal
lower than 9 (like SIGINT) and then received SIGKILL,
fticket_wait_answer would spin. msleep would immediately return EINTR,
but cursig would return SIGINT, so the sleep would get retried. Fix it
by explicitly checking whether SIGKILL has been received.
3) Abandon the sig_isfatal optimization introduced by r346357. That
optimization would cause fticket_wait_answer to return immediately,
without waiting for a response from the server, if the process were going
to exit anyway. However, it's vulnerable to a race:
1) fatal signal is received while fticket_wait_answer is sleeping.
2) fticket_wait_answer sends the FUSE_INTERRUPT operation.
3) fticket_wait_answer determines that the signal was fatal and returns
without waiting for a response.
4) Another thread changes the signal to non-fatal.
5) The first thread returns to userspace. Instead of exiting, the
process continues.
6) The application receives EINTR, wrongly believes that the operation
was successfully interrupted, and restarts it. This could cause
problems for non-idempotent operations like FUSE_RENAME.
Reported by: kib (the race part)
Sponsored by: The FreeBSD Foundation
This ptrace operation returns a structure containing the error and
return values from the current system call. It is only valid when a
thread is stopped during a system call exit (PL_FLAG_SCX is set).
The sr_error member holds the error value from the system call. Note
that this error value is the native FreeBSD error value that has _not_
been translated to an ABI-specific error value similar to the values
logged to ktrace.
If sr_error is zero, then the return values of the system call will be
set in sr_retval[0] and sr_retval[1].
Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20901
* Fix the kernel build with gcc by removing a redundant extern declaration
* In the tests, fix a printf format specifier that assumed LP64
Sponsored by: The FreeBSD Foundation
Previously fusefs would never recycle vnodes. After VOP_INACTIVE, they'd
linger around until unmount or the vnlru reclaimed them. This commit
essentially actives and inlines the old reclaim_revoked sysctl, and fixes
some issues dealing with the attribute cache and multiply linked files.
Sponsored by: The FreeBSD Foundation
closing a file descriptor causes FUSE activity that is superfluous to the
purpose of most tests, but would nonetheless require matching expectations.
Rather than do that, most tests deliberately leak file descriptors instead.
This commit moves the leakage from each test into two trivial functions:
leak and leakdir. Hopefully Coverity will only complain about those
functions and not all of their callers.
Sponsored by: The FreeBSD Foundation
Now the io tests are run in all cache modes. The fusefs test suite can now
get adequate coverage without changing the value of
vfs.fusefs.data_cache_mode, which is only needed for legacy file systems
now.
Sponsored by: The FreeBSD Foundation
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis. If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache. If not, then
they'll get the writethrough cache. If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.
The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later. However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.
This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
bypassed.
Sponsored by: The FreeBSD Foundation
If a server supports a timestamp granularity other than 1ns, it can tell the
client this as of protocol 7.23. The client will use that granularity when
updating its cached timestamps during write. This way the timestamps won't
appear to change following flush.
Sponsored by: The FreeBSD Foundation
I originally thought that the kernel would be responsible for ctime in
protocol 7.23. But now I realize that's not the case. The server is
responsible for ctime. The kernel only sets it when there are dirty writes
cached, because that's when the server can't.
Sponsored by: The FreeBSD Foundation
As of r349396 the kernel will internally update the mtime and ctime of files
on write. It will also flush the mtime should a SETATTR happen before the
data cache gets flushed. Now it will flush the ctime too, if the server is
using protocol 7.23 or higher.
This is the only case in which the kernel will explicitly set a file's
ctime, since neither utimensat(2) nor any other user interfaces allow it.
Sponsored by: The FreeBSD Foundation
Writing should implicitly update a file's mtime and ctime. For fuse, the
server is supposed to do that. But the client needs to do it too, because
the FUSE_WRITE response does not include time attributes, and it's not
desirable to issue a GETATTR after every WRITE. When using the writeback
cache, there's another hitch: the kernel should ignore the mtime and ctime
fields in any GETATTR response for files with a dirty write cache.
Sponsored by: The FreeBSD Foundation
Writes that extend a file should update the file's size. r344185 restricted
that behavior for fusefs to only happen when the data cache was enabled.
That probably made sense at the time because the attribute cache wasn't
fully baked yet. Now that it is, we should always update the cached file
size during write.
Sponsored by: The FreeBSD Foundation
Use the standard facilities for getpages and putpages instead of bespoke
implementations that don't work well with the writeback cache. This has
several corollaries:
* Change the way we handle short reads _again_. vfs_bio_getpages doesn't
provide any way to handle unexpected short reads. Plus, I found some more
lock-order problems. So now when the short read is detected we'll just
clear the vnode's attribute cache, forcing the file size to be requeried
the next time it's needed. VOP_GETPAGES doesn't have any way to indicate
a short read to the "caller", so we just bzero the rest of the page
whenever a short read happens.
* Change the way we decide when to set the FUSE_WRITE_CACHE bit. We now set
it for clustered writes even when the writeback cache is not in use.
Sponsored by: The FreeBSD Foundation
* During TearDown, close the test file before the backing file. That way
the backing file artifact will have the correct contents after the test
completes. It doesn't matter when running in Kyua, but it may when
running the test manually.
* Add a closeopen operation that mimics what FSX does with the "-c" option.
* Skip mmap-related tests when vfs.fusefs.data_cache_mode == 0
Sponsored by: The FreeBSD Foundation
VOP_GETPAGES intentionally tries to read beyond EOF, so fuse_read_biobackend
can't rely on bp->b_resid > 0 indicating a short read. And adjusting
bp->b_count after a short read seems to cause some sort of resource leak.
Instead, store the shortfall in the bp->b_fsprivate1 field.
Sponsored by: The FreeBSD Foundation
A fuse server may return a short read for three reasons:
* The file is opened with FOPEN_DIRECT_IO. In this case, the short read
should be returned directly to userland. We already handled this case
correctly.
* The file was truncated server-side, and the read hit EOF. In this case,
the kernel should update the file size. Fixed in the case of VOP_READ.
Fixing this for VOP_GETPAGES is TODO.
* The file is opened in writeback mode, there are dirty buffers past what
the server thinks is the file's EOF, and the read hit what the server
thinks is the file's EOF. In this case, the client is trying to read a
hole, and should zero-fill it. We already handled this case, and I added
a test for it.
Sponsored by: The FreeBSD Foundation
None of the new features are implemented yet. This commit just adds the new
protocol definitions and adds backwards-compatibility code for pre 7.23
servers.
Sponsored by: The FreeBSD Foundation
r349260 removed some Linuxisms from the FUSE protocol header file in favor
of standard C99 types. This change follows suit in the tests.
Sponsored by: The FreeBSD Foundation
This protocol level adds two new features: the ability for the server to
store or retrieve data into/from the client's cache. But the messages
aren't defined soundly since they identify the file only by its inode,
without the generation number. So it's possible for them to modify the
wrong file's cache. Also, I don't know of any file systems in ports that
use these messages. So I'm not implementing them. I did add a (disabled)
test for the store message, however.
Sponsored by: The FreeBSD Foundation
If the fuse daemon supports FUSE_BMAP, then use that for the block mapping.
Otherwise, use the same technique used by vop_stdbmap. Report large values
for runp and runb in order to maximize read clustering and minimize upcalls,
even if we don't know the true layout.
The major result of this change is that sequential reads to FUSE files will
now usually happen 128KB at a time instead of 64KB.
Sponsored by: The FreeBSD Foundation
* Don't always write the last page synchronously. That's not actually
required. It was probably just masking another bug that I fixed later,
possibly in r349021.
* Enable the NotifyWriteback tests now that Writeback cache is working.
* Add a test to ensure that the write cache isn't flushed synchronously when
in writeback mode.
Sponsored by: The FreeBSD Foundation
fusefs will now use cluster_read. This allows readahead of more than one
cache block. However, it won't yet actually cluster the reads because that
requires VOP_BMAP, which fusefs does not yet implement.
Sponsored by: The FreeBSD Foundation
Add experimental feature to increase concurrency in Fortuna. As this
diverges slightly from canonical Fortuna, and due to the security
sensitivity of random(4), it is off by default. To enable it, set the
tunable kern.random.fortuna.concurrent_read="1". The rest of this commit
message describes the behavior when enabled.
Readers continue to update shared Fortuna state under global mutex, as they
do in the status quo implementation of the algorithm, but shift the actual
PRF generation out from under the global lock. This massively reduces the
CPU time readers spend holding the global lock, allowing for increased
concurrency on SMP systems and less bullying of the harvestq kthread.
It is somewhat of a deviation from FS&K. I think the primary difference is
that the specific sequence of AES keys will differ if READ_RANDOM_UIO is
accessed concurrently (as the 2nd thread to take the mutex will no longer
receive a key derived from rekeying the first thread). However, I believe
the goals of rekeying AES are maintained: trivially, we continue to rekey
every 1MB for the statistical property; and each consumer gets a
forward-secret, independent AES key for their PRF.
Since Chacha doesn't need to rekey for sequences of any length, this change
makes no difference to the sequence of Chacha keys and PRF generated when
Chacha is used in place of AES.
On a GENERIC 4-thread VM (so, INVARIANTS/WITNESS, numbers not necessarily
representative), 3x concurrent AES performance jumped from ~55 MiB/s per
thread to ~197 MB/s per thread. Concurrent Chacha20 at 3 threads went from
roughly ~113 MB/s per thread to ~430 MB/s per thread.
Prior to this change, the system was extremely unresponsive with 3-4
concurrent random readers; each thread had high variance in latency and
throughput, depending on who got lucky and won the lock. "rand_harvestq"
thread CPU use was high (double digits), seemingly due to spinning on the
global lock.
After the change, concurrent random readers and the system in general are
much more responsive, and rand_harvestq CPU use dropped to basically zero.
Tests are added to the devrandom suite to ensure the uint128_add64 primitive
utilized by unlocked read functions to specification.
Reviewed by: markm
Approved by: secteam(delphij)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20313
rename the source to gsb_crc32.c.
This is a prerequisite of unifying kernel zlib instances.
PR: 229763
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision: https://reviews.freebsd.org/D20193
fusefs will now read ahead at most one cache block at a time (usually 64
KB). Clustered reads are still TODO. Individual file systems may disable
read ahead by setting fuse_init_out.max_readahead=0 during initialization.
Sponsored by: The FreeBSD Foundation
At a basic level, remove assumptions about the underlying algorithm (such as
output block size and reseeding requirements) from the algorithm-independent
logic in randomdev.c. Chacha20 does not have many of the restrictions that
AES-ICM does as a PRF (Pseudo-Random Function), because it has a cipher
block size of 512 bits. The motivation is that by generalizing the API,
Chacha is not penalized by the limitations of AES.
In READ_RANDOM_UIO, first attempt to NOWAIT allocate a large enough buffer
for the entire user request, or the maximal input we'll accept between
signal checking, whichever is smaller. The idea is that the implementation
of any randomdev algorithm is then free to divide up large requests in
whatever fashion it sees fit.
As part of this, two responsibilities from the "algorithm-generic" randomdev
code are pushed down into the Fortuna ra_read implementation (and any other
future or out-of-tree ra_read implementations):
1. If an algorithm needs to rekey every N bytes, it is responsible for
handling that in ra_read(). (I.e., Fortuna's 1MB rekey interval for AES
block generation.)
2. If an algorithm uses a block cipher that doesn't tolerate partial-block
requests (again, e.g., AES), it is also responsible for handling that in
ra_read().
Several APIs are changed from u_int buffer length to the more canonical
size_t. Several APIs are changed from taking a blockcount to a bytecount,
to permit PRFs like Chacha20 to directly generate quantities of output that
are not multiples of RANDOM_BLOCKSIZE (AES block size).
The Fortuna algorithm is changed to NOT rekey every 1MiB when in Chacha20
mode (kern.random.use_chacha20_cipher="1"). This is explicitly supported by
the math in FS&K §9.4 (Ferguson, Schneier, and Kohno; "Cryptography
Engineering"), as well as by their conclusion: "If we had a block cipher
with a 256-bit [or greater] block size, then the collisions would not
have been an issue at all."
For now, continue to break up reads into PAGE_SIZE chunks, as they were
before. So, no functional change, mostly.
Reviewed by: markm
Approved by: secteam(delphij)
Differential Revision: https://reviews.freebsd.org/D20312
Add some basic regression tests to verify behavior of both uint128
implementations at typical boundary conditions, to run on all architectures.
Test uint128 increment behavior of Chacha in keystream mode, as used by
'kern.random.use_chacha20_cipher=1' (r344913) to verify assumptions at edge
cases. These assumptions are critical to the safety of using Chacha as a
PRF in Fortuna (as implemented).
(Chacha's use in arc4random is safe regardless of these tests, as it is
limited to far less than 4 billion blocks of output in that API.)
Reviewed by: markm
Approved by: secteam(gordon)
Differential Revision: https://reviews.freebsd.org/D20392
Our fusefs(5) module supports three cache modes: uncached, write-through,
and write-back. However, the write-through mode (which is the default) has
never actually worked as its name suggests. Rather, it's always been more
like "write-around". It wrote directly, bypassing the cache. The cache
would only be populated by a subsequent read of the same data.
This commit fixes that problem. Now the write-through mode works as one
would expect: write(2) immediately adds data to the cache and then blocks
while the daemon processes the write operation.
A side effect of this change is that non-cache-block-aligned writes will now
incur a read-modify-write cycle of the cache block. The old behavior
(bypassing write cache entirely) can still be achieved by opening a file
with O_DIRECT.
PR: 237588
Sponsored by: The FreeBSD Foundation
Enable write clustering in fusefs whenever cache mode is set to writeback
and the "async" mount option is used. With default values for MAXPHYS,
DFLTPHYS, and the fuse max_write mount parameter, that means sequential
writes will now be written 128KB at a time instead of 64KB.
Also, add a regression test for PR 238565, a panic during unmount that
probably affects UFS, ext2, and msdosfs as well as fusefs.
PR: 238565
Sponsored by: The FreeBSD Foundation
An errant vfs_bio_clrbuf snuck in in r348931. Surprisingly, it doesn't have
any effect most of the time. But under some circumstances it cause the
buffer to behave in a write-only fashion.
Sponsored by: The FreeBSD Foundation
Implements the missing test cases for epair in a similar fashion to the
existing tests. Fixes shared abstractions to work with epair tests.
Submitted by: Ryan Moeller <ryan@freqlabs.com>
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D20498
The current "writeback" cache mode, selected by the
vfs.fusefs.data_cache_mode sysctl, doesn't do writeback cacheing at all. It
merely goes through the motions of using buf(9), but then writes every
buffer synchronously. This commit:
* Enables delayed writes when the sysctl is set to writeback cacheing
* Fixes a cache-coherency problem when extending a file whose last page has
just been written.
* Removes the "sync" mount option, which had been set unconditionally.
* Adjusts some SDT probes
* Adds several new tests that mimic what fsx does but with more control and
without a real file system. As I discover failures with fsx, I add
regression tests to this file.
* Adds a test that ensures we can append to a file without reading any data
from it.
This change is still incomplete. Clustered writing is not yet supported,
and there are frequent "panic: vm_fault_hold: fault on nofault entry" panics
that I need to fix.
Sponsored by: The FreeBSD Foundation
In r348560 I thought that FUSE_EXPORT_SUPPORT was required for cases where
the node to be invalidated (or the parent of the entry to be invalidated)
wasn't cached. But I realize now that that's not the case. During entry
invalidation, if the parent isn't in the vfs hash table, then it must've
been reclaimed. And since fuse_vnop_reclaim does a cache_purge, that means
the entry to be invalidated has already been removed from the namecache.
And during inode invalidation, if the inode to be invalidated isn't in the
vfs hash table, then it too must've been reclaimed. In that case it will
have no buffer cache to invalidate.
Sponsored by: The FreeBSD Foundation
Protocol 7.12 adds a way for the server to notify the client that it should
invalidate an inode's data cache and/or attributes. This commit implements
that mechanism. Unlike Linux's implementation, ours requires that the file
system also supports FUSE_EXPORT_SUPPORT (NFS-style lookups). Otherwise the
invalidation operation will return EINVAL.
Sponsored by: The FreeBSD Foundation
Protocol 7.12 adds a way for the server to notify the client that it should
invalidate an entry from its name cache. This commit implements that
mechanism.
Sponsored by: The FreeBSD Foundation
FUSE allows entries to be cached for a limited amount of time. fusefs's
vnop_lookup method already implements that using the timeout functionality
of cache_lookup/cache_enter_time. However, lookups for the NFS server go
through a separate path: vfs_vget. That path can't use the same timeout
functionality because cache_lookup/cache_enter_time only work on pathnames,
whereas vfs_vget works by inode number.
This commit adds entry timeout information to the fuse vnode structure, and
checks it during vfs_vget. This allows the NFS server to take advantage of
cached entries. It's also the same path that FUSE's asynchronous cache
invalidation operations will use.
Sponsored by: The FreeBSD Foundation
The tests are failing because the return value and output have changed, but
before test code structure adjusted, removing these test cases help people
be able to focus on more important cases.
Discussed with: emaste
MFC with: r348206
Sponsored by: The FreeBSD Foundation
This commit raises the protocol level and adds backwards-compatibility code
to handle structure size changes. It doesn't implement any new features.
The new features added in protocol 7.12 are:
* server-side umask processing (which FreeBSD won't do)
* asynchronous inode and directory entry invalidation (which I'll do next)
Sponsored by: The FreeBSD Foundation
These fields are supposed to contain the file descriptor flags as supplied
to open(2) or set by fcntl(2). The feature is kindof useless on FreeBSD
since we don't supply all of these flags to fuse (because of the weak
relationship between struct file and struct vnode). But we should at least
set the access mode flags (O_RDONLY, etc).
This is the last fusefs change needed to get full protocol 7.9 support.
There are still a few options we don't support for good reason (mandatory
file locking is dumb, flock support is broken in the protocol until 7.17,
etc), but there's nothing else to do at this protocol level.
Sponsored by: The FreeBSD Foundation
If a FUSE file system sets the FUSE_POSIX_LOCKS flag then it can support
fcntl(2)-style locks directly. However, the protocol does not adequately
support flock(2)-style locks until revision 7.17. They must be implemented
locally in-kernel instead. This unfortunately breaks the interoperability
of fcntl(2) and flock(2) locks for file systems that support the former.
C'est la vie.
Prior to this commit flock(2) would get sent to the server as a
fcntl(2)-style lock with the lock owner field set to stack garbage.
Sponsored by: The FreeBSD Foundation
This bit tells the server that we're not sure which uid, gid, and/or pid
originated the write. I don't know of a single file system that cares, but
it's part of the protocol.
Sponsored by: The FreeBSD Foundation
* Prefer std::unique_ptr to raw pointers
* Prefer pass-by-reference to pass-by-pointer
* Prefer static_cast to C-style cast, unless it's too much typing
Reported by: ngie
Sponsored by: The FreeBSD Foundation
* Fix printf format strings on 32-bit OSes
* Fix -Wclass-memaccess violation on GCC-8 caused by using memset on an object
of non-trivial type.
* Fix memory leak in MockFS::init
* Fix -Wcast-align error on i386 in expect_readdir
* Fix some heterogenous comparison errors on 32-bit OSes.
Sponsored by: The FreeBSD Foundation
* Only build the tests on platforms with C++14 support
* Fix an undefined symbol error on lint builds
* Remove an unused function: fiov_clear
Sponsored by: The FreeBSD Foundation
If a daemon sets the FUSE_ASYNC_READ flag during initialization, then the
client is allowed to issue multiple concurrent reads for the same file
handle. Otherwise concurrent reads are not allowed. This commit implements
it. Previously we unconditionally disallowed concurrent reads.
Sponsored by: The FreeBSD Foundation
This commit adds the VOPs needed by userspace NFS servers (tested with
net/unfs3). More work is needed to make the in-kernel nfsd work, because of
its stateless nature. It doesn't open files prior to doing I/O. Also, the
NFS-related VOPs currently ignore the entry cache.
Sponsored by: The FreeBSD Foundation
When mounted with -o default_permissions and when
vfs.fusefs.data_cache_mode=2, fuse_io_strategy would try to clear the suid
bit after a successful write by a non-owner. When combined with a
not-yet-committed attribute-caching patch I'm working on, and if the
FUSE_SETATTR response indicates an unexpected filesize (legal, if the file
system has other clients), this would end up calling vtruncbuf. That would
panic, because the buffer lock was already held by bufwrite or bufstrategy
or something else upstack from fuse_vnop_strategy.
Sponsored by: The FreeBSD Foundation
to then try to reproduce a kernel panic, which turned out to be a
race condition and hard to test from here.
Commit the changes anywhere as the "bind zero" case was a surprise
to me and we should try to maintain this status.
Also it is easy examples someone can build upon.
With help from: markj
Event: Waterloo Hackathon 2019
I have contributed a number of changes to these tests over the past few
hundred revisions, and believe I deserve credit for the changes I have
made (plus, the copyright hadn't been updated since 2014).
MFC after: 1 week
This is not completely necessary today, but this change is being made in a
conservative manner to avoid accidental breakage in the future, if this ever
was a unicode string.
PR: 237403
MFC after: 1 week
In python 3, the default encoding was switched from ascii character sets to
unicode character sets in order to support internationalization by default.
Some interfaces, like ioctls and packets, however, specify data in terms of
non-unicode encodings formats, either in host endian (`fcntl.ioctl`) or
network endian (`dpkt`) byte order/format.
This change alters assumptions made by previous code where it was all
data objects were assumed to be basestrings, when they should have been
treated as byte arrays. In order to achieve this the following are done:
* str objects with encodings needing to be encoded as ascii byte arrays are
done so via `.encode("ascii")`. In order for this to work on python 3 in a
type agnostic way (as it anecdotally varied depending on the caller), call
`.encode("ascii")` only on str objects with python 3 to cast them to ascii
byte arrays in a helper function name `str_to_ascii(..)`.
* `dpkt.Packet` objects needing to be passed in to `fcntl.ioctl(..)` are done
so by casting them to byte arrays via `bytes()`, which calls
`dpkt.Packet__str__` under the covers and does the necessary str to byte array
conversion needed for the `dpkt` APIs and `struct` module.
In order to accomodate this change, apply the necessary typecasting for the
byte array literal in order to search `fop.name` for nul bytes.
This resolves all remaining python 2.x and python 3.x compatibility issues on
amd64. More work needs to be done for the tests to function with i386, in
general (this is a legacy issue).
PR: 237403
MFC after: 1 week
Tested with: python 2.7.16 (amd64), python 3.6.8 (amd64)
Even though some python styles suggest there should be multiple newlines between
methods/classes, for consistency with the surrounding code, it's best to be
consistent by having merely one newline between each functional block.
MFC after: 1 week
Make `KAT(CCM)?Parser` into a context suite-capable object by implementing
`__enter__` and `__exit__` methods which manage opening up the file descriptors
and closing them on context exit. This implementation was decided over adding
destructor logic to a `__del__` method, as there are a number of issues around
object lifetimes when dealing with threading cleanup, atexit handlers, and a
number of other less obvious edgecases. Plus, the architected solution is more
pythonic and clean.
Complete the iterator implementation by implementing a `__next__` method for
both classes which handles iterating over the data using a generator pattern,
and by changing `__iter__` to return the object instead of the data which it
would iterate over. Alias the `__next__` method to `next` when working with
python 2.x in order to maintain functional compatibility between the two major
versions.
As part of this work and to ensure readability, push the initialization of the
parser objects up one layer and pass it down to a helper function. This could
have been done via a decorator, but I was trying to keep it simple for other
developers to make it easier to modify in the future.
This fixes ResourceWarnings with python 3.
PR: 237403
MFC after: 1 week
Tested with: python 2.7.16 (amd64), python 3.6.8 (amd64)
In version 3.2+, `array.array(..).tostring()` was renamed to
`array.array(..).tobytes()`. Conditionally call `array.array(..).tobytes()` if
the python version is 3.2+.
PR: 237403
MFC after: 1 week
Replace uses of `foo.encode("hex")` with `binascii.hexlify(foo)` for forwards
compatibility between python 2.x and python 3.
PR: 237403
MFC after: 1 week
Python 3 no longer doesn't support encoding/decoding hexadecimal numbers using
the `str.format` method. The backwards compatible new method (using the
binascii module/methods) is a comparable means of converting to/from
hexadecimal format.
In short, the functional change is the following:
* `foo.decode('hex')` -> `binascii.unhexlify(foo)`
* `foo.encode('hex')` -> `binascii.hexlify(foo)`
While here, move the dpkt import in `cryptodev.py` down per PEP8, so it comes
after the standard library provided imports.
PR: 237403
MFC after: 1 week
If a user sets both atime and mtime to UTIME_NOW when calling a syscall like
utimensat(2), allow the server to choose what "now" means. Due to the
design of FreeBSD's VFS, it's not possible to do this for just one of atime
or mtime; it's all or none.
PR: 237181
Sponsored by: The FreeBSD Foundation
If the server sets fuse_attr.blksize to a nonzero value in the response to
FUSE_GETATTR, then the client should use that as the value for
stat.st_blksize .
Sponsored by: The FreeBSD Foundation
This commit upgrades the FUSE API to protocol 7.9 and adds unit tests for
backwards compatibility with servers built for version 7.8. It doesn't
implement any of 7.9's new features yet.
Sponsored by: The FreeBSD Foundation
As of r347410 IPSec is no longer built into GENERIC. The ipsec.ko module must
be loaded before we can execute the IPSec tests.
Check this, and skip the tests if IPSec is not available.
When using poll, kevent, or select there was a race window during which it
would be impossible to shut down the daemon. The problem was that poll,
kevent, and select don't return when the file descriptor gets closed (or
maybe it was that the file descriptor got closed before those syscalls were
entered?). The solution is to impose a timeout on those syscalls, and check
m_quit after they time out.
Sponsored by: The FreeBSD Foundation
Just like /dev/devctl, /dev/fuse will now report the number of operations
available for immediate read in the kevent.data field during kevent(2).
Sponsored by: The FreeBSD Foundation
/dev/fuse was already pollable with poll and select. Add support for
kqueue, too. And add tests for polling with poll, select, and kqueue.
Sponsored by: The FreeBSD Foundation
* In the fatal_signal test, wait for the daemon to receive FUSE_INTERRUPT
before exiting.
* Explicitly disable restarting syscalls after SIGUSR2. This fixes
intermittency in the priority test. I don't know why, but sometimes that
test's mkdir would be restarted, and sometimes it would return EINTR.
ERESTART should be the default.
* Remove a useless copy/pasted sleep in the priority test.
Sponsored by: The FreeBSD Foundation
If the daemon dies, return ENOTCONN for all operations that have already
been sent to the daemon, as well as any new ones.
Sponsored by: The FreeBSD Foundation
When a FUSE daemon dies or closes /dev/fuse, all of that daemon's pending
requests must be terminated. Previously that was done in /dev/fuse's
.d_close method. However, d_close only gets called on the *last* close of
the device. That means that if multiple daemons were running concurrently,
all but the last daemon to close would leave their I/O hanging around. The
problem was easily visible just by running "kyua -v parallelism=2 test" in
fusefs's test directory.
Fix this bug by terminating a daemon's pending I/O during /dev/fuse's
cdvpriv dtor method instead. That method runs on every close of a file.
Also, fix some potential races in the tests:
* Clear SA_RESTART when registering the daemon's signal handler so read(2)
will return EINTR.
* Wait for the daemon to die before unmounting the mountpoint, so we won't
see an unwanted FUSE_DESTROY operation in the mock file system.
Sponsored by: The FreeBSD Foundation
* Convert from plain to TAP for slightly improved introspection when skipping
the tests due to requirements not being met.
* Test for the net/py-dpkt (origin) package being required when running the
tests, instead of relying on a copy of the dpkt.py module from 2014. This
enables the tests to work with py3. Subsequently, remove
`tests/sys/opencrypto/dpkt.py(c)?` via `make delete-old`.
* Parameterize out `python2` as `$PYTHON`.
PR: 237403
MFC after: 1 week
Some fusefs tests must sleep because they deliberately trigger a race, or
because they're testing the cache timeout functionality. Consolidate the
sleep interval in a single place so it will be easy to adjust. Shorten it
from either 500ms or 250ms to 100ms. From experiment I find that 10ms works
every time, so 100ms should be fairly safe.
Sponsored by: The FreeBSD Foundation
Replace some sleeps with semaphore operations. Not all sleeps can be
replaced, though. Some are trying to lose a race.
Sponsored by: The FreeBSD Foundation
libfuse expects sockets to be created with FUSE_MKNOD, not FUSE_CREATE,
because that's how Linux does it. My first attempt at creating sockets
(r346894) used FUSE_CREATE because FreeBSD uses VOP_CREATE for this purpose.
There are no backwards-compatibility concerns with this change, because
socket support hasn't yet been merged to head.
Sponsored by: The FreeBSD Foundation
Any change to a directory's contents should cause its mtime and ctime to be
updated by the FUSE daemon. Clear its attribute cache so we'll get the new
attributs the next time that they're needed. This affects the following
VOPs: VOP_CREATE, VOP_LINK, VOP_MKDIR, VOP_MKNOD, VOP_REMOVE, VOP_RMDIR, and
VOP_SYMLINK
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
If the file to be renamed is a directory and it's going to get a new parent,
then the user must have write permissions to that directory, because the
".." dirent must be changed.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
FUSE_LINK returns a new set of attributes. fusefs should cache them just
like it does during other VOPs. This is not only a matter of performance
but of correctness too; without caching the new attributes the vnode's nlink
value would be out-of-date.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
Even an unprivileged user should be able to chown a file to its current
owner, or chgrp it to its current group. Those are no-ops.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
ftruncate should succeed as long as the file descriptor is writable, even if
the file doesn't have write permission. This is important when combined
with O_CREAT.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
Don't allow unprivileged users to set SGID on files to whose group they
don't belong. This is slightly different than what POSIX says we should do
(clear sgid on return from a successful chmod), but it matches what UFS
currently does.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
When mounted with -o default_permissions fusefs is supposed to validate all
permissions in the kernel, not the file system. This commit fixes two
permissions that I had previously overlooked.
* Only root may chown a file
* Non-root users may only chgrp a file to a group to which they belong
PR: 216391
Sponsored by: The FreeBSD Foundation
This test had been disabled because it was designed to check protocol
7.9-specific functionality. Enable it without the 7.9-specific bit.
Sponsored by: The FreeBSD Foundation
An off-by-one error led to the last page of a write not being removed from
its object, even though that page's buffer was marked as invalid.
PR: 235774
Sponsored by: The FreeBSD Foundation
Though it's not documented, Linux will interpret a FUSE_INTERRUPT response
of ENOSYS as "the file system does not support FUSE_INTERRUPT".
Subsequently it will never send FUSE_INTERRUPT again to the same mount
point. This change matches Linux's behavior.
PR: 346357
Sponsored by: The FreeBSD Foundation
`xrange` is a pre-python 2.x compatible idiom. Use `range` instead. The values
being iterated over are sufficiently small that using range on python 2.x won't
be a noticeable issue.
MFC after: 2 months