Incorporate a fix from zol:
ab5036df1c
commit log from upstream:
Fix race in parallel mount's thread dispatching algorithm
Strategy of parallel mount is as follows.
1) Initial thread dispatching is to select sets of mount points that
don't have dependencies on other sets, hence threads can/should run
lock-less and shouldn't race with other threads for other sets. Each
thread dispatched corresponds to top level directory which may or may
not have datasets to be mounted on sub directories.
2) Subsequent recursive thread dispatching for each thread from 1)
is to mount datasets for each set of mount points. The mount points
within each set have dependencies (i.e. child directories), so child
directories are processed only after parent directory completes.
The problem is that the initial thread dispatching in
zfs_foreach_mountpoint() can be multi-threaded when it needs to be
single-threaded, and this puts threads under race condition. This race
appeared as mount/unmount issues on ZoL for ZoL having different
timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
`zfs unmount -a` which expects proper mount order can't unmount if the
mounts were reordered by the race condition.
There are currently two known patterns of input list `handles` in
`zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.
1) #8833 case where input is `/a /a /a/b` after sorting.
The problem is that libzfs_path_contains() can't correctly handle an
input list with two same top level directories.
There is a race between two POSIX threads A and B,
* ThreadA for "/a" for test1 and "/a/b"
* ThreadB for "/a" for test0/a
and in case of #8833, ThreadA won the race. Two threads were created
because "/a" wasn't considered as `"/a" contains "/a"`.
2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
The problem is that libzfs_path_contains() can't correctly handle an
input list containing "/".
There is a race between two POSIX threads A and B,
* ThreadA for "/" and "/var/data/test"
* ThreadB for "/var/data"
and in case of #8450, ThreadA won the race. Two threads were created
because "/var/data" wasn't considered as `"/" contains "/var/data"`.
In other words, if there is (at least one) "/" in the input list,
the initial thread dispatching must be single-threaded since every
directory is a child of "/", meaning they all directly or indirectly
depend on "/".
In both cases, the first non_descendant_idx() call fails to correctly
determine "path1-contains-path2", and as a result the initial thread
dispatching creates another thread when it needs to be single-threaded.
Fix a conditional in libzfs_path_contains() to consider above two.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
PR: 237517, 237397, 239243
Submitted by: Matthew D. Fuller <fullermd@over-yonder.net> (by email)
MFC after: 3 days
This snapshot among other things includes a fix for a crash of mandoc with empty
tbl reported by rea@ (his regression test has been incorporated upstream)
MFC after: 3 weeks
The values to report can be set via LUN options. It can be useful for
testing, and also required for Drive Maintenance 2016 feature set.
MFC after: 2 weeks
CTL implements all defined feature sets except Drive Maintenance 2016,
which is not very applicable to such a virtual device, and implemented
only partially now. But may be it could be fixed later at least for
completeness.
MFC after: 2 weeks
if a demotion succeeds, then all of the 4KB page mappings within the
superpage-sized region must be valid, so there is no point in testing the
validity of the 4KB page mapping that is going to be write protected.
Deindent the nearby code.
Reviewed by: kib, markj
Tested by: pho (amd64, i386)
X-MFC after: r350004 (this change depends on arm64 dirty bit emulation)
Differential Revision: https://reviews.freebsd.org/D21027
The timeout field in the CAPS register is defined to be 8 bits, so its type was
uint8_t. We recently started adding 1 to it to cope with rogue devices that
listed 0 timeout time (which is impossible). However, in so doing, other devices
that list 0xff (for a 2 minute timeout) were broken when adding 1
overflowed. Widen the type to be uint32_t like its source register to avoid the
issue.
Reported by: bapt@
ATA sanitize is functionally identical to SCSI, just uses different
initiation commands and status reporting mechanism.
While there, make kernel better handle sanitize commands and statuses.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Added allocation retry loop in alloc_pvo_entry(), to wait for
memory to become available if the caller specifies the M_WAITOK flag.
Also, the loop in moa64_enter() was removed, as moea64_pvo_enter()
never returns ENOMEM. It is alloc_pvo_entry() memory allocation that
can fail and must be retried.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D21035
Error message:
make[5]: make[5]: don't know how to make /usr/obj/usr/src/amd64.amd64/obj-lib32/tmp/sys/netinet/in.h. Stop
make[5]: stopped in /usr/src/lib/libsysdecode
Sponsored by: The FreeBSD Foundation
This commit is for the correct version. (The incorrect one had the order
of the last two entries reversed, due to my testing with copy_file_range
at 568 instead of 569.) This misordering should not have been a problem,
but is now fixed.
r350315 created a Linux compatible copy_file_range(2) syscall.
It uses a VOP method called VOP_COPY_FILE_RANGE so that file systems,
such as the NFSv4.2 client can do file system specific copying.
For NFSv4.2, this allows the copying to be done locally on the NFS server,
avoiding transferring the data across the wire twice.
This is a new man page (content changed).
Reviewed by: kib, asomers
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584
copy_file_range(2) is a Linux compatible syscall created by r350315.
Reviewed by: kib, asomers
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584
copy_file_range.2 is a new man page (content change).
Reviewed by: kib, asomers
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584
This patch adds support to the kernel for a Linux compatible
copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9).
This syscall/VOP can be used by the NFSv4.2 client to implement the
Copy operation against an NFSv4.2 server to do file copies locally on
the server.
The vn_generic_copy_file_range() function in this patch can be used
by the NFSv4.2 server to implement the Copy operation.
Fuse may also me able to use the VOP_COPY_FILE_RANGE() method.
vn_generic_copy_file_range() attempts to maintain holes in the output
file in the range to be copied, but may fail to do so if the input and
output files are on different file systems with different _PC_MIN_HOLE_SIZE
values.
Separate commits will be done for the generated syscall files and userland
changes. A commit for a compat32 syscall will be done later.
Reviewed by: kib, asomers (plus comments by brooks, jilles)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584
fusefs file systems may have a fsname subtype (set by mount_fusefs's "-o
subtype" option) that gets appended to the fsname as returned by statfs(2).
The subtype is set on a per-mount basis so it isn't part of the struct
vfsconf. Special-case getvfsbyname to match either the full "fusefs.foobar"
or short "fusefs" fsname.
This is a merge of r348007, r348054, and r350093 from projects/fuse2
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21043
Summary:
It turns out statistics accounting is very expensive in the pmap driver,
and doesn't seem necessary in the common case. Make this optional
behind a MOEA64_STATS #define, which one can set if they really need
statistics.
This saves ~7-8% on buildworld time on a POWER9.
Found by bdragon.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D20903
turnstile_{lock,unlock}() were added for use in epoch. turnstile_lock()
returned NULL to indicate that the calling thread had lost a race and
the turnstile was no longer associated with the given lock, or the lock
owner. However, reader-writer locks may not have a designated owner,
in which case turnstile_lock() would return NULL and
epoch_block_handler_preempt() would leak spinlocks as a result.
Apply a minimal fix: return the lock owner as a separate return value.
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21048
Now that we have a way to obtain entropy in capability mode
(getrandom(2)), libcap_random is obsolete. Remove it.
Bump __FreeBSD_version in case anything happens to use it, though I've
found no consumers.
Reviewed by: delphij, emaste, oshogbo
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21033
Commit text by Jake:
If a driver's IFDI_ATTACH_PRE function fails, the iflib_device_register
function will free the ctx pointer. However, it does not reset the
device softc pointer to NULL.
This will result in memory corruption as a future access to the now
invalid pointer will corrupt memory that is later allocated on top of
the same memory location.
The iflib_device_deregister function correctly resets the softc pointer
by using device_set_softc().
This clears up the invalid dangling pointer and prevents memory
corruption that could lead to a panic or undefined behavior if the
device's driver failed to attach.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, gallatin@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21003
The already-listed APMC0D0F ID belongs to the Ampere eMAG aarch64
platform, but ACPI support was not even built on aarch64.
Submitted by: Greg V <greg_unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D21059
r349369 removed IP_MIN_MEMBERSHIPS and IPV6_MIN_MEMBERSHIPS, and r349893
removed TCP_RACK_SESS_CWV. libsysdecode lacked dependencies to trigger a
rebuild of tables.h.
Add explicit dependencies as a workaround to address these specific
cases; a holistic solution is still needed.
Sponsored by: The FreeBSD Foundation
This will be used to gate the fusefs tests. It's a partial merge of r348281
from projects/fuse2.
Reviewed by: kib, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21044
of the TCP TS offset from taking the IP addresses and the TCP port
numbers into account to a version just taking only the IP addresses
into account. This works around broken middleboxes or endpoints.
The default is to keep the behaviour, which is also the behaviour
recommended in RFC 7323.
Reported by: devgs@ukr.net
Reviewed by: rrs@
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D20980
- Wrong order of casting and bit shift caused that enabling and disabling
queues didn't work properly for queues number larger than 32. Use literals
with right suffix instead.
- TX ring tail address was not updated during reinitiailzation of TX
structures. It could block sending traffic.
- Also remove unused variables 'eims' and 'active_queues'.
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: erj@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D20826