This commit brings back the driver from FreeBSD commit
f187d6dfbf633665ba6740fe22742aec60ce02a2 plus subsequent fixes from
upstream.
Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.
Reviewed by: pauamma, gbe, kevans, emaste
Obtained from: git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36909
Refactor the symlink and mountpoint traversal logic to avoid
repeatedly checking the vnode type; a symlink cannot be a mountpoint
and vice versa. Avoid repeatedly checking cn_flags for NOCROSSMOUNT
and simplify the check which determines whether the vnode is a
mountpoint.
Suggested by: mjg
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D35054
These are of limited use since the crossmp vnode locking ops have not
actually used a lock since commit
a2d35545429117e68fbcbc68e14ad55e84265d69. We in fact require that
these operations are always issued with LK_SHARED. Additionally,
these directives can produce a false positive in certain VV_CROSSLOCK
cases which require upgrading of the covered vnode lock from shared
to exclusive.
While here, replace the runtime check of LK_SHARED with a KASSERT and
expand the check to include LK_NOWAIT, which all callers pass.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D35054
When a lookup operation crosses into a new mountpoint, the mountpoint
must first be busied before the root vnode can be locked. When a
filesystem is unmounted, the vnode covered by the mountpoint must
first be locked, and then the busy count for the mountpoint drained.
Ordinarily, these two operations work fine if executed concurrently,
but with a stacked filesystem the root vnode may in fact use the
same lock as the covered vnode. By design, this will always be
the case for unionfs (with either the upper or lower root vnode
depending on mount options), and can also be the case for nullfs
if the target and mount point are the same (which admittedly is
very unlikely in practice).
In this case, we have LOR. The lookup path holds the mountpoint
busy while waiting on what is effectively the covered vnode lock,
while a concurrent unmount holds the covered vnode lock and waits
for the mountpoint's busy count to drain.
Attempt to resolve this LOR by allowing the stacked filesystem
to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary.
Upon observing this flag, the vfs_lookup() will leave the covered
vnode lock held while crossing into the mountpoint. Employ this flag
for unionfs with the caveat that it can't be used for '-o below' mounts
until other unionfs locking issues are resolved.
Reported by: pho
Tested by: pho
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35054
In order to safely reuse excluded memory when it's reserved for special
purpose, we need to test whether or not the memory has been reserved
early in boot. physmem_excluded will return true when the entire range
is excluded, false otherwise.
Sponsored by: Netflix
List of changes:
- Use integer multiplication instead of long multiplication, because the result is an integer.
- Remove multiple if-statements and predict new if-statements.
- Rename local variable name, "ticks" into "retval" to avoid shadowing
the system "ticks" global variable.
Reviewed by: kib@ and imp@
MFC after: 1 week
Sponsored by: NVIDIA Networking
Differential Revision: https://reviews.freebsd.org/D36859
This allows to fix a bug where sbuf allocation done in the context of
dev_wired_cache_match() must use non-sleepable allocations.
Suggested by: jhb
Reviewed by: jhb, takawata
Discussed with: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36899
Later it would silently converted to ENOMEM always, because any error
was reported as NULL return path.
Reviewed by: jhb, takawata
Discussed with: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36899
Some virtual machines pass virtio MMIO device parameters via the kernel
command line as a series of virtio_mmio.device=<parameters> options.
These get translated into FreeBSD kernel environment variables; but
unfortunately they all use the same variable name, which resulted in
all but the first such parameter being ignored when the dynamic kernel
environment is set up from the initial environment buffers.
With this commit, duplicate environment settings will instead be stored
as ${name}_1, ${name}_2... ${name}_9999. In the unlikely event that
the same variable is set over 10000 times before the dynamic kernel
environment is set up, we panic.
Variable settings after the dynamic environment is initialized continue
to override the previously-set value; the change is limited to the very
early kernel boot (prior to SI_SUB_KMEM + 1) and changes behaviour from
"ignore" to "store with a different name" only.
Reviewed by: imp
Feedback from: kevans
Sponsored by: https://patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D36187
By convention, EINVAL is returned when validating arguments, not EPERM.
This matches the documented behaviour of sched_setscheduler(3), and that
of SCHED_OTHER.
PR: 227735
MFC after: 1 week
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D37021
It likely won't happen, but is consistent with the other functions of
this KPI.
Reviewed by: imp, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D33479
It can be static within uart_tty.c. It is an open question whether there
remains any real benefit to having uart instances share a swi thread.
Reviewed by: imp, markj, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36938
This reverts commit 76e6e4d72f8d3da7d19242f303bc95461fde7fb9.
Several programs in the tree use -1 instead of INT_MAX to use
the maximum value. Thanks to Eugene Grosbein for pointing this
out.
Ensure that a negative backlog argument is handled as it if was 0.
Reviewed by: markj@, glebius@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D31821
This will resolve a reference and return the appropriate handle, a node
on the simplebus or an ACPI_HANDLE for ACPI. For now we do not try to
further abstract the return type.
MFC after: 2 weeks
Reviewed by: mw
Differential Revision: https://reviews.freebsd.org/D36793
Mechanically cleanup INP_TIMEWAIT from the kernel sources. After
0d7445193ab, this commit shall not cause any functional changes.
Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions. Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.
Differential revision: https://reviews.freebsd.org/D36400
In non-periodic mode absolute timers fire at exactly the time given.
When specifying a fast clock, align the firing time so that less
timer interrupt events are needed.
Reviewed by: rrs @
Differential Revision: https://reviews.freebsd.org/D36858
MFC after: 1 week
Sponsored by: NVIDIA Networking
Add local functions to workaround an instruction segment trap (panic)
when the indirect functions copyin and copyout are called by an external
loadable kernel module (i.e. pfsync, zfs and linuxulator). The crash
was triggered by change 47a57144af25a7bd768b29272d50a36fdf2874ba, but
kernel binary linked with LLD 9 works fine. LLVM bisect points that LLD
behavior chaged after dc06b0bc9ad055d06535462d91bfc2a744b2f589.
This is know to affect powerpc targets only and the final fix is still
being discussed with the LLVM community.
PR: 266730
Reviewed by: luporl, jhibbits (on IRC, previous version)
MFC after: 2 days
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D36234
Modify db_show_sysctl_all so that it does not copy more than once the
data of the input oid, and so that what it passes to db_show_oid does
not alarm coverity.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D36847
Protocols such as netlink may need a large socket receive buffer,
measured in tens of megabytes. This change allows netlink to
set larger socket buffers (given the privs are in place), without
requiring user to manuall bump maxsockbuf.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D36747
Allow to set custom per-protocol handlers for the socket buffers
ioctls by introducing pr_setsbopt callback with the default value
set to the currently-used sbsetopt().
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D36746
The implementation of sysctl_search_oid no longer relies on the
initial value of nodes to be all NULL, so remove the comment that
demands it and let the caller stop enforcing it.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36768
sysctl_search_old makes several tests in a loop that can be removed.
The first test in the loop is only ever true on the first loop
iteration, and is always true on that iteration, so its work can be
done before the loop begins.
The upper and lower bounds on the loop variable 'indx' are each tested
on each iteration, but 'indx' is changed in one direction or the other
only once within the loop, so only one bound needs to be checked.
Two ways remain in the loop that nodes[indx] can change (after one of
them is put before the loop start), and one of them applies exactly
when indx has been incremented, so no separate test for that case
requires testing.
Restructure and add comments that makes clearer that this is a basic
depth-first search.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36741
sysctl_register_oid must check the uniqueness of any newly computed
oid_number in sysctl_register_oid.
Reviewed by: asomers
MFC with: d3f96f661050
Differential Revision: https://reviews.freebsd.org/D36743
To avoid using the sysctl list macros directly in external kernel modules.
Reviewed by: asomers, manu and asiciliano
Differential Revision: https://reviews.freebsd.org/D36748
MFC after: 1 week
Sponsored by: NVIDIA Networking
Specifically, when we receive a violation and we're configured to panic,
kasan_enabled gets unset before we descend into panic(). At this point,
there's no longer any reason to allow marking as kasan_shadow_check() is
disabled -- we have some inherent risk of faulting or panicking if the
system's in a bad enough state with no benefit.
Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36742
Sysctl OIDs were internally stored in linked lists, triggering O(n^2)
behavior when userland iterates over many of them. The slowdown is
noticeable for MIBs that have > 100 children (for example, vm.uma). But
it's unignorable for kstat.zfs when a pool has > 1000 datasets.
Convert the linked lists into RB trees. This produces a ~25x speedup
for listing kstat.zfs with 4100 datasets, and no measurable penalty for
small dataset counts.
Bump __FreeBSD_version for the KPI change.
Sponsored by: Axcient
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D36500
The vn_rlimit_fsizex() function:
- checks that the write does not exceed RLIMIT_FSIZE limit and fs
maximum supported file size
- truncates write length if it exceeds the RLIMIT_FSIZE or max file size,
but there are some bytes to write
- sends SIGXFSZ if RLIMIT_FSIZE would be exceed otherwise
POSIX mandates the truncated write in case when some bytes can be
written but whole write request fails the RLIMIT_FSIZE check.
The function is supposed to be used from VOP_WRITE()s. Due to
pecularity in the VFS generic write syscall layer, uio_resid must
correctly reflect the written amount (noted by markj). Provide the dual
vn_rlimit_fsizex_res() function to correct uio_resid after the clamp
done in vn_rlimit_fsizex() on VOP_WRITE() return.
PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625
When a thread switching off-CPU is migrating to a remote CPU,
sched_switch() may trigger a rescheduling of the thread currently
running on that CPU. When doing so, it must ensure that that thread is
locked before modifying thread state. If the thread's lock is not the
scheduler lock, then the thread is in the process of switching off-CPU
and no extra effort is needed, and the initiator does not hold the
thread's lock and thus should not modify any thread state.
Reported and tested by: Steve Kargl
MFC after: 1 week