Commit Graph

19395 Commits

Author SHA1 Message Date
Eric van Gyzen
a134a12b14 Mark the debug.vnlru_nowhere sysctl as CTLFLAG_STATS
The kernel doesn't read it.  It's only writable so it can be cleared.

Sponsored by:	Dell EMC Isilon
2022-11-17 10:44:58 -06:00
Rick Macklem
4ee16246f9 vfs_vnops.c: Fix blksize for ZFS
Since ZFS reports _PC_MIN_HOLE_SIZE as 512 (although it
appears that an unwritten region must be at least f_iosize
to remain unallocated), vn_generic_copy_file_range()
uses 4096 for the copy blksize for ZFS, resulting in slow copies.

For most other file systems, _PC_MIN_HOLE_SIZE and f_iosize
are the same value, so this patch modifies the code to
use f_iosize for most cases.  It also documents in comments
why the blksize is being set a certain way, so that the code
does not appear to be doing "magic math".

Reported by:	allanjude
Reviewed by:	allanjude, asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D37076
2022-11-16 17:37:22 -08:00
John Baldwin
9a673b7158 ktls: Add software support for AES-CBC decryption for TLS 1.1+.
This is mainly intended to provide a fallback for TOE TLS which may
need to use software decryption for an initial record at the start
of a connection.

Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37370
2022-11-15 12:02:03 -08:00
Mateusz Guzik
c3f1a13902 Retire broken GPROF support from the kernel
The option is not even recognized and with that patched it does not
compile. Even if it did work, it would be prohibitively expensive to
use.

Interested parties can use pmcstat or dtrace instead.
2022-11-15 14:17:10 +00:00
John Baldwin
5920f99d21 ktls: Inline ktls_cleanup() into ktls_destroy().
Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37353
2022-11-11 16:01:02 -08:00
John Baldwin
d01db2b837 ktls: Don't leak ktls session objects for certain errors.
ktls_cleanup() does not free ktls session objects, it merely
cleans (and frees) members of the object.

Change callers to use ktls_free() instead.

Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37352
2022-11-11 16:00:37 -08:00
Mateusz Guzik
83286682f8 vfs: whack mips remnant
This reverts commit 8ffa01a061.
2022-11-09 00:31:50 +00:00
Gleb Smirnoff
8840ae2288 tcp: don't store VNET in every tcpcb, take it from the inpcbinfo
Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37125
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
9eb0e8326d tcp: provide macros to access inpcb and socket from a tcpcb
There should be no functional changes with this commit.

Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37123
2022-11-08 10:24:40 -08:00
Mark Johnston
2c10be9e06 arm64: Handle translation faults for thread structures
The break-before-make requirement poses a problem when promoting or
demoting mappings containing thread structures: a CPU may raise a
translation fault while accessing curthread, and data_abort() accesses
the thread again before pmap_fault() can translate the address and
return.

Normally this isn't a problem because we have a hack to ensure that
slabs used by the thread zone are always accessed via the direct map,
where promotions and demotions are rare.  However, this hack doesn't
work properly with UMA_MD_SMALL_ALLOC disabled, as is the case with
KASAN configured (since our KASAN implementation does not shadow the
direct map and so tries to force the use of the kernel map wherever
possible).

Fix the problem by modifying data_abort() to handle translation faults
in the kernel map without dereferencing "td", i.e., curthread, and
without enabling interrupts.  pmap_klookup() has special handling for
translation faults which makes it safe to call in this context.  Then,
revert the aforementioned hack.

Reviewed by:	kevans, alc, kib, andrew
MFC after:	1 month
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37231
2022-11-02 13:46:25 -04:00
Andrew Gallatin
8b19898a78 Fix a panic on boot introduced by 555a861d68
First, an sbuf_new() in device_get_path() shadows the sb
passed in by dev_wired_cache_add(), leaving its sb in an
unfinished state, leading to a failed KASSERT().  Fixing this
is as simple as removing the sbuf_new() from device_get_path().

Second, we cannot simply take a pointer to the sbuf memory and
store it in the device location cache, because that sbuf
is freed immediately after we add data to the cache, leading
to a use-after-free and eventually a double-free.  Fixing this
requires allocating memory for the path.

After a discussion with jhb, we decided that one malloc was
better than two in dev_wired_cache_add, which is why it changed
so much.

Reviewed by: jhb
Sponsored by: Netflix
MFC after: 14 days
2022-11-01 13:44:39 -04:00
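The "one malloc" point in the commit above can be illustrated with a small userland sketch. It assumes a flexible array member is used to keep the path copy in the same allocation as the cache entry; the struct `cache_entry`, its `unit` field, and `cache_entry_new()` are hypothetical names for illustration, not the code in subr_bus.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry; the path copy shares the entry's allocation. */
struct cache_entry {
	int	unit;		/* illustrative payload */
	char	path[];		/* private copy of the caller's string */
};

/*
 * Allocate the entry and its path with a single malloc().  The caller's
 * buffer can be freed or reused afterwards without leaving a dangling
 * pointer in the cache.
 */
static struct cache_entry *
cache_entry_new(const char *path, int unit)
{
	size_t len = strlen(path) + 1;
	struct cache_entry *e = malloc(sizeof(*e) + len);

	if (e == NULL)
		return (NULL);
	e->unit = unit;
	memcpy(e->path, path, len);
	return (e);
}
```

Because the entry owns its copy, releasing the source buffer right after insertion is safe, and a single free() of the entry releases everything.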
Mark Johnston
1f6b6cf177 atomic: Intercept atomic_(load|store)_bool for kernel sanitizers
Fixes:		2bed73739a ("atomic: Add plain atomic_load/store_bool()")
2022-10-29 11:10:58 -04:00
Konstantin Belousov
6b69465efb vfs_domount(): ensure that v_mountedhere and VIRF_MOUNTPOINT are set under the vnode lock
Fixes:	f7833196bd
Reported and tested by:	pho
Reviewed by:	jah, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37198
2022-10-29 14:29:55 +03:00
John Baldwin
744bfb2131 Import the WireGuard driver from zx2c4.com.
This commit brings back the driver from FreeBSD commit
f187d6dfbf plus subsequent fixes from
upstream.

Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.

Reviewed by:	pauamma, gbe, kevans, emaste
Obtained from:	git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36909
2022-10-28 13:36:12 -07:00
Jason A. Harmening
f7833196bd vfs_lookup(): Minor performance optimizations
Refactor the symlink and mountpoint traversal logic to avoid
repeatedly checking the vnode type; a symlink cannot be a mountpoint
and vice versa.  Avoid repeatedly checking cn_flags for NOCROSSMOUNT
and simplify the check which determines whether the vnode is a
mountpoint.

Suggested by:	mjg
Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D35054
2022-10-26 19:33:33 -05:00
Jason A. Harmening
4390622c8d vfs_busy(): fix wording in comment
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D35054
2022-10-26 19:33:30 -05:00
Jason A. Harmening
706f15c5fa Remove witness directives from crossmp locking VOPs
These are of limited use since the crossmp vnode locking ops have not
actually used a lock since commit
a2d3554542.  We in fact require that
these operations are always issued with LK_SHARED.  Additionally,
these directives can produce a false positive in certain VV_CROSSLOCK
cases which require upgrading of the covered vnode lock from shared
to exclusive.

While here, replace the runtime check of LK_SHARED with a KASSERT and
expand the check to include LK_NOWAIT, which all callers pass.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D35054
2022-10-26 19:33:18 -05:00
Jason A. Harmening
080ef8a418 Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR
When a lookup operation crosses into a new mountpoint, the mountpoint
must first be busied before the root vnode can be locked. When a
filesystem is unmounted, the vnode covered by the mountpoint must
first be locked, and then the busy count for the mountpoint drained.
Ordinarily, these two operations work fine if executed concurrently,
but with a stacked filesystem the root vnode may in fact use the
same lock as the covered vnode. By design, this will always be
the case for unionfs (with either the upper or lower root vnode
depending on mount options), and can also be the case for nullfs
if the target and mount point are the same (which admittedly is
very unlikely in practice).

In this case, we have LOR. The lookup path holds the mountpoint
busy while waiting on what is effectively the covered vnode lock,
while a concurrent unmount holds the covered vnode lock and waits
for the mountpoint's busy count to drain.

Attempt to resolve this LOR by allowing the stacked filesystem
to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary.
Upon observing this flag, vfs_lookup() will leave the covered
vnode lock held while crossing into the mountpoint. Employ this flag
for unionfs with the caveat that it can't be used for '-o below' mounts
until other unionfs locking issues are resolved.

Reported by:	pho
Tested by:	pho
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D35054
2022-10-26 19:33:03 -05:00
Mateusz Guzik
d346e3ac33 vfs: use cache_assert_no_entries instead of open-coding it
2022-10-26 15:54:19 +00:00
Warner Losh
deb1e3b719 physmem: Add physmem_excluded to query if a region is excluded
In order to safely reuse excluded memory when it's reserved for a special
purpose, we need to test whether or not the memory has been reserved
early in boot. physmem_excluded will return true when the entire range
is excluded, false otherwise.

Sponsored by:		Netflix
2022-10-25 09:32:49 -06:00
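A minimal sketch of the "true only when the entire range is excluded" rule described above. It simplifies by requiring the range to fit inside a single excluded region; `struct excl_region` and `range_excluded()` are hypothetical names, not the kernel's physmem interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excluded region covering [start, start + size). */
struct excl_region {
	uint64_t start;
	uint64_t size;
};

/*
 * Return true only when the whole range [pa, pa + size) lies inside one
 * excluded region.  A partial overlap returns false.
 */
static bool
range_excluded(const struct excl_region *tab, size_t n, uint64_t pa,
    uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t rs = tab[i].start;
		uint64_t re = rs + tab[i].size;

		/* Compare against re - pa so pa + size cannot overflow. */
		if (pa >= rs && pa < re && size <= re - pa)
			return (true);
	}
	return (false);
}
```

With one excluded region at 0x1000 of size 0x1000, a query for 0x1800 with size 0x100 is fully covered, while 0x1800 with size 0x1000 runs past the end and is reported as not excluded.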
Mateusz Guzik
d653aaec7a cache: add cache_assert_no_entries
2022-10-24 15:37:43 +00:00
Hans Petter Selasky
fdd9548333 time(3): Fix spelling.
Noted by:	Gary Jennejohn <garyj@gmx.de>
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-10-23 18:42:11 +02:00
Hans Petter Selasky
35a33d14b5 time(3): Optimize tvtohz() function.
List of changes:
- Use integer multiplication instead of long multiplication, because the result is an integer.
- Remove multiple if-statements and predict new if-statements.
- Rename local variable name, "ticks" into "retval" to avoid shadowing
the system "ticks" global variable.

Reviewed by:	kib@ and imp@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
Differential Revision:  https://reviews.freebsd.org/D36859
2022-10-23 10:04:50 +02:00
Hans Petter Selasky
ee29897fc3 time(3): Declare the minimum and maximum hz values supported.
Reviewed by:	kib@ and imp@
MFC after:      1 week
Sponsored by:   NVIDIA Networking
Differential Revision:	https://reviews.freebsd.org/D37072
2022-10-23 10:04:50 +02:00
Konstantin Belousov
33ce178835 vn_bmap_seekhole: check that passed offset is non-negative
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37024
2022-10-19 20:24:07 +03:00
Konstantin Belousov
555a861d68 device_get_path(): take sbuf directly
This allows fixing a bug where sbuf allocation done in the context of
dev_wired_cache_match() must use non-sleepable allocations.

Suggested by:	jhb
Reviewed by:	jhb, takawata
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36899
2022-10-19 19:39:40 +03:00
Konstantin Belousov
8cf783bde3 device_get_path(): handle case when dev is root
PR:	266862
Based on submission by:	takawata
Reviewed by:	jhb, takawata
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36899
2022-10-19 19:39:33 +03:00
Konstantin Belousov
d9c5a9ea49 device_get_path(): do not drop the error from BUS_GET_DEVICE_PATH()
Later it would be silently converted to ENOMEM always, because any error
was reported as NULL return path.

Reviewed by:	jhb, takawata
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36899
2022-10-19 19:39:26 +03:00
Konstantin Belousov
23d2fcfbb2 subr_bus.c: some style
Wrap long lines in devctl2_ioctl DEV_GET_PATH and dev_wired_cache_match()

Reviewed by:	jhb, takawata
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36899
2022-10-19 19:39:17 +03:00
Colin Percival
c32bd97641 kern: Support duplicate variables in early kenv
Some virtual machines pass virtio MMIO device parameters via the kernel
command line as a series of virtio_mmio.device=<parameters> options.
These get translated into FreeBSD kernel environment variables; but
unfortunately they all use the same variable name, which resulted in
all but the first such parameter being ignored when the dynamic kernel
environment is set up from the initial environment buffers.

With this commit, duplicate environment settings will instead be stored
as ${name}_1, ${name}_2... ${name}_9999.  In the unlikely event that
the same variable is set over 10000 times before the dynamic kernel
environment is set up, we panic.

Variable settings after the dynamic environment is initialized continue
to override the previously-set value; the change is limited to the very
early kernel boot (prior to SI_SUB_KMEM + 1) and changes behaviour from
"ignore" to "store with a different name" only.

Reviewed by:	imp
Feedback from:	kevans
Sponsored by:	https://patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D36187
2022-10-17 23:02:20 -07:00
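The renaming scheme described above (${name}_1 through ${name}_9999) might look roughly like the following userland sketch. The helpers `name_in_use()` and `pick_kenv_name()` and the flat array of names are assumptions made for illustration; the kernel's kenv code is structured differently, and where this sketch returns -1 the commit says the kernel panics.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DUP	9999	/* highest suffix named in the commit message */

/* Hypothetical lookup: is "name" already among the n stored names? */
static int
name_in_use(const char *const *names, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			return (1);
	return (0);
}

/*
 * Choose the name under which a setting is stored.  An unused name is
 * kept as-is; a duplicate becomes name_1, name_2, ... name_9999.
 * Returns 0 on success, -1 when every suffix is already taken.
 */
static int
pick_kenv_name(const char *const *names, size_t n, const char *name,
    char *out, size_t outlen)
{
	if (!name_in_use(names, n, name)) {
		snprintf(out, outlen, "%s", name);
		return (0);
	}
	for (int i = 1; i <= MAX_DUP; i++) {
		snprintf(out, outlen, "%s_%d", name, i);
		if (!name_in_use(names, n, out))
			return (0);
	}
	return (-1);
}
```

If virtio_mmio.device and virtio_mmio.device_1 are already stored, a third setting of the same variable would be stored as virtio_mmio.device_2 under this scheme.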
Ali Abdallah
ba4782022a ksched: correct return code for invalid priority
By convention, EINVAL is returned when validating arguments, not EPERM.
This matches the documented behaviour of sched_setscheduler(3), and that
of SCHED_OTHER.

PR:		227735
MFC after:	1 week
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D37021
2022-10-17 15:12:13 -03:00
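The convention cited above, EINVAL for a bad argument and EPERM only for a missing privilege, can be sketched as follows. `check_rt_priority()` and its explicit bounds are hypothetical; this is not the ksched code itself.

```c
#include <errno.h>

/*
 * Validate a priority against an allowed range.  An out-of-range value
 * is an invalid argument (EINVAL), not a permission problem (EPERM).
 */
static int
check_rt_priority(int prio, int prio_min, int prio_max)
{
	if (prio < prio_min || prio > prio_max)
		return (EINVAL);
	return (0);
}
```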
Mitchell Horne
39888ed7a3 kern_intr: Check for NULL event in intr_destroy()
It likely won't happen, but is consistent with the other functions of
this KPI.

Reviewed by:	imp, jhb
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D33479
2022-10-15 15:51:44 -03:00
Zhenlei Huang
43f8c763cd if_me: Use dedicated network privilege
Separate if_me privileges from if_gif.

Reviewed by:		kp
Differential Revision:	https://reviews.freebsd.org/D36691
2022-10-15 17:05:36 +02:00
Mitchell Horne
05b727fee5 Downgrade tty_intr_event from a global
It can be static within uart_tty.c. It is an open question whether there
remains any real benefit to having uart instances share a swi thread.

Reviewed by:	imp, markj, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36938
2022-10-12 13:46:12 -03:00
Michael Tuexen
bc0d407676 Revert "listen(): improve POSIX compliance"
This reverts commit 76e6e4d72f.

Several programs in the tree use -1 instead of INT_MAX to use
the maximum value. Thanks to Eugene Grosbein for pointing this
out.
2022-10-12 04:33:00 +02:00
Michael Tuexen
76e6e4d72f listen(): improve POSIX compliance
Ensure that a negative backlog argument is handled as if it were 0.

Reviewed by:		markj@, glebius@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D31821
2022-10-11 22:46:51 +02:00
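The difference between the two backlog policies in this commit and its revert can be sketched as below. The first function assumes the pre-existing behaviour treats a negative backlog as a request for the maximum, which is what the revert message implies; `SKETCH_SOMAXCONN` and both function names are hypothetical.

```c
/* Hypothetical upper limit standing in for the system's configured cap. */
#define SKETCH_SOMAXCONN	128

/* Assumed long-standing policy: negative or oversized means "maximum". */
static int
clamp_backlog_max(int backlog)
{
	if (backlog < 0 || backlog > SKETCH_SOMAXCONN)
		return (SKETCH_SOMAXCONN);
	return (backlog);
}

/* Policy of the reverted change: negative is treated as 0. */
static int
clamp_backlog_zero(int backlog)
{
	if (backlog < 0)
		return (0);
	if (backlog > SKETCH_SOMAXCONN)
		return (SKETCH_SOMAXCONN);
	return (backlog);
}
```

A program calling listen(s, -1) to ask for the largest queue gets the cap under the first policy and an effectively empty queue under the second, which is the incompatibility the revert addresses.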
Bjoern A. Zeeb
99e6980fcf device_get_property: add a HANDLE case
This will resolve a reference and return the appropriate handle, a node
on the simplebus or an ACPI_HANDLE for ACPI.  For now we do not try to
further abstract the return type.

MFC after:	2 weeks
Reviewed by:	mw
Differential Revision: https://reviews.freebsd.org/D36793
2022-10-09 21:51:25 +00:00
Mateusz Guzik
143942f992 unr: remove UNR64_LOCKED
All platforms support 64-bit atomics now.
2022-10-08 10:41:21 +00:00
Gleb Smirnoff
53af690381 tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources.  After
0d7445193a, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of these checks and turn them into
assertions.  Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision:	https://reviews.freebsd.org/D36400
2022-10-06 19:24:37 -07:00
Andrew Turner
9d4cff787e Remove pre-armv6 support from devmap
Remove an old code path that was used by Armv4/5 and so is unused now.

Sponsored by:	The FreeBSD Foundation
2022-10-05 09:56:17 +01:00
Hans Petter Selasky
0def80f1a5 time(3): Align fast clock times to avoid firing multiple timers.
In non-periodic mode absolute timers fire at exactly the time given.
When specifying a fast clock, align the firing time so that fewer
timer interrupt events are needed.

Reviewed by:	rrs @
Differential Revision:	https://reviews.freebsd.org/D36858
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-10-03 17:53:17 +02:00
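One way to picture the alignment idea above: if nearby deadlines are rounded to a common boundary, several timers can share one interrupt. The sketch assumes rounding up to a multiple of a period; `align_up()` is a hypothetical helper, and the rounding direction and granularity used by the actual commit are not stated in the message.

```c
#include <stdint.h>

/*
 * Round a non-negative deadline up to the next multiple of "period"
 * (period > 0).  A deadline already on a boundary is left unchanged.
 */
static int64_t
align_up(int64_t deadline, int64_t period)
{
	int64_t rem = deadline % period;

	return (rem == 0 ? deadline : deadline + (period - rem));
}
```

With a period of 1000, deadlines of 1001 and 1999 both become 2000, so one timer event serves both instead of two.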
Alfredo Dal'Ava Junior
db79bf75ac powerpc: cpuset: add local functions for copyin/copyout
Add local functions to work around an instruction segment trap (panic)
when the indirect functions copyin and copyout are called by an external
loadable kernel module (i.e. pfsync, zfs and linuxulator). The crash
was triggered by change 47a57144af, but a
kernel binary linked with LLD 9 works fine. An LLVM bisect indicates that LLD
behavior changed after dc06b0bc9ad055d06535462d91bfc2a744b2f589.

This is known to affect powerpc targets only and the final fix is still
being discussed with the LLVM community.
being discussed with the LLVM community.

PR:	266730
Reviewed by:	luporl, jhibbits (on IRC, previous version)
MFC after:	2 days
Sponsored by:	Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D36234
2022-10-03 12:03:09 -03:00
Doug Moore
e5f93d1078 show_sysctl_all: reduce copying, please coverity
Modify db_show_sysctl_all so that it does not copy more than once the
data of the input oid, and so that what it passes to db_show_oid does
not alarm coverity.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D36847
2022-10-01 12:20:04 -05:00
Gleb Smirnoff
636420bde3 unix/dgram: don't leak file descriptors when socket write failed
2022-09-30 13:43:08 -07:00
Alexander V. Chernikov
7b660faa9e sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.
Protocols such as netlink may need a large socket receive buffer,
 measured in tens of megabytes. This change allows netlink to
 set larger socket buffers (given the privs are in place), without
 requiring the user to manually bump maxsockbuf.

Reviewed by:	glebius
Differential Revision: https://reviews.freebsd.org/D36747
2022-09-28 10:20:09 +00:00
Alexander V. Chernikov
f66968564d protocols: make socket buffers ioctl handler changeable
Allow setting custom per-protocol handlers for the socket buffers
 ioctls by introducing pr_setsbopt callback with the default value
 set to the currently-used sbsetopt().

Reviewed by:	glebius
Differential Revision: https://reviews.freebsd.org/D36746
2022-09-28 10:20:09 +00:00
Doug Moore
5294bfa751 sysctl_search_oid: remove all-NULL precondition
The implementation of sysctl_search_oid no longer relies on the
initial value of nodes to be all NULL, so remove the comment that
demands it and let the caller stop enforcing it.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36768
2022-09-28 04:30:11 -05:00
Doug Moore
9f6f9007b9 name2oid: use find_oidname
In name2oid, use sysctl_find_oidname instead of re-implementing it.
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36765
2022-09-27 16:17:55 -05:00
Doug Moore
e96ae5cb05 sysctl_search_oid: remove useless tests
sysctl_search_oid makes several tests in a loop that can be removed.

The first test in the loop is only ever true on the first loop
iteration, and is always true on that iteration, so its work can be
done before the loop begins.

The upper and lower bounds on the loop variable 'indx' are each tested
on each iteration, but 'indx' is changed in one direction or the other
only once within the loop, so only one bound needs to be checked.

Two ways remain in the loop that nodes[indx] can change (after one of
them is put before the loop start), and one of them applies exactly
when indx has been incremented, so that case needs no separate
test.

Restructure and add comments that make it clearer that this is a basic
depth-first search.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36741
2022-09-27 13:30:31 -05:00
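The "basic depth-first search" shape described above, with a path array and an index that moves one step per iteration, can be sketched in userland like this. The `struct node` layout (first child plus next sibling), `MAX_DEPTH`, and `search_path()` are assumptions for illustration and do not mirror the sysctl oid structures.

```c
#include <stddef.h>

#define MAX_DEPTH	8	/* hypothetical bound on tree depth */

struct node {
	int		 id;
	struct node	*child;		/* first child, or NULL */
	struct node	*sibling;	/* next sibling, or NULL */
};

/*
 * Iterative depth-first search.  nodes[0..indx] holds the current path.
 * Returns the path length at which "needle" was found, or -1.
 * Nodes below MAX_DEPTH levels are not visited.
 */
static int
search_path(struct node *root, const struct node *needle,
    struct node *nodes[MAX_DEPTH])
{
	int indx = 0;

	nodes[0] = root;
	while (indx >= 0) {
		if (nodes[indx] == NULL) {
			/* Level exhausted: back up, advance the parent. */
			indx--;
			if (indx >= 0)
				nodes[indx] = nodes[indx]->sibling;
			continue;
		}
		if (nodes[indx] == needle)
			return (indx + 1);
		if (nodes[indx]->child != NULL && indx + 1 < MAX_DEPTH) {
			/* Descend to the first child. */
			indx++;
			nodes[indx] = nodes[indx - 1]->child;
		} else {
			/* Leaf: move to the next sibling. */
			nodes[indx] = nodes[indx]->sibling;
		}
	}
	return (-1);
}
```

Note that indx changes by at most one in a single direction per iteration, so only the bound on that side needs checking at each step, which is the observation the commit message makes.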
Doug Moore
ed5183455e register_oid: fix duplicate oid after d3f96f6610
sysctl_register_oid must check the uniqueness of any newly computed
oid_number.

Reviewed by:	asomers
MFC with:	d3f96f6610
Differential Revision:	https://reviews.freebsd.org/D36743
2022-09-27 12:24:01 -05:00
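The uniqueness requirement above can be illustrated with a small sketch: after computing a candidate number, confirm no sibling already uses it and advance if one does. `number_in_use()` and `pick_unique_number()` and the flat array of sibling numbers are hypothetical; how sysctl_register_oid actually computes and retries numbers is not described in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: does any sibling already use "num"? */
static bool
number_in_use(const int *used, size_t n, int num)
{
	for (size_t i = 0; i < n; i++)
		if (used[i] == num)
			return (true);
	return (false);
}

/*
 * Starting from a computed candidate, advance until reaching a number
 * that no sibling uses, and return it.
 */
static int
pick_unique_number(const int *used, size_t n, int candidate)
{
	while (number_in_use(used, n, candidate))
		candidate++;
	return (candidate);
}
```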