Commit Graph

19355 Commits

Author SHA1 Message Date
Hans Petter Selasky
0def80f1a5 time(3): Align fast clock times to avoid firing multiple timers.
In non-periodic mode absolute timers fire at exactly the time given.
When specifying a fast clock, align the firing time so that less
timer interrupt events are needed.

Reviewed by:	rrs @
Differential Revision:	https://reviews.freebsd.org/D36858
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-10-03 17:53:17 +02:00
Alfredo Dal'Ava Junior
db79bf75ac powerpc: cpuset: add local functions for copyin/copyout
Add local functions to workaround an instruction segment trap (panic)
when the indirect functions copyin and copyout are called by an external
loadable kernel module (i.e. pfsync, zfs and linuxulator). The crash
was triggered by change 47a57144af, but
kernel binary linked with LLD 9 works fine. LLVM bisect points that LLD
behavior chaged after dc06b0bc9ad055d06535462d91bfc2a744b2f589.

This is know to affect powerpc targets only and the final fix is still
being discussed with the LLVM community.

PR:	266730
Reviewed by:	luporl, jhibbits (on IRC, previous version)
MFC after:	2 days
Sponsored by:	Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D36234
2022-10-03 12:03:09 -03:00
Doug Moore
e5f93d1078 show_sysctl_all: reduce copying, please coverity
Modify db_show_sysctl_all so that it does not copy more than once the
data of the input oid, and so that what it passes to db_show_oid does
not alarm coverity.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D36847
2022-10-01 12:20:04 -05:00
Gleb Smirnoff
636420bde3 unix/dgram: don't leak file descriptors when socket write failed 2022-09-30 13:43:08 -07:00
Alexander V. Chernikov
7b660faa9e sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.
Protocols such as netlink may need a large socket receive buffer,
 measured in tens of megabytes. This change allows netlink to
 set larger socket buffers (given the privs are in place), without
 requiring user to manuall bump maxsockbuf.

Reviewed by:	glebius
Differential Revision: https://reviews.freebsd.org/D36747
2022-09-28 10:20:09 +00:00
Alexander V. Chernikov
f66968564d protocols: make socket buffers ioctl handler changeable
Allow to set custom per-protocol handlers for the socket buffers
 ioctls by introducing pr_setsbopt callback with the default value
 set to the currently-used sbsetopt().

Reviewed by:	glebius
Differential Revision: https://reviews.freebsd.org/D36746
2022-09-28 10:20:09 +00:00
Doug Moore
5294bfa751 sysctl_search_oid: remove all-NULL precondition
The implementation of sysctl_search_oid no longer relies on the
initial value of nodes to be all NULL, so remove the comment that
demands it and let the caller stop enforcing it.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36768
2022-09-28 04:30:11 -05:00
Doug Moore
9f6f9007b9 name2oid: use find_oidname
In name2oid, use sysctl _find_oidname instead of re-implementing it.
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36765
2022-09-27 16:17:55 -05:00
Doug Moore
e96ae5cb05 sysctl_search_oid: remove useless tests
sysctl_search_old makes several tests in a loop that can be removed.

The first test in the loop is only ever true on the first loop
iteration, and is always true on that iteration, so its work can be
done before the loop begins.

The upper and lower bounds on the loop variable 'indx' are each tested
on each iteration, but 'indx' is changed in one direction or the other
only once within the loop, so only one bound needs to be checked.

Two ways remain in the loop that nodes[indx] can change (after one of
them is put before the loop start), and one of them applies exactly
when indx has been incremented, so no separate test for that case
requires testing.

Restructure and add comments that makes clearer that this is a basic
depth-first search.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36741
2022-09-27 13:30:31 -05:00
Doug Moore
ed5183455e register_oid: fix duplicate oid after d3f96f6610
sysctl_register_oid must check the uniqueness of any newly computed
oid_number in sysctl_register_oid.

Reviewed by:	asomers
MFC with:	d3f96f6610
Differential Revision:	https://reviews.freebsd.org/D36743
2022-09-27 12:24:01 -05:00
Hans Petter Selasky
c075ea46bc sysctl(3): Implement SYSCTL_FOREACH() to iterate all OIDs in a sysctl list.
To avoid using the sysctl list macros directly in external kernel modules.

Reviewed by:		asomers, manu and asiciliano
Differential Revision:	https://reviews.freebsd.org/D36748
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-09-27 19:21:21 +02:00
Mitchell Horne
f2963b530e kasan: disable kasan_mark() after a violation
Specifically, when we receive a violation and we're configured to panic,
kasan_enabled gets unset before we descend into panic().  At this point,
there's no longer any reason to allow marking as kasan_shadow_check() is
disabled -- we have some inherent risk of faulting or panicking if the
system's in a bad enough state with no benefit.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36742
2022-09-27 11:01:21 -05:00
Alan Somers
6622e299ac Fix the build with SCHED_STATS after d3f96f6610
MFC with:	d3f96f6610
Sponsored by:	Axcient
2022-09-26 20:20:46 -06:00
Alan Somers
d3f96f6610 Fix O(n^2) behavior in sysctl
Sysctl OIDs were internally stored in linked lists, triggering O(n^2)
behavior when userland iterates over many of them.  The slowdown is
noticeable for MIBs that have > 100 children (for example, vm.uma).  But
it's unignorable for kstat.zfs when a pool has > 1000 datasets.

Convert the linked lists into RB trees.  This produces a ~25x speedup
for listing kstat.zfs with 4100 datasets, and no measurable penalty for
small dataset counts.

Bump __FreeBSD_version for the KPI change.

Sponsored by:	Axcient
Reviewed by:	mjg
Differential Revision: https://reviews.freebsd.org/D36500
2022-09-26 18:03:34 -06:00
Alan Somers
52360ca32f copy_file_range: truncate write if it would exceed RLIMIT_FSIZE
PR:		266611
MFC after:	2 weeks
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D36706
2022-09-26 15:22:29 -06:00
Mitchell Horne
818cae0ff7 kasan: provide bus peek/poke definitions
Reviewed by:	andrew, markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36700
2022-09-26 14:25:05 -05:00
Konstantin Belousov
1b4b75171e Add vn_rlimit_fsizex() and vn_rlimit_fsizex_res()
The vn_rlimit_fsizex() function:
- checks that the write does not exceed RLIMIT_FSIZE limit and fs
  maximum supported file size
- truncates write length if it exceeds the RLIMIT_FSIZE or max file size,
  but there are some bytes to write
- sends SIGXFSZ if RLIMIT_FSIZE would be exceed otherwise

POSIX mandates the truncated write in case when some bytes can be
written but whole write request fails the RLIMIT_FSIZE check.

The function is supposed to be used from VOP_WRITE()s. Due to
pecularity in the VFS generic write syscall layer, uio_resid must
correctly reflect the written amount (noted by markj). Provide the dual
vn_rlimit_fsizex_res() function to correct uio_resid after the clamp
done in vn_rlimit_fsizex() on VOP_WRITE() return.

PR:	164793
Reviewed by:	asomers, jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36625
2022-09-24 19:41:33 +03:00
Konstantin Belousov
2ac083f60f Add vn_rlimit_trunc()
Reviewed by:	asomers, jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36625
2022-09-24 19:41:18 +03:00
Konstantin Belousov
cc65a412ae filesystems: return error from vn_rlimit_fsize() instead of EFBIG
Reviewed by:	asomers, jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36625
2022-09-24 19:41:14 +03:00
Mark Johnston
c2d27b0ec7 sched_4bsd: Fix a racy thread state modification
When a thread switching off-CPU is migrating to a remote CPU,
sched_switch() may trigger a rescheduling of the thread currently
running on that CPU.  When doing so, it must ensure that that thread is
locked before modifying thread state.  If the thread's lock is not the
scheduler lock, then the thread is in the process of switching off-CPU
and no extra effort is needed, and the initiator does not hold the
thread's lock and thus should not modify any thread state.

Reported and tested by:	Steve Kargl
MFC after:	1 week
2022-09-23 20:09:06 -04:00
John Baldwin
f49fd63a6a kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.
Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36549
2022-09-22 15:09:19 -07:00
John Baldwin
7ae99f80b6 pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.
This matches the return type of pmap_mapdev/bios.

Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36548
2022-09-22 15:08:52 -07:00
Zhenlei Huang
440217b0af debugnet: Fix parameter order in the calls to m_get()
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36643
2022-09-21 06:55:20 -04:00
Mateusz Guzik
a3ab1102e3 vfs: silence a bogus LOR in freevnode
Reported by:	imp
2022-09-19 02:14:50 +00:00
Mateusz Guzik
a75d1ddd74 vfs: introduce V_PCATCH to stop abusing PCATCH 2022-09-17 15:41:37 +00:00
Mateusz Guzik
9e4f35ac25 vfs: deperl msleep flag calculation in vn_start_*write 2022-09-17 17:17:20 +02:00
Mateusz Guzik
1c7084fe56 vfs: clean up parse_mount_dev_present 2022-09-17 12:42:46 +00:00
Mateusz Guzik
b77bdfdb67 vfs: fix non-INVARIANTS build after 5b5b7e2ca2
Reported by:	gj
2022-09-17 10:45:12 +00:00
Mateusz Guzik
aede6a9670 vfs: fixup parse_mount_dev_present after 5b5b7e2ca2
Reported by:	kib
2022-09-17 10:35:00 +00:00
Mateusz Guzik
5b5b7e2ca2 vfs: always retain path buffer after lookup
This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D36542
2022-09-17 09:10:38 +00:00
Mateusz Guzik
3df3d88cc5 vfs: move cn_nameptr assignment out of namei_getpath 2022-09-17 09:08:34 +00:00
Mateusz Guzik
41a0a99f85 vfs: slightly reorganize error handling in chroot
This avoids duplicated NDFREE_NOTHING which will be of importance
later.
2022-09-17 09:08:34 +00:00
Warner Losh
7cd4984e67 SPDX: Not BSD-4-Clause
This is not BSD-4-Clause. It's closer to a modified BSD-2-Clause with 2
added clauses (and the first one has added clauses). Remove
SPDX-License-Idnetifier since this license doesn't match anything in
SPDX.
2022-09-16 21:49:16 -06:00
Konstantin Belousov
ff41239f58 Add AT_USRSTACK{BASE, LIM} AT vectors, and ELF_BSDF_VMNOOVERCOMMIT flag
Reviewed by:	brooks, imp (previous version)
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36540
2022-09-16 23:23:26 +03:00
Mateusz Guzik
50176b0296 locks: whack a failed experiment in form of restrict_starvation
This was never enabled and only pollutes the code. The issue will
be addressed later in a different manner.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-09-16 17:29:37 +00:00
Gordon Bergling
4771011b8f kern_jail: Fix a typo in a source code comment
- s/paramter/parameter/

MFC after:	3 days
2022-09-15 10:25:19 +02:00
Warner Losh
9baa0817ec SPDX: Not BSD-4-Clause
This license has 4 clauses, and shares some text with the BSD-4-Clause
license. However, it omits the standard disclaimer and has 2 clauses all
its own. Remove this tag, since it was made in error and this doesn't
match the SPDX copy of the BSD-4-Clause license.

Sponsored by:		Netflix
2022-09-14 21:29:31 -06:00
Mateusz Guzik
d04c7f10d4 vfs: make delmntque return with the interlock held
saves on relocking dance -- the lock is taken immediately
afterwards anyway.
2022-09-14 23:30:19 +00:00
Mateusz Guzik
43fbd0e7a7 lockf: elide vnode interlock in the common case in lf_purgelocks
The interlock was already taken and released when dooming, thus by
API contract locking state cannot be legally installed.

At the same time the state is almost never there to begin with.
2022-09-14 23:04:22 +00:00
Mateusz Guzik
a755fb921e vfs: retire the V_MNTREF flag
Reviewed by:	kib, mckusick
Differential Revision:	https://reviews.freebsd.org/D36521
2022-09-14 18:16:36 +00:00
Mateusz Guzik
61a1d5dde2 vfs: stop using the V_MNTREF flag
Reviewed by:	kib, mckusick
Differential Revision:	https://reviews.freebsd.org/D36521
2022-09-14 18:16:23 +00:00
Mateusz Guzik
f7dc4a71da vfs: plug spurious error checks in namei
error is guaranteed 0 at that point
2022-09-13 23:18:30 +00:00
Mateusz Guzik
b4137c9ed1 vfs: make NDVALIDATE private to vfs_lookup.c
it is not used elsewhere.
2022-09-12 22:50:48 +00:00
Allan Jude
b20ec58669 vfs.typenumhash: fix sysctl description
a string continuation was missing a space, resulting in two works
being smushed together.

Sponsored by:	Klara, Inc.
2022-09-10 22:47:51 +00:00
Mateusz Guzik
1760a6950a Fixup build after recent getsock changes 2022-09-10 20:40:43 +00:00
Mateusz Guzik
3be2225fc8 Remove fflag argument from getsock_cap
Interested callers can obtain in other own easily enough
and there is no reason to branch on it.
2022-09-10 19:47:47 +00:00
Mateusz Guzik
3212ad15ab Add getsock
All but one consumers of getsock_cap only pass 4 arguments.
Take advantage of it.
2022-09-10 19:47:47 +00:00
Mateusz Guzik
a2ad70923f Add branch prediction hints to getsock_cap 2022-09-10 19:41:52 +00:00
Gleb Smirnoff
e80062a2d4 tcp: avoid call to soisconnected() on transition to ESTABLISHED
This call existed since pre-FreeBSD times, and it is hard to understand
why it was there in the first place.  After 6f3caa6d81 it definitely
became necessary always and commit message from f1ee30ccd6 confirms that.
Now that 6f3caa6d81 is effectively backed out by 07285bb4c2, the call
appears to be useful only for sockets that landed on the incomplete queue,
e.g. sockets that have accept_filter(9) enabled on them.

Provide a new TCP flag to mark connections that are known to be on the
incomplete queue, and call soisconnected() only for those connections.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D36488
2022-09-08 09:16:04 -07:00
Doug Moore
d0354fa7b6 rb_tree: reduce duplication in balancing code
Change RB_INSERT_COLOR and RB_REMOVE_COLOR so that the blocks of code
that are identical except for left and right being exchanged are made
only one block with a variable to indicate left- or right-handedness.

Rename RB macros so that those not intended for external use begin
with an underscore.

Add comments to the balancing code so that another might understand it.

Reviewed by:	alc, kib
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D36393
2022-09-07 23:46:19 -05:00