Commit Graph

19592 Commits

Author SHA1 Message Date
Konstantin Belousov
5657f49ef3 kern_umtx.c do_wait(): correct confusing indent
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-01-20 23:33:11 +02:00
Brooks Davis
fa1d803c0f epoch: replace hand coded assertion
The assertion is equivalent to kstack_contains() so use that rather
than spelling it out.

Suggested by:	jhb
Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D38107
2023-01-20 18:04:40 +00:00
John Baldwin
846e4a206f ktls_disable_ifnet_help: Set curvnet around sorele().
This is required in kernels with VIMAGE such as GENERIC.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-01-18 15:39:04 -08:00
Konstantin Belousov
0f80d5ebc8 Require INVARIANTS and WITNESS if DEBUG_VFS_LOCKS is set
Reported by:	pho
Reviewed by:	markj, mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38070
2023-01-16 05:55:47 +02:00
Zhenlei Huang
8bce8d28ab jail: Avoid multipurpose return value of function prison_ip_restrict()
Currently function prison_ip_restrict() returns true if the replacement
buffer was used, or no buffer provided and allocation fails and should
redo. The logic is confusing and cause possibly infinite loop from
eb8dcdeac2 .

Reviewed by:	jamie, glebius
Approved by:	kp (mentor)
Differential Revision:	https://reviews.freebsd.org/D37918
2023-01-13 18:45:14 +08:00
Zhenlei Huang
89ddfbbac8 jail: Fix regression panic from eb8dcdeac2
And possibly infinite loop calling prison_ip_restrict() in
kern_jail_set() [2].

[1] It is possible that prisons do not have any IPv4 or IPv6 addresses.
[2] If prison_ip_restrict() is not provided with prison_ip, when it
    allocates prison_ip successfully, then it should return false to
    indicate not redo prison_ip_restrict() later.

Reviewed by:	glebius
Approved by:	kp (mentor)
Fixes:	eb8dcdeac2 jail: network epoch protection for IP address lists
Differential Revision:	https://reviews.freebsd.org/D37906
2023-01-13 18:45:14 +08:00
Zhenlei Huang
ddbf879d79 jail: Correctly access IPv[46] addresses of prison_ip
* Fix wrong IPv[46] addresses inherited from parent jail
* Properly restrict the child jail's IPv[46] addresses

Reviewed by:	melifaro, glebius
Approved by:	kp (mentor)
Fixes:	eb8dcdeac2 jail: network epoch protection for IP address lists
Differential Revision:	https://reviews.freebsd.org/D37871
Differential Revision:	https://reviews.freebsd.org/D37872
2023-01-13 18:45:14 +08:00
Konstantin Belousov
37b9fb1696 Add descrip_check_write_mp() helper
... which verifies that given file table does not have file descriptors
referencing vnodes on the specified mount point.  It is up to the caller
to ensure that the check is not racy.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37896
2022-12-29 22:55:39 +02:00
Mateusz Guzik
f45feecfb2 vfs: add vn_getsize
getattr is very expensive and in important cases only gets called to get
the size. This can be optimized with a dedicated routine which obtains
that statistic.

As a step towards that goal make size-only consumers use a dedicated
routine.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D37885
2022-12-28 22:43:49 +00:00
John Baldwin
07be751727 ktls: Post receive errors on partially closed sockets.
If an error such as an invalid record or one whose decryption fails is
detected on a socket that has received a RST then ktls_drop() could
ignore the error since INP_DROPPED could already be set.  In this case
soreceive_generic hangs since it does not return from a KTLS socket
with pending encrypted data unless there is an error (so_error) (this
behavior is to ensure that soreceive_generic doesn't return a
premature EOF when there is pending data still being decrypted).

Note that this was a bug prior to
69542f2682 as tcp_usr_abort would also
have ignored the error in this case.

Reviewed by:	gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37775
2022-12-27 16:00:17 -08:00
Mateusz Guzik
829f0bcb5f vfs: add the concept of vnode state transitions
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>

As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).

Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).

The new field lands in a 1-byte hole, thus it does not grow the struct.

Bump __FreeBSD_version to 1400077

Reviewed by:	kib (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D37759
2022-12-26 17:35:12 +00:00
Mateusz Guzik
94267fc907 vfs: use designated initializers for the typename array
While here prefix with v for better consistency with the vnode stuff.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D37759
2022-12-26 17:34:41 +00:00
Konstantin Belousov
974be51b3f Fixes for ptrace_syscallreq()
Re-assign the sc local (syscall number) before moving args for SYS_syscall.
Correct the audit and kdtrace hooks invocations.

Fixes:	140ceb5d95
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-12-23 01:53:41 +02:00
Konstantin Belousov
140ceb5d95 ptrace(2): add PT_SC_REMOTE remote syscall request
Reviewed by:	markj
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37590
2022-12-22 23:11:35 +02:00
Konstantin Belousov
f0592b3c8d Add a thread debugging flag TDB_BOUNDARY
It indicates to a debugger that the thread is stopped at the
kernel->user exit path.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37590
2022-12-22 23:11:35 +02:00
Konstantin Belousov
e6feeae2f9 sys: rename td_coredump to td_remotereq
and TDB_COREDUMPRQ to TDB_COREDUMPREQ

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37590
2022-12-22 23:11:35 +02:00
Zhenlei Huang
21ad3e27fa jail: Fix output of IPv[46] addresses of DDB show prison
Reviewed by:	melifaro, jamie
Approved by:	kp (mentor)
Fixes:		eb8dcdeac2 jail: network epoch protection for IP address lists
Differential Revision:	https://reviews.freebsd.org/D37732
2022-12-21 09:53:28 +08:00
Alfredo Dal'Ava Junior
b13110e9f3 ufs/ffs: detect endian mismatch between machine and filesystem
Mount on a LE machine a filesystem formatted for BE is not supported
currently. This adds a check for the superblock magic number using
swapped bytes to guess and warn the user that it may be a valid
superblock but endian is incompatible.

MFC after:	2 weeks
Reviewed by:	mckusick
Obtained from:	mckusick, alfredo
Differential Revision: https://reviews.freebsd.org/D37675
2022-12-20 00:20:11 -03:00
Doug Rabson
71e9be1bd5 Don't allow stacking of file mounts
Reviewed by:    mjg, kib
Tested by:      pho
2022-12-19 16:46:27 +00:00
Doug Rabson
a1d74b2dab Allow realpath to work for file mounts
For file mounts, the directory vnode is not available from namei and this
prevents the use of vn_fullpath_hardlink. In this case, we can use the
vnode which was covered by the file mount with vn_fullpath.

This also disallows file mounts over files with link counts greater than
one to ensure a deterministic path to the mount point.

Reviewed by:    mjg, kib
Tested by:      pho
2022-12-19 16:46:27 +00:00
Doug Rabson
521fbb722c Add support for mounting single files in nullfs
The main use-case for this is to support mounting config files and
secrets into OCI containers. My current workaround copies the files into
the container which is messy and risks secrets leaking into container
images if the cleanup fails.

This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.

Test Plan:
$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo

Reviewed by:    mjg, kib
Tested by:      pho
2022-12-19 16:46:13 +00:00
Doug Rabson
78d35459a2 Add vn_path_to_global_path_hardlink
This is similar to vn_path_to_global_path but allows for regular files
which may not be present in the cache.

Reviewed by:    mjg, kib
Tested by:      pho
2022-12-19 16:44:59 +00:00
Mateusz Guzik
8f7859e800 vfs: retire the now unused SAVESTART flag
Bump __FreeBSD_version to 1400075

Tested by:      pho
2022-12-19 08:11:08 +00:00
Mateusz Guzik
56da4aa554 vfs: stop using SAVESTART for rename
ni_startdir has never reached rename routines anyway

Reviewed by:	mckusick
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D34468
2022-12-19 08:09:37 +00:00
Mateusz Guzik
8f874e92eb vfs: make relookup take an additional argument
instead of looking at SAVESTART

This is a step towards removing the flag.

Reviewed by:	mckusick
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D34468
2022-12-19 08:09:00 +00:00
Mateusz Guzik
269c564b90 vfs: retire NDFREE
There are no consumers anymore. Interested parties can NDFREE_PNBUF
and vput or vrele relevant vnodes.

Tested by:	pho
2022-12-19 08:07:54 +00:00
Mateusz Guzik
85dac03e30 vfs: stop using NDFREE
It provides nothing but a branchfest and next to no consumers want it
anyway.

Tested by:	pho
2022-12-19 08:07:23 +00:00
Rick Macklem
bba7a2e896 kern_jail.c: Allow mountd/nfsd to optionally run in a jail
This patch adds "allow.nfsd" to the jail code based on a
new kernel build option VNET_NFSD.  This will not work
until future patches fix nmount(2) to allow mountd to
run in a vnet prison and the NFS server code is patched
so that global variables are in a vnet.

The jail(8) man page will be patched in a future commit.

Reviewed by:	jamie
MFC after:	4 months
Differential Revision:	https://reviews.freebsd.org/D37637
2022-12-17 13:43:49 -08:00
Rick Macklem
195f1b124d vfs_mount.c: fix vfs_domount() for PRIV_VFS_MOUNT_EXPORTED
It appears that, prior to r158857 vfs_domount() checked
suser() when MNT_EXPORTED was specified.

r158857 appears to have broken this, since MNT_EXPORTED
was no longer set when mountd.c was converted to use nmount(2).
r164033 replaced the suser() check with
priv_check(td, PRIV_VFS_MOUNT_EXPORTED), which does the
same thing (ie. checks for effective uid == 0 assuming suses_enabled
is set).

This patch restores this check by setting MNT_EXPORTED when the
"export" mount option is specified to nmount().

I think this is reasonable since only mountd(8) should be setting
exports and I doubt any non-root mounted file system would
be setting its own exports.

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D37718
2022-12-16 13:01:23 -08:00
John Baldwin
69542f2682 ktls: Close a race with setting so_error when dropping a connection.
pr_abort calls tcp_usr_abort which calls tcp_drop with ECONNABORTED.
After pr_abort returns, the so_error is then set to a more specific
error.  However, a reader can observe and return the ECONNABORTED
error before so_error is set to the desired error value.  This is
resulting in spurious test failures of recently added tests for
invalid conditions such as invalid headers.

To fix, refactor the code to abort a connection to call tcp_drop
directly with the desired error value.  ktls_reset_send_tag already
calls tcp_drop directly when it aborts a connection due to an error.

Reviewed by:	gallatin
Reported by:	CI (jenkins), gallatin, olivier
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37692
2022-12-15 12:06:26 -08:00
Andrew Gallatin
ac4e3a27ab Unbreak the build when MAC is not defined
7a2c93b86e removed the use of "error" when MAC was not
defined, resulting in an unused variable error.

Sponsored by: Netflix
Reviewed by: jhb
2022-12-14 17:39:25 -05:00
Gleb Smirnoff
7a2c93b86e sockets: provide sousrsend() that does socket specific error handling
Sockets have special handling for EPIPE on a write, that was spread out
into several places.  Treating transient errors is also special - if
protocol is atomic, than we should ignore any changes to uio_resid, a
transient error means the write had completely failed (see d2b3a0ed31).

- Provide sousrsend() that expects a valid uio, and leave sosend() for
  kernel consumers only.  Do all special error handling right here.
- In dofilewrite() don't do special handling of error for DTYPE_SOCKET.
- For send(2), write(2) and aio_write(2) call into sousrsend() and remove
  error handling for kern_sendit(), soo_write() and soaio_process_job().

PR:			265087
Reported by:            rz-rpi03 at h-ka.de
Reviewed by:            markj
Differential revision:	https://reviews.freebsd.org/D35863
2022-12-14 10:02:44 -08:00
Jason A. Harmening
42442d7a6e Generalize the VV_CROSSLOCK logic in vfs_lookup()
When VV_CROSSLOCK is present, the lock for the vnode at the current
stage of lookup must be held across the VFS_ROOT() call for the
filesystem mounted at the vnode.  Since VV_CROSSLOCK implies that
the root vnode reuses the already-held lock, the possibility for
recursion should be made clear in the flags passed to VFS_ROOT().

For cases in which the lock is held exclusive, this means passing
LK_CANRECURSE.  For cases in which the lock is held shared, it
means clearing LK_NODDLKTREAT to allow VFS_ROOT() to potentially
recurse on the shared lock even in the presence of an exclusive
waiter.

That the existing code works for unionfs is due to a coincidence
of the current unionfs implementation.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D37458
2022-12-10 22:02:38 -06:00
Mateusz Guzik
ebdf27b6f3 uipc: remove accept_mtx
It is unused since 779f106aa1 ("Listening sockets improvements.")

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-12-11 02:47:07 +00:00
Konstantin Belousov
0919f29d91 shmfd: account for the actually allocated pages
Return the value as stat(2) st_blocks.

Suggested and reviewed by:	markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37097
2022-12-09 14:17:12 +02:00
Konstantin Belousov
37aea2649f tmpfs: for used pages, account really allocated pages, instead of file sizes
This makes tmpfs size accounting correct for the sparce files. Also
correct report st_blocks/va_bytes. Previously the reported value did not
accounted for the swapped out pages.

PR:	223015
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37097
2022-12-09 14:17:12 +02:00
Konstantin Belousov
7ec4b29b08 uiomove_object: hide diagnostic under bootverbose
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37097
2022-12-09 14:15:37 +02:00
Doug Rabson
5eeb4f737f imgact_binmisc: Optionally pre-open the interpreter vnode
This allows the use of chroot and/or jail environments which depend on
interpreters registed with imgact_binmisc to use emulator binaries from
the host to emulate programs inside the chroot.

Reviewed by:	imp
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D37432
2022-12-08 14:32:03 +00:00
Warner Losh
7014e78fb7 boot: Remove stray free()
Early versions of this code had a free, but this one doesn't need
it. Remove the forgotten free(vv); from earlier versions.

Fixes:			ed56dcfc6b
Noticed by:		Michael Butler
Sponsored by:		Netflix
2022-12-07 11:30:04 -07:00
Warner Losh
ed56dcfc6b boot: pass in args as const
Copy the arg that sets a variable to maximize the reuse of this
routine. There are places we call it from that are const char * and it
might not be safe to cast that away.

Sponsored by:		Netflix
2022-12-07 11:00:54 -07:00
Warner Losh
3cf97e91fa Revert "newbus: Change attach failure behavior"
This reverts commit 68c3f03021. There are
some weird crashes when KVMs switch caused by this, so revert this
commit until they are sorted out.

Reported by:		cy@
Sponsored by:		Netflix
2022-12-05 17:00:26 -07:00
Warner Losh
68c3f03021 newbus: Change attach failure behavior
In the rare case that we succeed in probing, but fail to attach, flip
the default to be to disable the
device. hw.bus.disable_failed_devices=false is no required to restore
the old behavior. The old behavior dates form a time when dynamic
control of devices wasn't yet present (devctl didn't exist). Now that
one can retry probe/attach the device with devctl, the default doesn't
make sense: The more desirable behaivor is to have stable device numbers
when one has several instances of the same device in a system (common
for NICs or HBAs).

Reviewed by:		jhb (verbal)
Sponsored by:		Netflix
2022-12-04 16:29:03 -07:00
Warner Losh
aa52c6bdd7 newbus: Create a knob to disable devices that fail to attach.
Normally, when a device fails to attach, we tear down the newbus state
for that device so that future driver loads can try again (maybe with a
different driver, or maybe with a re-loaded and fixed kld).

Sometimes, however, it is desirable to have the device fail
permanantly. We do this by calling device_disable() on a failed
attached, as well as keeping the device in DS_ATTACHING forever. This
prevents retries on that device. This is enabled via
hw.bus.disable_failed_devices=1 in either a hint via the loader, or at
runtime with a sysctl setting. Setting from 1 -> 0 at runtime will not
affect previously disabled devices, however: they remain disabled.
They can be re-enabled manually with devctl enable, however.

Sponsored by:		Netflix

Reviewed by:	gallatin, hselasky, jhb
Differential Revision:	https://reviews.freebsd.org/D37517
2022-12-04 16:20:24 -07:00
Warner Losh
7652743540 devd: Warn for deprecated 'kern' system type
One year ago, I deprecated 'kern' in favor of 'kernel' for the system
name for some power events. I'm about to remove it from the kernel, but
realized there's been no warning generated for users. Preserve POLA by
converting on the fly here and issuing a warning for 14.x, and an fatal
error after we branch 15. Make compiling it an error on 16 to remove
the gross hack after we branch.

Sponsored by:		Netflix
Reviewed by:		bapt
Differential Revision:	https://reviews.freebsd.org/D37584
2022-12-02 10:48:02 -07:00
Warner Losh
8d147537bf newbus: Remove deprecated "kern" system name for resume events.
The new "kernel" system name is the one that's documented and has
been generated for a year now. Remove the old one now that 14.0
is getting close.

Sponsored by:		Netflix
Reviewed by:		bapt
Differential Revision:	https://reviews.freebsd.org/D37582
2022-12-02 10:48:02 -07:00
Warner Losh
e59fa9b2e7 newbus: Comment style nit
Sponsored by:		Netflix
2022-11-29 13:11:24 -07:00
Warner Losh
1ea42a2880 md5: Use c89 function definitions
Use the c89 function definitions rather than the old K&R definitions.

Sponsored by:		Netflix
2022-11-27 13:22:31 -07:00
Eric van Gyzen
a134a12b14 Mark the debug.vnlru_nowhere sysctl as CTLFLAG_STATS
The kernel doesn't read it.  It's only writable so it can be cleared.

Sponsored by:	Dell EMC Isilon
2022-11-17 10:44:58 -06:00
Rick Macklem
4ee16246f9 vfs_vnops.c: Fix blksize for ZFS
Since ZFS reports _PC_MIN_HOLE_SIZE as 512 (although it
appears that an unwritten region must be at least f_iosize
to remain unallocated), vn_generic_copy_file_range()
uses 4096 for the copy blksize for ZFS, reulting in slow copies.

For most other file systems, _PC_MIN_HOLE_SIZE and f_iosize
are the same value, so this patch modifies the code to
use f_iosize for most cases.  It also documents in comments
why the blksize is being set a certain way, so that the code
does not appear to be doing "magic math".

Reported by:	allanjude
Reviewed by:	allanjude, asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D37076
2022-11-16 17:37:22 -08:00
John Baldwin
9a673b7158 ktls: Add software support for AES-CBC decryption for TLS 1.1+.
This is mainly intended to provide a fallback for TOE TLS which may
need to use software decryption for an initial record at the start
of a connection.

Reviewed by:	markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37370
2022-11-15 12:02:03 -08:00
Mateusz Guzik
c3f1a13902 Retire broken GPROF support from the kernel
The option is not even recognized and with that patched it does not
compile. Even if it did work, it would be prohibitively expensive to
use.

Interested parties can use pmcstat or dtrace instead.
2022-11-15 14:17:10 +00:00
John Baldwin
5920f99d21 ktls: Inline ktls_cleanup() into ktls_destroy().
Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37353
2022-11-11 16:01:02 -08:00
John Baldwin
d01db2b837 ktls: Don't leak ktls session objects for certain errors.
ktls_cleanup() does not free ktls session objects, it merely
cleans (and frees) members of the object.

Change callers to use ktls_free() instead.

Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37352
2022-11-11 16:00:37 -08:00
Mateusz Guzik
83286682f8 vfs: whack mips remnant
This reverts commit 8ffa01a061.
2022-11-09 00:31:50 +00:00
Gleb Smirnoff
8840ae2288 tcp: don't store VNET in every tcpcb, take it from the inpcbinfo
Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37125
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
9eb0e8326d tcp: provide macros to access inpcb and socket from a tcpcb
There should be no functional changes with this commit.

Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37123
2022-11-08 10:24:40 -08:00
Mark Johnston
2c10be9e06 arm64: Handle translation faults for thread structures
The break-before-make requirement poses a problem when promoting or
demoting mappings containing thread structures: a CPU may raise a
translation fault while accessing curthread, and data_abort() accesses
the thread again before pmap_fault() can translate the address and
return.

Normally this isn't a problem because we have a hack to ensure that
slabs used by the thread zone are always accessed via the direct map,
where promotions and demotions are rare.  However, this hack doesn't
work properly with UMA_MD_SMALL_ALLOC disabled, as is the case with
KASAN configured (since our KASAN implementation does not shadow the
direct map and so tries to force the use of the kernel map wherever
possible).

Fix the problem by modifying data_abort() to handle translation faults
in the kernel map without dereferencing "td", i.e., curthread, and
without enabling interrupts.  pmap_klookup() has special handling for
translation faults which makes it safe to call in this context.  Then,
revert the aforementioned hack.

Reviewed by:	kevans, alc, kib, andrew
MFC after:	1 month
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37231
2022-11-02 13:46:25 -04:00
Andrew Gallatin
8b19898a78 Fix a panic on boot introduced by 555a861d68
First, an sbuf_new() in device_get_path() shadows the sb
passed in by dev_wired_cache_add(), leaving its sb in an
unfinished state, leading to a failed KASSERT().  Fixing this
is as simple as removing the sbuf_new() from device_get_path()

Second, we cannot simply take a pointer to the sbuf memory and
store it in the device location cache, because that sbuf
is freed immediately after we add data to the cache, leading
to a use-after-free and eventually a double-free.  Fixing this
requires allocating memory for the path.

After a discussion with jhb, we decided that one malloc was
better than two in dev_wired_cache_add, which is why it changed
so much.

Reviewed by: jhb
Sponsored by: Netflix
MFC after: 14 days
2022-11-01 13:44:39 -04:00
Mark Johnston
1f6b6cf177 atomic: Intercept atomic_(load|store)_bool for kernel sanitizers
Fixes:		2bed73739a ("atomic: Add plain atomic_load/store_bool()")
2022-10-29 11:10:58 -04:00
Konstantin Belousov
6b69465efb vfs_domount(): ensure that v_mountedhere and VIRF_MOUNTPOINT are set under the vnode lock
Fixes:	f7833196bd
Reported and tested by:	pho
Reviewed by:	jah, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37198
2022-10-29 14:29:55 +03:00
John Baldwin
744bfb2131 Import the WireGuard driver from zx2c4.com.
This commit brings back the driver from FreeBSD commit
f187d6dfbf plus subsequent fixes from
upstream.

Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.

Reviewed by:	pauamma, gbe, kevans, emaste
Obtained from:	git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36909
2022-10-28 13:36:12 -07:00
Jason A. Harmening
f7833196bd vfs_lookup(): Minor performance optimizations
Refactor the symlink and mountpoint traversal logic to avoid
repeatedly checking the vnode type; a symlink cannot be a mountpoint
and vice versa.  Avoid repeatedly checking cn_flags for NOCROSSMOUNT
and simplify the check which determines whether the vnode is a
mountpoint.

Suggested by:	mjg
Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D35054
2022-10-26 19:33:33 -05:00
Jason A. Harmening
4390622c8d vfs_busy(): fix wording in comment
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D35054
2022-10-26 19:33:30 -05:00
Jason A. Harmening
706f15c5fa Remove witness directives from crossmp locking VOPs
These are of limited use since the crossmp vnode locking ops have not
actually used a lock since commit
a2d3554542.  We in fact require that
these operations are always issued with LK_SHARED.  Additionally,
these directives can produce a false positive in certain VV_CROSSLOCK
cases which require upgrading of the covered vnode lock from shared
to exclusive.

While here, replace the runtime check of LK_SHARED with a KASSERT and
expand the check to include LK_NOWAIT, which all callers pass.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D35054
2022-10-26 19:33:18 -05:00
Jason A. Harmening
080ef8a418 Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR
When a lookup operation crosses into a new mountpoint, the mountpoint
must first be busied before the root vnode can be locked. When a
filesystem is unmounted, the vnode covered by the mountpoint must
first be locked, and then the busy count for the mountpoint drained.
Ordinarily, these two operations work fine if executed concurrently,
but with a stacked filesystem the root vnode may in fact use the
same lock as the covered vnode. By design, this will always be
the case for unionfs (with either the upper or lower root vnode
depending on mount options), and can also be the case for nullfs
if the target and mount point are the same (which admittedly is
very unlikely in practice).

In this case, we have LOR. The lookup path holds the mountpoint
busy while waiting on what is effectively the covered vnode lock,
while a concurrent unmount holds the covered vnode lock and waits
for the mountpoint's busy count to drain.

Attempt to resolve this LOR by allowing the stacked filesystem
to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary.
Upon observing this flag, the vfs_lookup() will leave the covered
vnode lock held while crossing into the mountpoint. Employ this flag
for unionfs with the caveat that it can't be used for '-o below' mounts
until other unionfs locking issues are resolved.

Reported by:	pho
Tested by:	pho
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D35054
2022-10-26 19:33:03 -05:00
Mateusz Guzik
d346e3ac33 vfs: use cache_assert_no_entries instead of open-coding it 2022-10-26 15:54:19 +00:00
Warner Losh
deb1e3b719 physmem: Add physmem_excluded to query if a region is excluded
In order to safely reuse excluded memory when it's reserved for special
purpose, we need to test whether or not the memory has been reserved
early in boot. physmem_excluded will return true when the entire range
is excluded, false otherwise.

Sponsored by:		Netflix
2022-10-25 09:32:49 -06:00
Mateusz Guzik
d653aaec7a cache: add cache_assert_no_entries 2022-10-24 15:37:43 +00:00
Hans Petter Selasky
fdd9548333 time(3): Fix spelling.
Noted by:	Gary Jennejohn <garyj@gmx.de>
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-10-23 18:42:11 +02:00
Hans Petter Selasky
35a33d14b5 time(3): Optimize tvtohz() function.
List of changes:
- Use integer multiplication instead of long multiplication, because the result is an integer.
- Remove multiple if-statements and predict new if-statements.
- Rename local variable name, "ticks" into "retval" to avoid shadowing
the system "ticks" global variable.

Reviewed by:	kib@ and imp@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
Differential Revision:  https://reviews.freebsd.org/D36859
2022-10-23 10:04:50 +02:00
Hans Petter Selasky
ee29897fc3 time(3): Declare the minimum and maximum hz values supported.
Reviewed by:	kib@ and imp@
MFC after:      1 week
Sponsored by:   NVIDIA Networking
Differential Revision:	https://reviews.freebsd.org/D37072
2022-10-23 10:04:50 +02:00
Konstantin Belousov
33ce178835 vn_bmap_seekhole: check that passed offset is non-negative
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37024
2022-10-19 20:24:07 +03:00
Konstantin Belousov
555a861d68 device_get_path(): take sbuf directly
This allows to fix a bug where sbuf allocation done in the context of
dev_wired_cache_match() must use non-sleepable allocations.

Suggested by:	jhb
Reviewed by:	jhb, takawata
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36899
2022-10-19 19:39:40 +03:00
Konstantin Belousov
8cf783bde3 device_get_path(): handle case when dev is root
PR:	266862
Based on submission by:	takawata
Reviewed by:	jhb, takawata
Disscussed with:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36899
2022-10-19 19:39:33 +03:00
Konstantin Belousov
d9c5a9ea49 device_get_path(): do not drop the error from BUS_GET_DEVICE_PATH()
Later it would silently converted to ENOMEM always, because any error
was reported as NULL return path.

Reviewed by:	jhb, takawata
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36899
2022-10-19 19:39:26 +03:00
Konstantin Belousov
23d2fcfbb2 subr_bus.c: some style
Wrap long lines in devctl2_ioctl DEV_GET_PATH and dev_wired_cache_match()

Reviewed by:	jhb, takawata
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36899
2022-10-19 19:39:17 +03:00
Colin Percival
c32bd97641 kern: Support duplicate variables in early kenv
Some virtual machines pass virtio MMIO device parameters via the kernel
command line as a series of virtio_mmio.device=<parameters> options.
These get translated into FreeBSD kernel environment variables; but
unfortunately they all use the same variable name, which resulted in
all but the first such parameter being ignored when the dynamic kernel
environment is set up from the initial environment buffers.

With this commit, duplicate environment settings will instead be stored
as ${name}_1, ${name}_2... ${name}_9999.  In the unlikely event that
the same variable is set over 10000 times before the dynamic kernel
environment is set up, we panic.

Variable settings after the dynamic environment is initialized continue
to override the previously-set value; the change is limited to the very
early kernel boot (prior to SI_SUB_KMEM + 1) and changes behaviour from
"ignore" to "store with a different name" only.

Reviewed by:	imp
Feedback from:	kevans
Sponsored by:	https://patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D36187
2022-10-17 23:02:20 -07:00
Ali Abdallah
ba4782022a ksched: correct return code for invalid priority
By convention, EINVAL is returned when validating arguments, not EPERM.
This matches the documented behaviour of sched_setscheduler(3), and that
of SCHED_OTHER.

PR:		227735
MFC after:	1 week
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D37021
2022-10-17 15:12:13 -03:00
Mitchell Horne
39888ed7a3 kern_intr: Check for NULL event in intr_destroy()
It likely won't happen, but is consistent with the other functions of
this KPI.

Reviewed by:	imp, jhb
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D33479
2022-10-15 15:51:44 -03:00
Zhenlei Huang
43f8c763cd if_me: Use dedicated network privilege
Separate if_me privileges from if_gif.

Reviewed by:		kp
Differential Revision:	https://reviews.freebsd.org/D36691
2022-10-15 17:05:36 +02:00
Mitchell Horne
05b727fee5 Downgrade tty_intr_event from a global
It can be static within uart_tty.c. It is an open question whether there
remains any real benefit to having uart instances share a swi thread.

Reviewed by:	imp, markj, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36938
2022-10-12 13:46:12 -03:00
Michael Tuexen
bc0d407676 Revert "listen(): improve POSIX compliance"
This reverts commit 76e6e4d72f.

Several programs in the tree use -1 instead of INT_MAX to use
the maximum value. Thanks to Eugene Grosbein for pointing this
out.
2022-10-12 04:33:00 +02:00
Michael Tuexen
76e6e4d72f listen(): improve POSIX compliance
Ensure that a negative backlog argument is handled as it if was 0.

Reviewed by:		markj@, glebius@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D31821
2022-10-11 22:46:51 +02:00
Bjoern A. Zeeb
99e6980fcf device_get_property: add a HANDLE case
This will resolve a reference and return the appropriate handle, a node
on the simplebus or an ACPI_HANDLE for ACPI.  For now we do not try to
further abstract the return type.

MFC after:	2 weeks
Reviewed by:	mw
Differential Revision: https://reviews.freebsd.org/D36793
2022-10-09 21:51:25 +00:00
Mateusz Guzik
143942f992 unr: remove UNR64_LOCKED
All platforms support 64-bit atomics now.
2022-10-08 10:41:21 +00:00
Gleb Smirnoff
53af690381 tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources.  After
0d7445193a, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions.  Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision:	https://reviews.freebsd.org/D36400
2022-10-06 19:24:37 -07:00
Andrew Turner
9d4cff787e Remove pre-armv6 support from devmap
Remove an old code path that was used used by Armv4/5 so is unused now.

Sponsored by:	The FreeBSD Foundation
2022-10-05 09:56:17 +01:00
Hans Petter Selasky
0def80f1a5 time(3): Align fast clock times to avoid firing multiple timers.
In non-periodic mode absolute timers fire at exactly the time given.
When specifying a fast clock, align the firing time so that less
timer interrupt events are needed.

Reviewed by:	rrs @
Differential Revision:	https://reviews.freebsd.org/D36858
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-10-03 17:53:17 +02:00
Alfredo Dal'Ava Junior
db79bf75ac powerpc: cpuset: add local functions for copyin/copyout
Add local functions to workaround an instruction segment trap (panic)
when the indirect functions copyin and copyout are called by an external
loadable kernel module (i.e. pfsync, zfs and linuxulator). The crash
was triggered by change 47a57144af, but
kernel binary linked with LLD 9 works fine. LLVM bisect points that LLD
behavior chaged after dc06b0bc9ad055d06535462d91bfc2a744b2f589.

This is know to affect powerpc targets only and the final fix is still
being discussed with the LLVM community.

PR:	266730
Reviewed by:	luporl, jhibbits (on IRC, previous version)
MFC after:	2 days
Sponsored by:	Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D36234
2022-10-03 12:03:09 -03:00
Doug Moore
e5f93d1078 show_sysctl_all: reduce copying, please coverity
Modify db_show_sysctl_all so that it does not copy more than once the
data of the input oid, and so that what it passes to db_show_oid does
not alarm coverity.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D36847
2022-10-01 12:20:04 -05:00
Gleb Smirnoff
636420bde3 unix/dgram: don't leak file descriptors when socket write failed 2022-09-30 13:43:08 -07:00
Alexander V. Chernikov
7b660faa9e sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.
Protocols such as netlink may need a large socket receive buffer,
 measured in tens of megabytes. This change allows netlink to
 set larger socket buffers (given the privs are in place), without
 requiring user to manuall bump maxsockbuf.

Reviewed by:	glebius
Differential Revision: https://reviews.freebsd.org/D36747
2022-09-28 10:20:09 +00:00
Alexander V. Chernikov
f66968564d protocols: make socket buffers ioctl handler changeable
Allow to set custom per-protocol handlers for the socket buffers
 ioctls by introducing pr_setsbopt callback with the default value
 set to the currently-used sbsetopt().

Reviewed by:	glebius
Differential Revision: https://reviews.freebsd.org/D36746
2022-09-28 10:20:09 +00:00
Doug Moore
5294bfa751 sysctl_search_oid: remove all-NULL precondition
The implementation of sysctl_search_oid no longer relies on the
initial value of nodes to be all NULL, so remove the comment that
demands it and let the caller stop enforcing it.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36768
2022-09-28 04:30:11 -05:00
Doug Moore
9f6f9007b9 name2oid: use find_oidname
In name2oid, use sysctl _find_oidname instead of re-implementing it.
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36765
2022-09-27 16:17:55 -05:00
Doug Moore
e96ae5cb05 sysctl_search_oid: remove useless tests
sysctl_search_old makes several tests in a loop that can be removed.

The first test in the loop is only ever true on the first loop
iteration, and is always true on that iteration, so its work can be
done before the loop begins.

The upper and lower bounds on the loop variable 'indx' are each tested
on each iteration, but 'indx' is changed in one direction or the other
only once within the loop, so only one bound needs to be checked.

Two ways remain in the loop that nodes[indx] can change (after one of
them is put before the loop start), and one of them applies exactly
when indx has been incremented, so no separate test for that case
requires testing.

Restructure and add comments that makes clearer that this is a basic
depth-first search.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D36741
2022-09-27 13:30:31 -05:00
Doug Moore
ed5183455e register_oid: fix duplicate oid after d3f96f6610
sysctl_register_oid must check the uniqueness of any newly computed
oid_number in sysctl_register_oid.

Reviewed by:	asomers
MFC with:	d3f96f6610
Differential Revision:	https://reviews.freebsd.org/D36743
2022-09-27 12:24:01 -05:00
Hans Petter Selasky
c075ea46bc sysctl(3): Implement SYSCTL_FOREACH() to iterate all OIDs in a sysctl list.
To avoid using the sysctl list macros directly in external kernel modules.

Reviewed by:		asomers, manu and asiciliano
Differential Revision:	https://reviews.freebsd.org/D36748
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-09-27 19:21:21 +02:00
Mitchell Horne
f2963b530e kasan: disable kasan_mark() after a violation
Specifically, when we receive a violation and we're configured to panic,
kasan_enabled gets unset before we descend into panic().  At this point,
there's no longer any reason to allow marking as kasan_shadow_check() is
disabled -- we have some inherent risk of faulting or panicking if the
system's in a bad enough state with no benefit.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36742
2022-09-27 11:01:21 -05:00
Alan Somers
6622e299ac Fix the build with SCHED_STATS after d3f96f6610
MFC with:	d3f96f6610
Sponsored by:	Axcient
2022-09-26 20:20:46 -06:00
Alan Somers
d3f96f6610 Fix O(n^2) behavior in sysctl
Sysctl OIDs were internally stored in linked lists, triggering O(n^2)
behavior when userland iterates over many of them.  The slowdown is
noticeable for MIBs that have > 100 children (for example, vm.uma).  But
it's unignorable for kstat.zfs when a pool has > 1000 datasets.

Convert the linked lists into RB trees.  This produces a ~25x speedup
for listing kstat.zfs with 4100 datasets, and no measurable penalty for
small dataset counts.

Bump __FreeBSD_version for the KPI change.

Sponsored by:	Axcient
Reviewed by:	mjg
Differential Revision: https://reviews.freebsd.org/D36500
2022-09-26 18:03:34 -06:00
Alan Somers
52360ca32f copy_file_range: truncate write if it would exceed RLIMIT_FSIZE
PR:		266611
MFC after:	2 weeks
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D36706
2022-09-26 15:22:29 -06:00
Mitchell Horne
818cae0ff7 kasan: provide bus peek/poke definitions
Reviewed by:	andrew, markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D36700
2022-09-26 14:25:05 -05:00
Konstantin Belousov
1b4b75171e Add vn_rlimit_fsizex() and vn_rlimit_fsizex_res()
The vn_rlimit_fsizex() function:
- checks that the write does not exceed RLIMIT_FSIZE limit and fs
  maximum supported file size
- truncates write length if it exceeds the RLIMIT_FSIZE or max file size,
  but there are some bytes to write
- sends SIGXFSZ if RLIMIT_FSIZE would be exceed otherwise

POSIX mandates the truncated write in case when some bytes can be
written but whole write request fails the RLIMIT_FSIZE check.

The function is supposed to be used from VOP_WRITE()s. Due to
pecularity in the VFS generic write syscall layer, uio_resid must
correctly reflect the written amount (noted by markj). Provide the dual
vn_rlimit_fsizex_res() function to correct uio_resid after the clamp
done in vn_rlimit_fsizex() on VOP_WRITE() return.

PR:	164793
Reviewed by:	asomers, jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36625
2022-09-24 19:41:33 +03:00
Konstantin Belousov
2ac083f60f Add vn_rlimit_trunc()
Reviewed by:	asomers, jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36625
2022-09-24 19:41:18 +03:00
Konstantin Belousov
cc65a412ae filesystems: return error from vn_rlimit_fsize() instead of EFBIG
Reviewed by:	asomers, jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36625
2022-09-24 19:41:14 +03:00
Mark Johnston
c2d27b0ec7 sched_4bsd: Fix a racy thread state modification
When a thread switching off-CPU is migrating to a remote CPU,
sched_switch() may trigger a rescheduling of the thread currently
running on that CPU.  When doing so, it must ensure that that thread is
locked before modifying thread state.  If the thread's lock is not the
scheduler lock, then the thread is in the process of switching off-CPU
and no extra effort is needed, and the initiator does not hold the
thread's lock and thus should not modify any thread state.

Reported and tested by:	Steve Kargl
MFC after:	1 week
2022-09-23 20:09:06 -04:00
John Baldwin
f49fd63a6a kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.
Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36549
2022-09-22 15:09:19 -07:00
John Baldwin
7ae99f80b6 pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.
This matches the return type of pmap_mapdev/bios.

Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36548
2022-09-22 15:08:52 -07:00
Zhenlei Huang
440217b0af debugnet: Fix parameter order in the calls to m_get()
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36643
2022-09-21 06:55:20 -04:00
Mateusz Guzik
a3ab1102e3 vfs: silence a bogus LOR in freevnode
Reported by:	imp
2022-09-19 02:14:50 +00:00
Mateusz Guzik
a75d1ddd74 vfs: introduce V_PCATCH to stop abusing PCATCH 2022-09-17 15:41:37 +00:00
Mateusz Guzik
9e4f35ac25 vfs: deperl msleep flag calculation in vn_start_*write 2022-09-17 17:17:20 +02:00
Mateusz Guzik
1c7084fe56 vfs: clean up parse_mount_dev_present 2022-09-17 12:42:46 +00:00
Mateusz Guzik
b77bdfdb67 vfs: fix non-INVARIANTS build after 5b5b7e2ca2
Reported by:	gj
2022-09-17 10:45:12 +00:00
Mateusz Guzik
aede6a9670 vfs: fixup parse_mount_dev_present after 5b5b7e2ca2
Reported by:	kib
2022-09-17 10:35:00 +00:00
Mateusz Guzik
5b5b7e2ca2 vfs: always retain path buffer after lookup
This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D36542
2022-09-17 09:10:38 +00:00
Mateusz Guzik
3df3d88cc5 vfs: move cn_nameptr assignment out of namei_getpath 2022-09-17 09:08:34 +00:00
Mateusz Guzik
41a0a99f85 vfs: slightly reorganize error handling in chroot
This avoids duplicated NDFREE_NOTHING which will be of importance
later.
2022-09-17 09:08:34 +00:00
Warner Losh
7cd4984e67 SPDX: Not BSD-4-Clause
This is not BSD-4-Clause. It's closer to a modified BSD-2-Clause with 2
added clauses (and the first one has added clauses). Remove
SPDX-License-Idnetifier since this license doesn't match anything in
SPDX.
2022-09-16 21:49:16 -06:00
Konstantin Belousov
ff41239f58 Add AT_USRSTACK{BASE, LIM} AT vectors, and ELF_BSDF_VMNOOVERCOMMIT flag
Reviewed by:	brooks, imp (previous version)
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36540
2022-09-16 23:23:26 +03:00
Mateusz Guzik
50176b0296 locks: whack a failed experiment in form of restrict_starvation
This was never enabled and only pollutes the code. The issue will
be addressed later in a different manner.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-09-16 17:29:37 +00:00
Gordon Bergling
4771011b8f kern_jail: Fix a typo in a source code comment
- s/paramter/parameter/

MFC after:	3 days
2022-09-15 10:25:19 +02:00
Warner Losh
9baa0817ec SPDX: Not BSD-4-Clause
This license has 4 clauses, and shares some text with the BSD-4-Clause
license. However, it omits the standard disclaimer and has 2 clauses all
its own. Remove this tag, since it was made in error and this doesn't
match the SPDX copy of the BSD-4-Clause license.

Sponsored by:		Netflix
2022-09-14 21:29:31 -06:00
Mateusz Guzik
d04c7f10d4 vfs: make delmntque return with the interlock held
saves on relocking dance -- the lock is taken immediately
afterwards anyway.
2022-09-14 23:30:19 +00:00
Mateusz Guzik
43fbd0e7a7 lockf: elide vnode interlock in the common case in lf_purgelocks
The interlock was already taken and released when dooming, thus by
API contract locking state cannot be legally installed.

At the same time the state is almost never there to begin with.
2022-09-14 23:04:22 +00:00
Mateusz Guzik
a755fb921e vfs: retire the V_MNTREF flag
Reviewed by:	kib, mckusick
Differential Revision:	https://reviews.freebsd.org/D36521
2022-09-14 18:16:36 +00:00
Mateusz Guzik
61a1d5dde2 vfs: stop using the V_MNTREF flag
Reviewed by:	kib, mckusick
Differential Revision:	https://reviews.freebsd.org/D36521
2022-09-14 18:16:23 +00:00
Mateusz Guzik
f7dc4a71da vfs: plug spurious error checks in namei
error is guaranteed 0 at that point
2022-09-13 23:18:30 +00:00
Mateusz Guzik
b4137c9ed1 vfs: make NDVALIDATE private to vfs_lookup.c
it is not used elsewhere.
2022-09-12 22:50:48 +00:00
Allan Jude
b20ec58669 vfs.typenumhash: fix sysctl description
a string continuation was missing a space, resulting in two works
being smushed together.

Sponsored by:	Klara, Inc.
2022-09-10 22:47:51 +00:00
Mateusz Guzik
1760a6950a Fixup build after recent getsock changes 2022-09-10 20:40:43 +00:00
Mateusz Guzik
3be2225fc8 Remove fflag argument from getsock_cap
Interested callers can obtain in other own easily enough
and there is no reason to branch on it.
2022-09-10 19:47:47 +00:00
Mateusz Guzik
3212ad15ab Add getsock
All but one consumers of getsock_cap only pass 4 arguments.
Take advantage of it.
2022-09-10 19:47:47 +00:00
Mateusz Guzik
a2ad70923f Add branch prediction hints to getsock_cap 2022-09-10 19:41:52 +00:00
Gleb Smirnoff
e80062a2d4 tcp: avoid call to soisconnected() on transition to ESTABLISHED
This call existed since pre-FreeBSD times, and it is hard to understand
why it was there in the first place.  After 6f3caa6d81 it definitely
became necessary always and commit message from f1ee30ccd6 confirms that.
Now that 6f3caa6d81 is effectively backed out by 07285bb4c2, the call
appears to be useful only for sockets that landed on the incomplete queue,
e.g. sockets that have accept_filter(9) enabled on them.

Provide a new TCP flag to mark connections that are known to be on the
incomplete queue, and call soisconnected() only for those connections.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D36488
2022-09-08 09:16:04 -07:00
Doug Moore
d0354fa7b6 rb_tree: reduce duplication in balancing code
Change RB_INSERT_COLOR and RB_REMOVE_COLOR so that the blocks of code
that are identical except for left and right being exchanged are made
only one block with a variable to indicate left- or right-handedness.

Rename RB macros so that those not intended for external use begin
with an underscore.

Add comments to the balancing code so that another might understand it.

Reviewed by:	alc, kib
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D36393
2022-09-07 23:46:19 -05:00
Mateusz Guzik
3e0b486886 vfs: flip a condition around in kern_statat
error tends to be 0.
2022-09-07 20:06:24 +00:00
Hans Petter Selasky
0e391a3197 ktls: Add missing NULL pointer check for TLS RX hardware offload.
The send tag pointer may be NULL when the ktls_reset_receive_tag()
function is invoked. Add check for this.

Reviewed by:	gallatin @
Sponsored by:	NVIDIA Networking
2022-09-06 13:49:23 +02:00
Mateusz Guzik
69413598d2 signal: use proc_iterate to save on work
Most notably poudriere performs kill -9 -1 in jails for each port
being built. This reduces the scan from hundrends of processes to
literally 1.

Reviewed by:	jamie, markj
Differential Revision:	https://reviews.freebsd.org/D34522
2022-09-05 11:54:47 +00:00
Mateusz Guzik
5ecb5444aa jail: add process linkage
It allows iteration over processes belonging to given jail instead of
having to walk the entire allproc list.

Note the iteration can miss processes which remains bug-compatible
with previous code.

Reviewed by:	jamie (previous version), markj (previous version)
Differential Revision:	https://reviews.freebsd.org/D34522
2022-09-05 11:54:47 +00:00
Gordon Bergling
d744e271eb kern: Remove a double word in a source code comment
- s/that that/that/

MFC after:	3 days
2022-09-04 17:32:10 +02:00
Gordon Bergling
49a033d8cf kern: Correct some typos in source code comments
- s/occured/occurred/
- s/the the/the/

MFC after:	3 days
2022-09-04 13:00:01 +02:00
Gordon Bergling
2b7d656f17 kern: Fix a typo in asource code comment
- s/overriden/overridden/

MFC after:	3 days
2022-09-03 15:26:55 +02:00
Gleb Smirnoff
24af7808fa protosw: repair protocol selection logic in socket(2)
Pointy hat to:	glebius
Fixes:		61f7427f02
2022-08-30 21:19:46 -07:00
Gleb Smirnoff
61f7427f02 protosw: cleanup protocols that existed merely to provide pr_input
Since 4.4BSD the protosw was used to implement socket types created
by socket(2) syscall and at the same to demultiplex incoming IPv4
datagrams (later copied to IPv6).  This story ended with 78b1fc05b2.

These entries (e.g. IPPROTO_ICMP) in inetsw that were added to catch
packets in ip_input(), they would also be returned by pffindproto()
if user says socket(AF_INET, SOCK_RAW, IPPROTO_ICMP).  Thus, for raw
sockets to work correctly, all the entries were pointing at raw_usrreq
differentiating only in the value of pr_protocol.

With 78b1fc05b2 all these entries are no longer needed, as ip_protox
is independent of protosw.  Any socket syscall requesting SOCK_RAW type
would end up with rip_protosw.  And this protosw has its pr_protocol
set to 0, allowing to mark socket with any protocol.

For IPv6 raw socket the change required two small fixes:
o Validate user provided protocol value
o Always use protocol number stored in inp in rip6_attach, instead
  of protosw value, which is now always 0.

Differential revision:	https://reviews.freebsd.org/D36380
2022-08-30 15:09:21 -07:00
Gleb Smirnoff
8624f4347e divert: declare PF_DIVERT domain and stop abusing PF_INET
The divert(4) is not a protocol of IPv4.  It is a socket to
intercept packets from ipfw(4) to userland and re-inject them
back.  It can divert and re-inject IPv4 and IPv6 packets today,
but potentially it is not limited to these two protocols.  The
IPPROTO_DIVERT does not belong to known IP protocols, it
doesn't even fit into u_char.  I guess, the implementation of
divert(4) was done the way it is done basically because it was
easier to do it this way, back when protocols for sockets were
intertwined with IP protocols and domains were statically
compiled in.

Moving divert(4) out of inetsw accomplished two important things:

1) IPDIVERT is getting much closer to be not dependent on INET.
   This will be finalized in following changes.
2) Now divert socket no longer aliases with raw IPv4 socket.
   Domain/proto selection code won't need a hack for SOCK_RAW and
   multiple entries in inetsw implementing different flavors of
   raw socket can merge into one without requirement of raw IPv4
   being the last member of dom_protosw.

Differential revision:	https://reviews.freebsd.org/D36379
2022-08-30 15:09:21 -07:00
Gleb Smirnoff
244e1aeaec domains: merge domain_init() into domain_add()
domain_init() called at SI_SUB_PROTO_DOMAIN/SI_ORDER_SECOND is always
called right after domain_add(), that had been called at SI_ORDER_FIRST.
Note that protocols aren't initialized yet at this point, since they are
usually scheduled to initialize at SI_ORDER_THIRD.

After this merge it becomes clear that DOMF_SUPPORTED / DOMF_INITED
can be garbage collected as they are set & checked in the same function.

For initialization of the domain system itself it is now clear that
domaininit() can be garbage collected and static initializer is enough.
2022-08-29 19:15:01 -07:00
Gleb Smirnoff
e18c5816ea domains: use queue(9) SLIST for linked list of domains 2022-08-29 19:15:01 -07:00
Gleb Smirnoff
d7574c7432 domains: init pr_domain in pr_init() 2022-08-29 19:15:01 -07:00
Gleb Smirnoff
c414347bc5 mbufs: isolate max_linkhdr and max_protohdr handling in the mbuf code
o Statically initialize max_linkhdr to default value without relying
  on domain(9) code doing that.
o Statically initialize max_protohdr to a sane value, without relying
  on TCP being always compiled in.
o Retire max_datalen. Set, but not used.
o Don't make the domain(9) system responsible in validating these
  values and updating max_hdr.  Instead provide KPI max_linkhdr_grow()
  and max_protohdr_grow().
o Call max_linkhdr_grow() from IEEE802.11 and max_protohdr_grow() from
  TCP.  Those are the only protocols today that may want to grow.

Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D36376
2022-08-29 19:14:25 -07:00
Mark Johnston
32faf071bd devstat: Remove DTrace io probes lacking a BIO reference
The io:::start and end probes trace individual I/O requests.

Also remove the unimplemented wait-start and wait-done probes.

PR:		266098
MFC after:	1 week
2022-08-29 13:22:36 -04:00
Doug Moore
5d91386826 rb_tree: avoid extra reads in rebalancing
In RB_INSERT_COLOR and RB_REMOVE_COLOR, avoid reading a parent pointer
from memory, and then reading the left-color bit from memory, and then
reading the right-color bit from memory, since they're all in the same
field. The compiler can't infer that only the first read is really
necessary, so write the code in a way so that it doesn't have to.

Drop RB_RED_LEFT and RB_RED_RIGHT macros that reach into memory to get
those bits.  Drop RB_COLOR, the only thing left using RB_RED_LEFT and
RB_RED_RIGHT after the other changes, and go straight to DIAGNOSTIC
code in subr_stats to implement RB_COLOR for its single, dubious use
there.

Reviewed by:	alc
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D36353
2022-08-29 11:11:31 -05:00
John Baldwin
e3885a7893 soo_stat: Ensure error is always initialized.
In kernels without MAC, error is not set for sockets whose protocol
layer does not implement the pr_sense hook.

Reported by:	Jenkins (powerpc kernel builds)
Fixes:		7c04ca1fad sockets: for stat(2) on a socket don't report hiwat as block size
2022-08-26 11:17:55 -07:00
Gleb Smirnoff
837b7203f0 domains: use struct domain as argument 2022-08-26 10:35:35 -07:00
firk
768f6373eb Fix compat10 semaphore interface race
Wrong has-waiters and missing unconditional _count==0 check may cause
infinite waiting with already non-zero count.
1) properly clear _has_waiters flag when waiting failed to start
2) always check _count before start waiting

PR:	265997
Reviewed by:	kib
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36272
2022-08-26 20:34:29 +03:00
Gleb Smirnoff
7c04ca1fad sockets: for stat(2) on a socket don't report hiwat as block size
The code appeared in d8392c6c39 with not good explanation.  It is
very unlikely any software in the world needs that.

Differential revision:	https://reviews.freebsd.org/D36283
2022-08-26 08:16:15 -07:00
Mateusz Guzik
49afea1059 proc: read the pid prior to unlocking in report_alive_proc1
In principle another thread could have reaped the process by that time.
2022-08-25 17:26:49 +00:00
Konstantin Belousov
fce3b1c327 fork_exit(): style comment
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36302
2022-08-24 22:12:53 +03:00
Brooks Davis
840327e5dd mbuf: Don't support PAGE_SIZE < 4K
The Vax supported such things, but FreeBSD does not.  This further
implies that MJUMPAGESIZE > MCLBYTES so assert this and remove code
handling them being equal.

Reviewed by:	kp, imp, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D36320
2022-08-24 18:34:07 +01:00
Mateusz Guzik
9488262679 rms: add rms_assert_rlock_ok
So that callers which opportunistically elide the lock can still
assert that they can take it.

Reviewed by:
Differential Revision:
2022-08-23 19:15:48 +00:00
Robert Wing
3454a7caa0 kqueue: retire knlist_init_rw_reader()
Last usage was removed in afa85850e7.

Reviewed by:	pauamma, melifaro, kib
Differential Revision:	https://reviews.freebsd.org/D36205
2022-08-20 21:17:39 -08:00
Konstantin Belousov
f829268bcc Remove TDF_DOING_SA
We cannot see a thread with the flag set in unsuspend, after we stopped
doing SINGLE_ALLPROC from user processes.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:34:30 +03:00
Konstantin Belousov
5e5675cb4b Remove struct proc p_singlethr member
It does not serve any purpose after we stopped doing
thread_single(SINGLE_ALLPROC) from stoppable user processes.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:34:30 +03:00
Konstantin Belousov
2842ec6d99 REAP_KILL_PROC: kill processes in the threaded taskqueue context
There is a problem still left after the fixes to REAP_KILL_PROC.  The
handling of the stopping signals by sig_suspend_threads() can occur
outside the stopping process context by tdsendsignal(), and it uses
mostly the same mechanism of aborting sleeps as suspension.  In other
words, it badly interacts with thread_single(SINGLE_ALLPROC).

But unlike single threading from the process context, we cannot wait by
sleep for other single threading requests to pass, because we own
spinlock(s).

Fix this by moving both the thread_single(p2, SINGLE_ALLPROC), and the
signalling, to the threaded taskqueue which cannot be single-threaded
itself.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:34:11 +03:00
Konstantin Belousov
5e9bba94bd fork_norfproc(): unlock p1 before retrying
Reported and reviewed by:	markj
Tested by:	pho
Syzkaller:	647212368c3f32c6f13f
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:18 +03:00
Konstantin Belousov
0a4f2ac3b7 kern_sig.c: style
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:18 +03:00
Konstantin Belousov
cdb58f9d04 ksiginfo_tryfree(): change return type to bool
The function result is already used as bool.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:18 +03:00
Konstantin Belousov
cc29f221aa ksiginfo_alloc(): change to directly take M_WAITOK/NOWAIT flags
Also style, and remove unneeded cast.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:17 +03:00
Konstantin Belousov
5c78797e42 reap_kill_proc_locked(): remove outdated part of the comment
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:17 +03:00
Konstantin Belousov
30b16a6bcf exit1(): update comment about thread_single()
We do not check single-threading conditions in trap, or when sleeping
uninterruptible.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:17 +03:00
Konstantin Belousov
f835be5822 sleepq_set_timeout_sbt(): correct comment to not talk about ticks
It is sbt now.  Also, explain what flags are.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:17 +03:00
Konstantin Belousov
da39a100db sleepq_check_ast_sc_locked(): update comment
The relock order is important not only for a signal delivery, but also
for the suspension requests.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:17 +03:00
Konstantin Belousov
bd76586bb7 fork_norfproc(): style
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D36207
2022-08-20 20:33:17 +03:00
Mateusz Guzik
497240def8 Retire clone_drain_lock
It is only ever xlocked in drain_dev_clone_events and the only consumer of
that routine does not need it -- eventhandler code already makes sure the
relevant callback is no longer running.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D36268
2022-08-20 09:44:05 +00:00
Gleb Smirnoff
820bafd0bc unix/dgram: don't panic if socket buffer has negative space
That's a legitimate scenario, although unlikely.

Reported by:	https://syzkaller.appspot.com/bug?extid=6e8be1ec8d77578a3df4
2022-08-19 12:15:38 -07:00
Mateusz Guzik
545db925c3 pipe: fix EOF case for non-blocking fds
In particular unbreaks 'go build'.
2022-08-18 21:23:53 +00:00
Gleb Smirnoff
e7d02be19d protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach.  Now this structure is
  only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw.  This was suggested
  in 1996 by wollman@ (see 7b187005d1), and later reiterated
  in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
  For most protocols these pointers are initialized statically.
  Those domains that may have loadable protocols have spacers. IPv4
  and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
  entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36232
2022-08-17 11:50:32 -07:00
Gleb Smirnoff
f6dc5aa342 unix: use private enum as argument for unp_connect2()
instead of using historic PRU_ flags that are now not used by anything
rather than TCP debugging.
2022-08-17 11:50:31 -07:00
Gleb Smirnoff
81a34d374e protosw: retire pr_drain and use EVENTHANDLER(9) directly
The method was called for two different conditions: 1) the VM layer is
low on pages or 2) one of UMA zones of mbuf allocator exhausted.
This change 2) into a new event handler, but all affected network
subsystems modified to subscribe to both, so this change shall not
bring functional changes under different low memory situations.

There were three subsystems still using pr_drain: TCP, SCTP and frag6.
The latter had its protosw entry for the only reason to register its
pr_drain method.

Reviewed by:		tuexen, melifaro
Differential revision:	https://reviews.freebsd.org/D36164
2022-08-17 11:50:31 -07:00
Gleb Smirnoff
1922eb3e9c protosw: retire pr_slowtimo and pr_fasttimo
They were useful many years ago, when the callwheel was not efficient,
and the kernel tried to have as little callout entries scheduled as
possible.

Reviewed by:		tuexen, melifaro
Differential revision:	https://reviews.freebsd.org/D36163
2022-08-17 11:50:31 -07:00
Gleb Smirnoff
78b1fc05b2 protosw: separate pr_input and pr_ctlinput out of protosw
The protosw KPI historically has implemented two quite orthogonal
things: protocols that implement a certain kind of socket, and
protocols that are IPv4/IPv6 protocol.  These two things do not
make one-to-one correspondence. The pr_input and pr_ctlinput methods
were utilized only in IP protocols.  This strange duality required
IP protocols that doesn't have a socket to declare protosw, e.g.
carp(4).  On the other hand developers of socket protocols thought
that they need to define pr_input/pr_ctlinput always, which lead to
strange dead code, e.g. div_input() or sdp_ctlinput().

With this change pr_input and pr_ctlinput as part of protosw disappear
and IPv4/IPv6 get their private single level protocol switch table
ip_protox[] and ip6_protox[] respectively, pointing at array of
ipproto_input_t functions.  The pr_ctlinput that was used for
control input coming from the network (ICMP, ICMPv6) is now represented
by ip_ctlprotox[] and ip6_ctlprotox[].

ipproto_register() becomes the only official way to register in the
table.  Those protocols that were always static and unlikely anybody
is interested in making them loadable, are now registered by ip_init(),
ip6_init().  An IP protocol that considers itself unloadable shall
register itself within its own private SYSINIT().

Reviewed by:		tuexen, melifaro
Differential revision:	https://reviews.freebsd.org/D36157
2022-08-17 11:50:31 -07:00
Gleb Smirnoff
489482e276 ipsec: isolate knowledge about protocols that are last header
Retire PR_LASTHDR protosw flag.

Reviewed by:		ae
Differential revision:	https://reviews.freebsd.org/D36155
2022-08-17 08:24:28 -07:00
Mateusz Guzik
9ac6eda6c6 pipe: try to skip locking the pipe if a non-blocking fd is used
Reviewed by:	markj (previous version)
Differential Revision:	https://reviews.freebsd.org/D36082
2022-08-17 14:23:34 +00:00
Ed Maste
fbafa98a94 Disallow invalid PT_GNU_STACK
Stack must be at least readable and writable.

PR:		242570
Reviewed by:	kib, markj
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35867
2022-08-16 15:52:21 -04:00
Mateusz Guzik
84a0be4a23 vfs: plug a dead store in kern_linkat_vp
Reported by:	clang --analyze
2022-08-16 10:46:29 +00:00
Mateusz Guzik
eb9a1f9c68 vfs: plug dead store in vn_io_fault1
Reported by:	clang --analyze
2022-08-16 10:46:20 +00:00
Dimitry Andric
9762d48b7f Adjust function definition in kern_poll.c to avoid clang 15 warning
With clang 15, the following -Werror warning is produced:

    sys/kern/kern_poll.c:374:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    netisr_pollmore()
                   ^
                    void

This is because netisr_pollmore() is declared with a (void) argument
list, but defined with an empty argument list. Make the definition match
the declaration.

MFC after:	3 days
2022-08-14 21:27:34 +02:00
Alexander V. Chernikov
9b967bd65d domains: allow domains to be unloaded
Add domain_remove() SYSUNINT callback that removes the domain
 from the domain list if it has DOMF_UNLOADABLE flag set.
This change is required to support netlink ( D36002 ).

Reviewed by:	glebius
Differential Revision: https://reviews.freebsd.org/D36173
2022-08-14 09:22:33 +00:00
Gleb Smirnoff
f277746e13 protosw: change prototype for pr_control
For some reason protosw.h is used during world complation and userland
is not aware of caddr_t, a relic from the first version of C.  Broken
buildworld is good reason to get rid of yet another caddr_t in kernel.

Fixes:	886fc1e804
2022-08-12 12:08:18 -07:00
Gleb Smirnoff
948f31d7b0 netinet: do not broadcast PRC_REDIRECT_HOST on ICMP redirect
This is expensive and useless call.  It has been useless since Alexander
melifaro@ moved the forwarding table to nexthops with passive invalidation.
What happens now is that cached route in a inpcb would get invalidated
on next ip_output().

These were the last users of pfctlinput(), so garbage collect it.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36156
2022-08-12 08:31:29 -07:00
Gleb Smirnoff
8c77967ecc protosw: retire pr_output method
The only place to execute this method was raw_usend(). Only those
protocols that used raw socket were able to actually enter that method.
All pr_output assignments being deleted by this commit were a dead code
for many years.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36126
2022-08-11 09:19:37 -07:00
Andrew Turner
8e9ca1379e Adjust function definition in subr_devmap.c to avoid clang 15 warning
With clang 15, the following -Werror warning is produced:

    sys/kern/subr_devmap.c:87:19: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    devmap_print_table()
                      ^
                       void

This is because devmap_print_table() and devmap_lastaddr() are declared
with a (void) argument list, but defined with an empty argument list.
Make the definition match the declaration.

Sponsored by:	The FreeBSD Foundation
2022-08-11 14:30:32 +01:00
Alexander V. Chernikov
102e6817f0 devd: move all devd notification logic to a separate file.
Currently, subr_bus.c shares logic for (a) maintaining all HW devices
 (e.g. discovery/attach/detach logic) and (b) generic devctl notification
 layer for devices/PMU/GEOM/interfaces/etc).
These two subsystems share really tiny interaction interface, composed of 3
 notification functions. With that in mind, move devctl layer to a
 separate file, establishing a clear notification interface between the
 sub.c bus layer and the provider (devctl).

The primary driver of this change is netlink implementation (D36002).
The idea is to propagate device-level events to netlink as well, so all
 netlink customers can subscribe to these changes.
The long-term goal is to deprecate devctl and to use netlink as the
 kernel<> userland transport provided netlink gets enough traction.

Reviewed by:	imp, markj
Differential Revision: https://reviews.freebsd.org/D36091
MFC after:	1 month
2022-08-10 18:56:01 +00:00
Gleb Smirnoff
07285bb4c2 tcp: utilize new solisten_clone() and solisten_enqueue()
This streamlines cloning of a socket from a listener.  Now we do not
drop the inpcb lock during creation of a new socket, do not do useless
state transitions, and put a fully initialized socket+inpcb+tcpcb into
the listen queue.

Before this change, first we would allocate the socket and inpcb+tcpcb via
tcp_usr_attach() as TCPS_CLOSED, link them into global list of pcbs, unlock
pcb and put this onto incomplete queue (see 6f3caa6d81).  Then, after
sonewconn() we would lock it again, transition into TCPS_SYN_RECEIVED,
insert into inpcb hash, finalize initialization of tcpcb.  And then, in
call into tcp_do_segment() and upon transition to TCPS_ESTABLISHED call
soisconnected().  This call would lock the listening socket once again
with a LOR protection sequence and then we would relocate the socket onto
the complete queue and only now it is ready for accept(2).

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D36064
2022-08-10 11:09:34 -07:00
Gleb Smirnoff
8f5a0a2e4f sockets: provide solisten_clone(), solisten_enqueue()
as alternative KPI to sonewconn().  The latter has three stages:
- check the listening socket queue limits
- allocate a new socket
- call into protocol attach method
- link the new socket into the listen queue of the listening socket

The attach method, originally designed for a creation of socket by the
socket(2) syscall has slightly different semantics than attach of a socket
cloned by listener.  Make it possible for protocols to call into the
first stage, then perform a different attach, and then call into the
final stage.  The first stage, that checks limits and clones a socket
is called solisten_clone(), and the function that enqueues the socket
is solisten_enqueue().

Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D36063
2022-08-10 11:09:34 -07:00
Konstantin Belousov
00d17cf342 elf_note_prpsinfo: handle more failures from proc_getargv()
Resulting sbuf_len() from proc_getargv() might return 0 if user mangled
ps_strings enough. Also, sbuf_len() API contract is to return -1 if the
buffer overflowed. The later should not occur because get_ps_strings()
checks for catenated length, but check for this subtle detail explicitly
as well to be more resilent.

The end result is that p_comm is used in this situations.

Approved by:	so
Security:	FreeBSD-SA-22:09.elf
Reported by:	Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by:	delphij, markj
admbugs:	988
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35391
2022-08-09 15:44:45 -04:00
Konstantin Belousov
1b0a4974c5 thread_create(): call cpu_copy_thread() after td_pflags is zeroed
By calling the function too early we might still have the td_pflags
value cached from the previous struct thread use. cpu_copy_thread()
depends on correct value for TDP_KTHREAD at least on x86.

Reported, bisected, and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36069
2022-08-08 19:44:17 +03:00
Gordon Bergling
fa1ac9693a vnode(9): Fix a typo in a source code comment
- s/paramater/parameter/

MFC after:	3 days
2022-08-07 16:08:43 +02:00
Ed Maste
f0687f3e0e Clarify code comments on ASLR default settings
Sponsored by:	The FreeBSD Foundation
2022-08-05 10:01:16 -04:00