Commit Graph

18417 Commits

Author SHA1 Message Date
Mateusz Guzik
904a08f342 ktls: switch bare zone_mbuf use to m_free_raw
Reviewed by:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30955
2021-07-02 08:30:22 +00:00
Mateusz Guzik
05462babd4 mbuf: add m_free_raw to be used instead of directly calling uma_zfree
The intent is to remove all direct zone_mbuf consumers so that ctor/dtor
from that zone can be reimplemented as wrappers around uma, avoiding an
indirect function call.

Reviewed by:	kbowling
Discussed with:	gallatin
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30959
2021-07-02 08:30:22 +00:00
Mateusz Guzik
fb32c8dbeb iflib: retire MB_DTOR_SKIP
The flag was added in 2016 but remains unused.

Reviewed by:	kbowling
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30958
2021-07-02 08:30:22 +00:00
Edward Tomasz Napierala
db8d680ebe procctl(2): add PROC_NO_NEW_PRIVS_CTL, PROC_NO_NEW_PRIVS_STATUS
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared.  The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they not set.

The main purpose of the flag is implementation of Linux
PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unpriviledged
chroot.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30939
2021-07-01 09:42:07 +01:00
Dmitry Chagin
5d9f790191 Eliminate p_elf_machine from struct proc.
Instead of p_elf_machine use machine member of the Elf_Brandinfo which is now
cached in the struct proc at p_elf_brandinfo member.

Note to MFC: D30918, KBI

Reviewed by:		kib, markj
Differential Revision:	https://reviews.freebsd.org/D30926
MFC after:		2 weeks
2021-06-29 20:18:29 +03:00
Dmitry Chagin
615f22b2fb Add a link to the Elf_Brandinfo into the struc proc.
To allow the ABI to make a dicision based on the Brandinfo add a link
to the Elf_Brandinfo into the struct proc. Add a note that the high 8 bits
of Elf_Brandinfo flags is private to the ABI.

Note to MFC: it breaks KBI.

Reviewed by:		kib, markj
Differential Revision:	https://reviews.freebsd.org/D30918
MFC after:		2 weeks
2021-06-29 20:15:08 +03:00
Edward Tomasz Napierala
435754a59e Add infrastructure required for Linux coredump support
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values.  It also updates all
of the ABI definitions to preserve current behaviour.

This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication.  It will be used
for Linux coredumps.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30921
2021-06-29 08:49:12 +01:00
Edward Tomasz Napierala
61b4c62718 imgact_elf.c: style, remove unnecessary casts
Remove unnecessary type casts and redundant brackets.
No functional changes.

Suggested By:	kib
Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30841
2021-06-27 17:05:59 +01:00
Alexander Motin
6df35af4d8 Allow sleepq_signal() to drop the lock.
Introduce SLEEPQ_DROP sleepq_signal() flag, allowing one to drop the
sleep queue chain lock before returning.  Reduced lock scope allows
significantly reduce lock contention inside taskqueue_enqueue() for
ZFS worker threads doing ~350K disk reads/s on 40-thread system.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2021-06-25 14:12:21 -04:00
Konstantin Belousov
802cf4ab0e namei: add NDPREINIT() macro
Its intent is to do the initialization of the future part of struct nameidata
which should be used across several namei() and VOPs.  Right now it is NOP.

Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30041
2021-06-23 23:46:15 +03:00
Warner Losh
ddfc9c4c59 newbus: Move from bus_child_{pnpinfo,location}_src to bus_child_{pnpinfo,location} with sbuf
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.

Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.

Document these new interfaces with man pages, and oversight from before.

Reviewed by:		jhb, bcr
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D29937
2021-06-22 20:52:06 -06:00
Edward Tomasz Napierala
06250515cf imgact_elf: compute auxv buffer size instead of using magic value
The new buffer is somewhat larger, but there should be no functional
changes.

Reviewed By:	kib, imp
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30821
2021-06-21 17:07:07 +01:00
Colin Percival
fe51b5a76d kern_tslog: Include tslog data from loader
The i386 loader (and hopefully others to come) now passes tslog data
as a "preloaded module".  Include this in the data returned by the
debug.tslog sysctl.

Reviewed by:	kevans
2021-06-20 20:09:47 -07:00
Warner Losh
0a99422970 Move mips and arm to 1000Hz by default.
armv6 and armv7 systems already were 1000Hz. The other armv5 were a
mix of 100 and 1000. This changes them to 1000. Should there be
issues, we can add options HZ=100 to the systems that have bad
performance at the drop of a hat.

mips is a lot more complicated. But most of the systems are already
1000HZ. The hardware exceptions are all fast enough to run at
1000Hz. MALTA is our primary emulator, and history has shown emulators
tend to like 100Hz better, so run those systems at 100Hz. As with arm,
any system that shows a huge performance regression can reverted to
100Hz easily.

This was going to be committed well in advance of the 13 branch, but
it was delayed and forgotten til now.

Discussed on:	#bsdmips ages ago
Sponsored by:	Netflix
2021-06-16 20:00:14 -06:00
John Baldwin
faf0224ff2 ktls: Don't mark existing received mbufs notready for TOE TLS.
The TOE driver might receive decrypted TLS records that are enqueued
to the socket buffer after ktls_try_toe() returns and before
ktls_enable_rx() locks the receive buffer to call sb_mark_notready().
In that case, sb_mark_notready() would incorrectly treat the decrypted
TLS record as an encrypted record and schedule it for decryption.
This always resulted in the connection being dropped as the data in
the control message did not look like a valid TLS header.

To fix, don't try to handle software decryption of existing buffers in
the socket buffer for TOE TLS in ktls_enable_rx().  If a TOE TLS
driver needs to decrypt existing data in the socket buffer, the driver
will need to manage that in its tod_alloc_tls_session method.

Sponsored by:	Chelsio Communications
2021-06-15 17:45:21 -07:00
Konstantin Belousov
a12e901a5a Add a knob to disable dequeueing SIGCHLD on waiting for live process
It seems that Linux does not dequeue siginfo for SIGCHLD when wait*(2)
reports status of the running process.  In particular, sigwaitinfo(2)
and other signal querying syscalls can observe the siginfo after wait.

FreeBSD dequeued siginfo from the beginning, so we cannot change the
default ABI to be more compatible.  Still, add a knob to enable to
change to the other behavior for debugging purposes.

Reported by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:00:19 +03:00
Konstantin Belousov
bc38762474 Add a knob to not drop signal with default ignored or ignored actions
Traditionally, BSD drops signals with the default action during send,
not even putting them to the destination process queue.  This semantic
is not shared with other operating systems (Linux), which do queue
such signals.  In particular, sigtimedwait(2) and related syscalls can
observe the delivery.

Add a global knob kern.sig_discard_ign which can be set to false to force
enqueuing of the signals with default action.  Also add an ABI flag to
indicate that signals should be queued.

Note that it is not practical to run with the knob turned on, because almost
all software that care about the delivery of such signals, is aware of the
difference, and misbehaves if the signals are actually queued.  The purpose
of the knob as is is to allow for easier diagnostic of the programs that
need the adjustments, to confirm the cause of problem.

Reported by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:00:19 +03:00
Konstantin Belousov
acced8b043 sigwait: add comment explaining EINTR/ERESTART details
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:00:19 +03:00
Konstantin Belousov
afb36e289c sigwait(2) and sigtimedwait(2) must not be restarted.
Reported by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:00:18 +03:00
Mark Johnston
a100217489 Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros
This makes it easier to change the socket locking protocols.  No
functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:32 -04:00
Mark Johnston
f4bb1869dd Consistently use the SOLISTENING() macro
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING().  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:27 -04:00
Andrew Gallatin
ed5e13cfc2 ktls: Fix interaction with RATELIMIT
uipc_ktls.c was missing opt_ratelimit.h, so it was
never noticing that RATELIMIT was enabled.  Once it was
enabled, it failed to compile as  ktls_modify_txrtlmt()
had accrued a compilation error when it was not being
compiled in.

Sponsored by: Netflix
2021-06-14 10:51:16 -04:00
Dmitry Chagin
e884512ad1 Split kern_poll() on two counterparts.
The kern_poll_kfds() operates on clear kernel data, kfds points to an
array in the kernel, while kern_poll() operates on user supplied pollfd.
Move nfds check to kern_poll_maxfds().

No functional changes, it's for future use in the Linux emulation layer.

Reviewd by:		kib
Differential Revision:	https://reviews.freebsd.org/D30690
MFC after:		2 weeks
2021-06-10 15:11:25 +03:00
Dmitry Chagin
f570a6723e Fix copyright, remove "all rights reserved".
The eventfd code was written by me, rdivacky@ copyrigth applicable only
to epoll part of the Linuxulator code. Roman is ok to retire his copyright
from sys/kern/sys_eventfd.c and 'All rights reserved.' lines from
sys/compat/linux/linux_event.[c|h] and sys/kern/sys_eventfd.c files.

Reviewed by:		kib, emaste
Approved by:		rdivacky
Differential Revision:	https://reviews.freebsd.org/D30677
MFC after:		2 weeks
2021-06-08 08:18:00 +03:00
Mark Johnston
887c753c9f Fix handling of D_GIANTOK
It was meant to suppress only the printf(), not the subsequent injection
of Giant-protected thunks for various file operations.

Fixes:		fbeb4ccac9
Reported by:	pho
Tested by:	pho
MFC after:	6 days
Pointy hat:	markj
2021-06-07 16:45:50 -04:00
Mark Johnston
fbeb4ccac9 Suppress D_NEEDGIANT warnings for some drivers
During boot we warn that the kbd and openfirm drivers are Giant-locked
and may be deleted.  Generally, the warning helps signal that certain
old drivers are not being maintained and are subject to removal, but
this doesn't really apply to certain drivers which are harder to
detangle from Giant.

Add a flag, D_GIANTOK, that devices can specify to suppress the
misleading warning.  Use it in the kbd and openfirm drivers.

Reviewed by:	imp, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30649
2021-06-06 16:44:46 -04:00
Konstantin Belousov
2d423f7671 sysent: allow ABI to disable setid on exec.
Reviewed by:	dchagin
Tested by:	trasz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:42:52 +03:00
Konstantin Belousov
19e6043a44 kern_exec.c: Add execve_nosetid() helper
Reviewed by:	dchagin
Tested by:	trasz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:42:41 +03:00
Jason A. Harmening
59409cb90f Add a generic mechanism for preventing forced unmount
This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount.  Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.

Introduce two new functions: vfs_pin_from_vp() and vfs_unpin().
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.

vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.

Adopt these new functions in both nullfs and unionfs.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
2021-06-05 18:20:36 -07:00
wiklam
43521b46fc Correcting comment about "sched_interact_score".
Reviewed by:	jrtc@, imp@
Pull Request:	https://github.com/freebsd/freebsd-src/pull/431

Sponsored by:		Netflix
2021-06-02 21:50:57 -06:00
Warner Losh
9f3d1a98dd regen after tweaks to getgroups and setgroups
Sponsored by:		Netflix
2021-06-02 13:24:50 -06:00
Moritz Buhl
4bc2174a1b kern: fail getgroup and setgroup with negative int
Found using
https://github.com/NetBSD/src/blob/trunk/tests/lib/libc/sys/t_getgroups.c

getgroups/setgroups want an int and therefore casting it to u_int
resulted in `getgroups(-1, ...)` not returning -1 / errno = EINVAL.

imp@ updated syscall.master and made changes markj@ suggested

PR:			189941
Tested by:		imp@
Reviewed by:		markj@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/407
Differential Revision:	https://reviews.freebsd.org/D30617
2021-06-02 13:22:57 -06:00
Mateusz Guzik
c9f8dcda85 kqueue: replace kq_ncallouts loop with atomic_fetchadd 2021-06-02 15:14:58 +00:00
Rich Ercolani
a19ae1b099 vfs: fix MNT_SYNCHRONOUS check in vn_write
ca1ce50b2b ("vfs: add more safety against concurrent forced
unmount to vn_write") has a side effect of only checking MNT_SYNCHRONOUS
if O_FSYNC is set.

Reviewed By: mjg
Differential Revision: https://reviews.freebsd.org/D30610
2021-06-02 13:42:02 +00:00
Kyle Evans
2d741f33bd kern: ether_gen_addr: randomize on default hostuuid, too
Currently, this will still hash the default (all zero) hostuuid and
potentially arrive at a MAC address that has a high chance of collision
if another interface of the same name appears in the same broadcast
domain on another host without a hostuuid, e.g., some virtual machine
setups.

Instead of using the default hostuuid, just treat it as a failure and
generate a random LA unicast MAC address.

Reviewed by:	bz, gbe, imp, kbowling, kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29788
2021-06-01 22:59:21 -05:00
Mark Johnston
283e60fb31 ktrace: Fix an inverted comparison added in commit f3851b235
Fixes:		f3851b235 ("ktrace: Fix a race with fork()")
Reported by:	dchagin, phk
2021-06-01 09:15:35 -04:00
Konstantin Belousov
d3f7975fcb thread_reap_barrier(): remove unused variable
Noted by:	alc
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-05-31 23:03:42 +03:00
Konstantin Belousov
f62c7e54e9 Add thread_reap_barrier()
Reviewed by:	hselasky,markj
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30468
2021-05-31 18:09:22 +03:00
Konstantin Belousov
3a68546d23 quisce_cpus(): add special handling for PDROP
Currently passing PDROP to the quisce_cpus() function does not make sense.
Add special meaning for it, by not waiting for the idle thread to schedule.

Also avoid allocating u_int[MAXCPU] on the stack.

Reviewed by:	hselasky, markj
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30468
2021-05-31 18:09:22 +03:00
Konstantin Belousov
845d77974b kern_thread.c: wrap too long lines
Reviewed by:	hselasky, markj
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30468
2021-05-31 18:09:22 +03:00
Konstantin Belousov
e266a0f7f0 kern linker: do not allow more than one kldload and kldunload syscalls simultaneously
kld_sx is dropped e.g. for executing sysinits, which allows user
to initiate kldunload while module is not yet fully initialized.

Reviewed by:	markj
Differential revision:	https://reviews.freebsd.org/D30456
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-05-31 18:09:22 +03:00
Konstantin Belousov
27006229f7 vinvalbuf: do not panic if we were unable to flush dirty buffers
Return EBUSY instead and let caller to handle the issue.

For vgone()/vnode reclamation, caller first does vinvalbuf(V_SAVE),
which return EBUSY in case dirty buffers where not flushed. Then caller
calls vinvalbuf(0) due to non-zero return, which gets rid of all dirty
buffers without dependencies.

PR:	238565
Reviewed by:	asomers, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30555
2021-05-31 01:20:53 +03:00
Jason A. Harmening
a4b07a2701 VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9).  The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Also, add stbool.h to libprocstat modules which #define _KERNEL
before including sys/mount.h.  Otherwise they'll pull in sys/types.h
before defining _KERNEL and therefore won't have the bool definition
they need for mp_busy.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30556
2021-05-30 14:53:47 -07:00
Jason A. Harmening
271fcf1c28 Revert commits 6d3e78ad6c and 54256e7954
Parts of libprocstat like to pretend they're kernel components for the
sake of including mount.h, and including sys/types.h in the _KERNEL
case doesn't fix the build for some reason.  Revert both the
VFS_QUOTACTL() change and the follow-up "fix" for now.
2021-05-29 17:48:02 -07:00
Mateusz Guzik
3cf75ca220 vfs: retire unused vn_seqc_write_begin_unheld* 2021-05-29 22:04:09 +00:00
Mateusz Guzik
d81aefa8b7 vfs: use the sentinel trick in locked lookup path parsing 2021-05-29 22:04:09 +00:00
Mateusz Guzik
478c52f1e3 vfs: slightly rework vn_rlimit_fsize 2021-05-29 22:04:09 +00:00
Mateusz Guzik
9bfddb3ac4 fd: use PROC_WAIT_UNLOCKED when clearing p_fd/p_pd 2021-05-29 22:04:09 +00:00
Jason A. Harmening
6d3e78ad6c VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9).  The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30218
2021-05-29 14:05:39 -07:00
Mark Johnston
f3851b235b ktrace: Fix a race with fork()
ktrace(2) may toggle trace points in any of
1. a single process
2. all members of a process group
3. all descendents of the processes in 1 or 2

In the first two cases, we do not permit the operation if the process is
being forked or not visible. However, in case 3 we did not enforce this
restriction for descendents. As a result, the assertions about the child
in ktrprocfork() may be violated.

Move these checks into ktrops() so that they are applied consistently.

Allow KTROP_CLEAR for nascent processes. Otherwise, there is a window
where we cannot clear trace points for a nascent child if they are
inherited from the parent.

Reported by:	syzbot+d96676592978f137e05c@syzkaller.appspotmail.com
Reported by:	syzbot+7c98fcf84a4439f2817f@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30481
2021-05-27 15:52:20 -04:00