18930 Commits

Author SHA1 Message Date
Mateusz Guzik
0a048d4a98 mbuf: plug set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 17:59:11 +00:00
Justin Hibbits
d2de68811a Fix assert check for SV_DSO_SIG in exec_sysvec_init_secondary()
The only requirement for SV_DSO_SIG here is that the flags match between
the source and target sysentvec.

The current assertion is too strict and fails on powerpc64, the only
other architecture than amd64 that uses this function, which doesn't
implement sigtramp in a VDSO.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33355
2021-12-08 22:54:07 -06:00
Konstantin Belousov
b7c55487ff Regen 2021-12-09 02:49:10 +02:00
Konstantin Belousov
5346570276 swapoff: add one more variant of the syscall
Requested and reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:46 +02:00
Konstantin Belousov
c1a8472793 syscalls: add COMPAT13
Reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:32 +02:00
Konstantin Belousov
ecd8245e0d Kernel linkers: add emergency sysctl to restore old behavior
allowing linking to static symbols from other files.  Default the new
settings to true, delaying the change of the kernel linker behavior
for other day.

Suggested by:	emaste
PR:	207898
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32878
2021-12-08 23:32:30 +02:00
Konstantin Belousov
95c20faf11 kernel linker: do not read debug symbol tables for non-debug symbols
In particular, this prevents resolving locals from other files.
To access debug symbol tables, add LINKER_LOOKUP_DEBUG_SYMBOL and
LINKER_DEBUG_SYMBOL_VALUES kobj methods, which are allowed to use
any types of present symbols in all tables.

PR:	207898
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32878
2021-12-08 23:32:29 +02:00
Konstantin Belousov
72f6662662 linker_debug_symbol_values(): use proper linker interface to get debug values
Reported by:	markj
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32878
2021-12-08 23:32:26 +02:00
Konstantin Belousov
c37c6f994f Style
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32878
2021-12-08 23:32:20 +02:00
Konstantin Belousov
794d3e8e63 fcntl(2): add F_KINFO operation
that returns struct kinfo_file for the given file descriptor.  Among
other data, it also returns kf_path, if file op was able to restore file
path.

Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33277
2021-12-06 22:18:09 +02:00
Konstantin Belousov
6e51d61a96 Add declaration for static export_file_to_kinfo()
Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33277
2021-12-06 22:18:09 +02:00
Konstantin Belousov
eb02958748 Add kern.elf{32,64}.vdso knobs to enable/disable vdso preloading
Reviewed by:	emaste
Discussed with:	jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:46:49 +02:00
Konstantin Belousov
98c8b62524 vdso for ia32 on amd64
Reviewed by:	emaste
Discussed with:	jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:46:49 +02:00
Konstantin Belousov
290e05dde0 imgact_aout.c: some style
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:46:49 +02:00
Konstantin Belousov
01c77a436e Pass vdso address to userspace
Reviewed by:	emaste
Discussed with:	jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:46:49 +02:00
Konstantin Belousov
ab4524b3d7 amd64: wrap 64bit sigtramp into vdso
Reviewed by:	emaste
Discussed with:	jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:46:49 +02:00
Konstantin Belousov
9da5257e1c imgact_aout.c: We do not expect the aout support to be ported
Specify that the only supported architecture for a.out is ia32 (either
i386 or amd64 host kernel).

Requested by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:46:49 +02:00
Scott Long
95d35d7a0e Fix "set but not used" in kern_cpu.c
Sponsored by: Rubicon Communications, LLC ("Netgate")
2021-12-05 15:33:04 -07:00
Konstantin Belousov
a5c2d59ed3 Expand comment explaining reasons for automatic swapoff on shutdown
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33167
2021-12-03 10:42:21 +02:00
Cy Schubert
db0ac6ded6 Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
2021-12-02 14:45:04 -08:00
Cy Schubert
266f97b5e9 wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.

MFC after:      1 month
2021-12-02 13:35:14 -08:00
Gleb Smirnoff
d96fccc505 epoch: with EPOCH_TRACE add epoch_where_report()
which will report where the epoch was entered and also
mark the tracker, so that exit will also be reported.

Helps to understand epoch entrance/exit scenarios in
complex cases, like network stack.  As everything else
under EPOCH_TRACE it is a developer only tool.
2021-12-02 11:02:51 -08:00
Gleb Smirnoff
de2d47842e SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc).  However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree().  However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it.  Details:

- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
  Once the desired inpcb is found we need to transition from SMR
  section to r/w lock on the inpcb itself, with a check that inpcb
  isn't yet freed.  This requires some compexity, since SMR section
  itself is a critical(9) section.  The complexity is hidden from
  KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
  notification) also a new KPI is provided, that hides internals of
  the database - inp_next(struct inp_iterator *).

Reviewed by:		rrs
Differential revision:	https://reviews.freebsd.org/D33022
2021-12-02 10:48:48 -08:00
Gordon Bergling
fe96f62d61 kern: Correct a typo in a sysctl description
- s/osbolete/obsolete/

MFC after:	3 days
2021-12-02 10:54:15 +01:00
Warner Losh
1c7d15b030 Make device_busy/unbusy work w/o Giant held
The vast majority of the busy/unbusy users in the tree don't acquire
Giant before calling device_busy/unbusy. However, if multiple threads
are opening a file, say, that causes the device to busy/unbusy, then we
can race to the root marking things busy. Move to using a reference
count to keep track of how many times a device_t has been made busy. Use
that count to make the same decisions that we'd make with the old device
state.

Note: gpiopps.c uses D_TRACKCLOSE. Others do as well. However, there's a
known race with closes that will be corrected for all the drivers that
do this in a future commit.

Sponsored by:		Netflix
Reviewed by:		hselasky, jhb
Differential Revision:	https://reviews.freebsd.org/D26284
2021-11-30 15:18:01 -07:00
Warner Losh
25c49c426c Revert "Make device_busy/unbusy work w/o Giant held"
This reverts commit 08e781915363f98f4318a864b3b5a52bd99424c6.

Commit message was for a very old version of the patch. Will re-commit
with the right one since it's so bad. There's no locked versions of
it...that code was reworked to use refcnt APIs.

Noticed by:	jhb, jtrc27
Sponsored by:	Netflix
2021-11-30 15:17:07 -07:00
Warner Losh
08e7819153 Make device_busy/unbusy work w/o Giant held
The vast majority of the busy/unbusy users in the tree don't acquire Giant
before calling device_busy/unbusy. However, if multiple threads are opening a
file, say, that causes the device to busy/unbusy, then we can race to the root
marking things busy. Create a new device_busy_locked and device_unbusy_locked
that are the current implemntations of device_busy and device_unbusy. Make
device_busy and unbusy acquire Giant before calling the _locked versrions. Since
we never sleep in the busy/unbusy path, Giant's single threaded semantics
suffice to keep this safe.

Sponsored by:		Netflix
Reviewed by:		hselasky, jhb
Differential Revision:	https://reviews.freebsd.org/D26284
2021-11-30 15:03:26 -07:00
Andriy Gapon
3d9d64aa18 kern_tc: unify timecounter to bintime delta conversion
There are two places where we convert from a timecounter delta to
a bintime delta: tc_windup and bintime_off.
Both functions use the same calculations when the timecounter delta is
small.  But for a large delta (greater than approximately an equivalent
of 1 second) the calculations were different.  Both functions use
approximate calculations based on th_scale that avoid division.  Both
produce values slightly greater than a true value, calculated with
division by tc_frequency, would be.  tc_windup is slightly more
accurate, so its result is closer to the true value and, thus, smaller
than bintime_off result.

As a consequence there can be a jump back in time when time hands are
switched after a long period of time (a large delta).  Just before the
switch the time would be calculated with a large delta from
th_offset_count in bintime_off.  tc_windup does the switch using its own
calculations of a new th_offset using the large delta.  As explained
earlier, the new th_offset may end up being less than the previously
produced binuptime.  So, for a period of time new binuptime values may
be "back in time" comparing to values just before the switch.

Such a jump must never happen.  All the code assumes that the uptime is
monotonically nondecreasing and some code works incorrectly when that
assumption is broken.  For example, we have observed sleepq_timeout()
ignoring a timeout when the sbinuptime value obtained by the callout
code was greater than the expiration value, but the sbinuptime obtained
in sleepq_timeout() was less than it.  In that case the target thread
would never get woken up.

The unified calculations should ensure the monotonic property of the
uptime.

The problem is quite rare as normally tc_windup should be called HZ
times per second (typically 1000 or 100).  But it may happen in VMs on
very busy hypervisors where a VM's virtual CPU may not get an execution
time slot for a second or more.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Panzura LLC
2021-11-30 15:23:23 +02:00
Gordon Bergling
b6f4818a7e vfs: Fix a typo in a sysctl description
- s/dependecies/dependencies/

MFC after:	3 days
2021-11-30 07:28:40 +01:00
Brooks Davis
0e765d9b08 syscalls: regen 2021-11-29 22:04:58 +00:00
Brooks Davis
6d37a1670b syscalls: mprotect does not take a const
The mprotect syscall decleration is not const.  I added this one
incorrectly in a944d28d0edf7ceb1bef4d789dfa4e8e18331658.

Reported by:	kib
Reviewed by:	kib, imp
2021-11-29 22:04:47 +00:00
Brooks Davis
401eec3635 syscalls: regen 2021-11-29 22:04:44 +00:00
Brooks Davis
a8efd4d1b3 syscalls: make syscall and __syscall SYSMUX
Rather than combining the declearation of nosys with the registration
of SYS_syscall, declare syscall(2) and __syscall(2) with the new
SYSMUX type in syscalls.master and declare nosys directly.  This
eliminates the last use of syscall aliases in the tree.

Reviewed by:	kib, imp
2021-11-29 22:04:44 +00:00
Brooks Davis
d7f306c5be makesyscalls: add a new SYSMUX type
This type is for system call multiplexers (syscall(2), __syscall(2))
that don't have a normal handler and instead are handled in the
machine-dependent syscall code.

Reviewed by:	kib, imp
2021-11-29 22:04:43 +00:00
Brooks Davis
5c1835b1d4 syscalls: regen 2021-11-29 22:04:43 +00:00
Brooks Davis
cffb55f0f3 syscalls: normalize exit
Declare the exit system call normally.  This results in the
implementation being named sys_exit rather than sys_sys_exit and
being decalred as returning an int.  Infact it does not return
at all because exit1 does not, so add an __unreachable() to let the
compiler know that.

Reviewed by:	kib, imp
2021-11-29 22:04:43 +00:00
Brooks Davis
7fb006e7d6 syscalls: regen 2021-11-29 22:04:42 +00:00
Brooks Davis
638c5fa8df syscalls: normalize (get|set)rlimit
Declare normal <foo>_args structs rather than going out of the way
to declare __<foo>_args.

Reviewed by:	kib, imp
2021-11-29 22:04:42 +00:00
Brooks Davis
c2996f8ad9 syscalls: regen 2021-11-29 22:04:42 +00:00
Brooks Davis
ba4e5253a3 syscalls: normalize orecvfrom and ogetsockname
Declare o<foo>_args rather than reusing the equivalent <foo>_args
structs.  Avoiding the addition of a new type isn't worth the
gratutious differences.

Reviewed by:	kib, imp
2021-11-29 22:04:42 +00:00
Brooks Davis
28f0471884 uipc: rework recvfrom, getsockname, getpeername
Stop using <foo>_args structs as part of internal kernel APIs.  Add
a kern_recvfrom and adjust getsockname and getpeername's equivalent
functions to take individual arguments rather than a uap pointer.

Adopt a convention from CheriBSD that a function interacting with
userspace pointers and sitting between the sys_<foo> syscall and
kern_<foo> implementation is named user_<foo>.

Reviewed by:	kib, imp
2021-11-29 22:04:41 +00:00
Brooks Davis
3660e76a22 syscalls: correct a couple style issues
Reviewed by:	kib, imp
2021-11-29 22:04:41 +00:00
Brooks Davis
33f9ea209e syscalls: add missing SAL annotations
freebsd7_shmctl was missing an annotation

Reviewed by:	kib, imp
2021-11-29 22:04:41 +00:00
Konstantin Belousov
08bb51f8d6 shutdown: unmount filesystems after swapoff
Swap on file requires operational underlying mount, otherwise
swapoff_all() is guaranteed to panic due to the default strategy VOP for
reclaimed vnodes.

Reported and tested by:	peterj
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33147
2021-11-29 18:38:02 +02:00
Konstantin Belousov
4f924a786a linker_kldload_busy(): allow recursion
Some drivers recursively loads modules by explicit calls to kldload
during initialization, which might occur during kldload.

PR:	259748
Reported and tested by:	thj
Reviewed by:	markj
Sponsored by:	Nvidia networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32972
2021-11-28 10:36:09 +02:00
Mateusz Guzik
e511bd1406 vfs: fully lockless v_writecount adjustment
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33128
2021-11-27 23:07:26 +00:00
Mateusz Guzik
4dcdf3987c vfs: replace the MNTK_TEXT_REFS flag with VIRF_TEXT_REF
This allows to stop maintaing the VI_TEXT_REF flag and consequently
opens up fully lockless v_writecount adjustment.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33127
2021-11-27 23:07:25 +00:00
Mateusz Guzik
054f5815c5 vfs: plug a set-but-not-used var in kern_alternate_path
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-11-26 12:22:09 +00:00
Mateusz Guzik
3ffcfa599e vfs: add vop_stdadd_writecount_nomsync
This avoids needing to inspect the mount point every time.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D33125
2021-11-26 12:06:08 +00:00
Mateusz Guzik
7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf640409a1 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00