18907 Commits

Author SHA1 Message Date
John Baldwin
308fc7e5b1 user_getpeername: Use 'bool' for the compat argument.
This matches user_getsockname.

Reviewed by:	brooks, kib
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33987
2022-01-24 09:51:35 -08:00
Konstantin Belousov
fe6db72708 Add security.bsd.allow_ptrace sysctl
that disables any access to ptrace(2) for all processes.

Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33986
2022-01-22 19:36:56 +02:00
Konstantin Belousov
55a0aa2162 p_candebug(), p_cansee(): always allow for curproc
Privilege checks in both functions should allow the current process to
infer information about itself, as well as use the interfaces that are
proclaimed 'debugging', for instance, procctl(2).

Note that in the p_cansee() case, the explicit comparison of curproc and p
avoids a race where the process might change credentials and cause the
thread to compare its cached stale credentials against the updated process
creds, effectively preventing the process from observing itself.

Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33986
2022-01-22 19:36:56 +02:00
Mark Johnston
6be8944d96 ktls: Zero out TLS_GET_RECORD control messages
Otherwise we end up copying one uninitialized byte into the socket
buffer.

Reported by:	KMSAN
Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33953
2022-01-20 15:42:46 -05:00
Mark Johnston
c3196306f0 clockcalib: Fix an overflow bug
tc_counter_mask is an unsigned int and in the TSC timecounter is equal
to UINT_MAX, so the addition tc->tc_counter_mask + 1 can overflow to 0,
resulting in a hang during boot.
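
A minimal sketch of the wraparound described above, assuming a 32-bit
unsigned int.  The helper names are hypothetical and this is not the
kernel's code; widening to 64 bits before the addition is shown as one way
to avoid the wrap, not necessarily the fix this commit applied.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The widened variant only helps if uint64_t is wider than unsigned int. */
static_assert(sizeof(uint64_t) > sizeof(unsigned int),
    "widening needs a wider type");

/*
 * Hypothetical helper: number of distinct values of a counter whose
 * valid bits are given by 'mask'.  The addition is performed in
 * unsigned int, so a mask of UINT_MAX wraps around to 0.
 */
unsigned int
counter_period_narrow(unsigned int mask)
{
	return (mask + 1);
}

/* Same computation with the operand widened first, so it cannot wrap. */
uint64_t
counter_period_wide(unsigned int mask)
{
	return ((uint64_t)mask + 1);
}
```

With a mask of UINT_MAX the narrow variant yields 0, while the widened
variant yields the full period.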

Fixes:		c2705ceaeb ("x86: Speed up clock calibration")
Reviewed by:	cperciva
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33956
2022-01-20 08:23:38 -05:00
Alexander Motin
b7ff445ffa Reduce bufdaemon/bufspacedaemon shutdown time.
Before this change bufdaemon and bufspacedaemon threads used
kthread_shutdown() to stop activity on system shutdown.  The problem is
that kthread_shutdown() has no idea about the wait channel and lock used
by a specific thread, so it cannot wake it up reliably.  As a result, up to
9 threads could take up to 9 seconds to shut down for no good reason.

This change introduces specific shutdown functions that know how to
properly wake up specific threads, reducing the wait for those threads on
shutdown/reboot from an average of 4 seconds to effectively zero.

MFC after:	2 weeks
Reviewed by:	kib, markj
Differential Revision:  https://reviews.freebsd.org/D33936
2022-01-18 19:26:16 -05:00
Mark Johnston
3ce04aca49 proc: Add a sysctl to fetch virtual address space layout info
This provides information about fixed regions of the target process'
user memory map.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33708
2022-01-17 16:12:43 -05:00
Mark Johnston
1811c1e957 exec: Reimplement stack address randomization
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack.  This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings.  In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.

There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose.  Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.

The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.

PR:		260303
Reviewed by:	kib
Discussed with:	emaste, mw
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:12:36 -05:00
Mark Johnston
758d98debe exec: Remove the stack gap implementation
ASLR stack randomization will reappear in a forthcoming commit.  Rather
than inserting a random gap into the stack mapping, the entire stack
mapping itself will be randomized in the same way that other mappings
are when ASLR is enabled.

No functional change intended, as the stack gap implementation is
currently disabled by default.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:11:54 -05:00
Mark Johnston
706f4a81a8 exec: Introduce the PROC_PS_STRINGS() macro
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro.  With stack address randomization the
ps_strings address is no longer fixed.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:11:54 -05:00
Mark Johnston
5a8413e779 setrlimit: Remove special handling for RLIMIT_STACK with a stack gap
This will not be required with a forthcoming reimplementation of ASLR
stack randomization.  Moreover, this change was not sufficient to enable
the use of a stack size limit smaller than the stack gap itself.

PR:		260303
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 11:42:13 -05:00
Mark Johnston
3fc21fdd5f sysent: Add a sv_psstringssz field to struct sysentvec
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.
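
A minimal sketch of the address computation this field enables, assuming
ps_strings is placed immediately below the top of the stack.  The function
name and parameters are hypothetical and stand in for the kernel's own
per-process fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: address of a ps_strings structure of size
 * 'psstringssz' placed directly below the (possibly randomized)
 * stack top.  The size differs between ABIs, so it is a parameter.
 */
uintptr_t
ps_strings_addr(uintptr_t stacktop, size_t psstringssz)
{
	assert(psstringssz <= stacktop);
	return (stacktop - psstringssz);
}
```

Because the size is supplied per ABI, the same computation works for any
stack top chosen at exec time.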

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 11:42:07 -05:00
Mark Johnston
1544f5add8 Revert "kern_exec: Add kern.stacktop sysctl."
The current ASLR stack gap feature will be removed, and with that the
need for the kern.stacktop sysctl is gone.  All consumers have been
removed.

This reverts commit a97d697122.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 11:41:58 -05:00
Mark Johnston
dc7526170d posixshm: Report output buffer truncation from kern.ipc.posix_shm_list
PR:		240573
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33912
2022-01-17 08:35:19 -05:00
Alexander Motin
e76c010899 Fix inverse sleep logic in buf_daemon().
Before commit 3cec5c77d6 buf_daemon() went to a longer 1s sleep if
numdirtybuffers <= lodirtybuffers.  After that commit the new condition
!BIT_EMPTY(BUF_DOMAINS, &bdlodirty) became the opposite: true when one
or more domains are above lodirtybuffers.  As a result, on a freshly
booted system with no dirty buffers buf_daemon() wakes up 10 times
per second, and probably only 1 time per second when there is actual
work to do.

MFC after:	1 week
Reviewed by:	kib, markj
Tested by:	pho
Differential revision:	https://reviews.freebsd.org/D33890
2022-01-15 19:32:36 -05:00
Simon J. Gerraty
bacb140f31 Ignore calcru: runtime went backwards for vm_guest
VMs have little control over CPU speed; don't make matters worse
by constantly spamming the console.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D33902
2022-01-14 16:07:43 -08:00
Brooks Davis
0910a41ef3 Revert "syscallarg_t: Add a type for system call arguments"
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.

This reverts commit 3889fb8af0.
This reverts commit 1544e0f5d1.
2022-01-12 23:29:20 +00:00
Brooks Davis
3889fb8af0 sysent: regen for syscallarg_t
2022-01-12 22:51:25 +00:00
Brooks Davis
1544e0f5d1 syscallarg_t: Add a type for system call arguments
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from:	CheriBSD

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33780
2022-01-12 22:51:25 +00:00
Colin Percival
c2705ceaeb x86: Speed up clock calibration
Prior to this commit, the TSC and local APIC frequencies were calibrated
at boot time by measuring the clocks before and after a one-second sleep.
This was simple and effective, but had the disadvantage of *requiring a
one-second sleep*.

Rather than making two clock measurements (before and after sleeping) we
now perform many measurements; and rather than simply subtracting the
starting count from the ending count, we calculate a best-fit regression
between the target clock and the reference clock (for which the current
best available timecounter is used). While we do this, we keep track
of an estimate of the uncertainty in the regression slope (aka. the ratio
of clock speeds), and stop measuring when we believe the uncertainty is
less than 1 PPM.

In order to avoid the risk of aliasing resulting from the data-gathering
loop synchronizing with (a multiple of) the frequency of the reference
clock, we add some additional spinning depending upon the iteration number.

For numerical stability and simplicity of implementation, we make use of
floating-point arithmetic for the statistical calculations.

On the author's Dell laptop, this reduces the time spent in calibration
from 2000 ms to 29 ms; on an EC2 c5.xlarge instance, it is reduced from
2000 ms to 2.5 ms.
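
The core idea of fitting a line instead of subtracting two readings can be
sketched as an ordinary least-squares slope.  This is a simplified
illustration, not the kernel's implementation: the function name is
hypothetical, and the running uncertainty estimate, the 1 PPM stopping rule
and the anti-aliasing spin described above are all omitted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Least-squares slope of y against x.  With x[] holding reference-clock
 * readings and y[] holding target-clock readings taken at the same
 * instants, the slope estimates the ratio of the two clock speeds.
 */
double
fit_slope(const double *x, const double *y, size_t n)
{
	double mx = 0.0, my = 0.0, sxx = 0.0, sxy = 0.0;
	size_t i;

	assert(n >= 2);
	for (i = 0; i < n; i++) {
		mx += x[i];
		my += y[i];
	}
	mx /= n;
	my /= n;

	/* Accumulate about the means to limit loss of precision. */
	for (i = 0; i < n; i++) {
		sxx += (x[i] - mx) * (x[i] - mx);
		sxy += (x[i] - mx) * (y[i] - my);
	}
	assert(sxx > 0.0);	/* the reference readings must differ */
	return (sxy / sxx);
}
```

Unlike a two-point difference, every sample contributes to the estimate, so
the result does not hinge on how far apart the first and last readings are.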

Reviewed by:	bde (previous version), kib
MFC after:	1 month
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D33802
2022-01-12 12:34:07 -08:00
Konstantin Belousov
a24afbb4e6 Ignore debugger-injected signals left after detaching
PR:	261010
Reported by:	Martin Simmons <martin@lispworks.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33787
2022-01-12 07:33:30 +02:00
Alexander Motin
cb1f5d1136 Reduce minimum idle hardclock rate from 2Hz to 1Hz.
On an idle 80-thread system this improves package-level idle state
residency, and so power consumption, by several percent.

MFC after:	2 weeks
2022-01-09 19:25:56 -05:00
Konstantin Belousov
4a4b059a97 Add vfs_remount_ro()
a helper to remount a filesystem from rw to ro.

Tested by:	pho
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33721
2022-01-08 05:41:44 +02:00
John Baldwin
7def1e10b3 bus_dma: Deduplicate locking helper functions.
- Move busdma_lock_mutex to subr_bus_dma.c.

- Move _busdma_lock_dflt to subr_bus_dma.c.  This function was named a
  couple of different things previously.  It is not a public API but
  an internal helper used in place of a NULL pointer.  The prototype
  is in <sys/bus_dma.h> as not all backends include
  <sys/bus_dma_internal.h>.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33694
2022-01-05 13:50:40 -08:00
John Baldwin
85b4607324 Deduplicate bus_dma bounce code.
Move mostly duplicated code in various MD bus_dma backends to support
bounce pages into sys/kern/subr_busdma_bounce.c.  This file is
currently #include'd into the backends rather than compiled standalone
since it requires access to internal members of opaque bus_dma
structures such as bus_dmamap_t and bus_dma_tag_t.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33684
2022-01-05 13:50:40 -08:00
John Baldwin
753c851387 sigev_findtd: Fix whitespace nit in argument list.
Obtained from:	CheriBSD
2022-01-04 13:37:39 -08:00
Gleb Smirnoff
644ca0846d domains: make domain_init() initialize only global state
Now that each module handles its global and VNET initialization
itself, there is no VNET related stuff left to do in domain_init().

Differential revision:	https://reviews.freebsd.org/D33541
2022-01-03 10:15:22 -08:00
Gleb Smirnoff
24e1c6ae7d domains: init with standard SYSINIT(9) or VNET_SYSINIT()
Only three modules still used dom_init(), and netipsec
was the last one to use dom_destroy().

Differential revision:	https://reviews.freebsd.org/D33540
2022-01-03 10:15:22 -08:00
Gleb Smirnoff
340c7343f4 protocols: don't execute protosw_init() for every VNET
The function now modifies pr_usrreqs only, which are always
global.  Rename it to pr_usrreqs_init().

Differential revision:	https://reviews.freebsd.org/D33538
2022-01-03 10:15:21 -08:00
Gleb Smirnoff
89128ff3e4 protocols: init with standard SYSINIT(9) or VNET_SYSINIT
The historical BSD network stack loop that rolls over domains and
over protocols has no advantages over more modern SYSINIT(9).
While doing the sweep, split global and per-VNET initializers.

Getting rid of pr_init achieves several things:
o Get rid of ifdef's that protect against double foo_init() when
  both INET and INET6 are compiled in.
o Isolate initializers statically to the module they init.
o Make the code easier to understand and maintain.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D33537
2022-01-03 10:15:21 -08:00
Jessica Clarke
a3e828c91d intrng: Use less confusing return value for intr_pic_add_handler
Currently intr_pic_add_handler either returns the PIC you gave it (which
is useless and risks causing confusion about whether it's creating
another PIC) or, on error, NULL. Instead, convert it to return an int
error code as one would expect.

Note that the only consumer of this API, arm64's gicv3_its, does not use
the return value, so no uses need updating to work with the revised API.

Reviewed by:	markj, mmel
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33341
2022-01-03 17:08:44 +00:00
Stefan Eßer
ec3af9d0ca sys/kern/sched_4bsd.c: fix typo introduced in previous commit
2022-01-01 15:33:38 +01:00
Stefan Eßer
a19bd8e30e Restore variable aliasing in the context of cpu set operations
A simplification of set operations removed side-effects of the
previous code, which are restored by this commit.
2022-01-01 11:58:40 +01:00
Mark Johnston
6b95cf5bde callout: Wait for the softclock thread to switch before rescheduling
When a softclock thread prepares to go off-CPU, the following happens in
the context of the thread:

1. callout state is locked
2. thread state is set to IWAIT
3. thread lock is switched from the tdq lock to the callout lock
4. tdq lock is released
5. sched_switch() sets td_lock to &blocked_lock
6. sched_switch() releases old td_lock (callout lock)
7. sched_switch() removes td from its runqueue
8. cpu_switch() sets td_lock back to the callout lock

Suppose a timer interrupt fires while the softclock thread is switching
off, and callout_process() schedules the softclock thread.  Then there
is a window between steps 5 and 8 where callout_process() can call
sched_add() while td_lock is &blocked_lock, but this is not correct
since the thread is not logically locked.

callout_process() thus needs to spin waiting for the softclock thread to
finish switching off (i.e., after step 8 completes) before rescheduling
it, since callout_process() does not acquire the thread lock directly.

Reported by:	syzbot+fb44dbf6734ff492c337@syzkaller.appspotmail.com
Fixes:		74cf7cae4d ("softclock: Use dedicated ithreads for running callouts.")
Reviewed by:	mav, kib, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33709
2021-12-31 17:01:39 -05:00
Mark Johnston
f04a096049 exec: Simplify sv_copyout_strings implementations a bit
Simplify control flow around handling of the execpath length and signal
trampoline.  Cache the sysentvec pointer in a local variable.

No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33703
2021-12-31 12:50:15 -05:00
John Baldwin
74cf7cae4d softclock: Use dedicated ithreads for running callouts.
Rather than using the swi infrastructure, rewrite softclock() as a
thread loop (softclock_thread()) and use it as the main routine of the
softclock threads.  The threads use the CC_LOCK as the thread lock
when idle.

Reviewed by:	mav, imp, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33683
2021-12-30 14:55:08 -08:00
Stefan Eßer
e2650af157 Make CPU_SET macros compliant with other implementations
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.

Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.

The difference between the FreeBSD versions of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-address
scheme (2 source operands and a separately passed destination).

The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.
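
The two calling conventions can be contrasted with a small sketch.  The
macros below are hypothetical stand-ins that operate on a plain 64-bit
mask; they are not the real cpuset macros and only mirror the argument
shapes described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPU set: a plain 64-bit mask. */
typedef uint64_t mask_t;

/* 2-address form: the first operand is both a source and the destination. */
#define	MASK_AND2(d, s)		(*(d) &= *(s))
/* 3-address form: two sources and a separately passed destination. */
#define	MASK_AND3(d, s1, s2)	(*(d) = *(s1) & *(s2))

/* Intersect a and b without modifying either, using the 2-address form. */
mask_t
intersect_2addr(const mask_t *a, const mask_t *b)
{
	mask_t tmp;

	assert(a != NULL && b != NULL);
	tmp = *a;		/* temporary copy needed to preserve a */
	MASK_AND2(&tmp, b);
	return (tmp);
}

/* The same operation in the 3-address form needs no copy. */
mask_t
intersect_3addr(const mask_t *a, const mask_t *b)
{
	mask_t dst;

	assert(a != NULL && b != NULL);
	MASK_AND3(&dst, a, b);
	return (dst);
}
```

Passing the same variable as destination and first source, as in
MASK_AND3(&c, &c, &b), reproduces the 2-address behaviour, which is why the
3-address form covers both uses.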

This patch set makes it possible to unconditionally provide functions and
macros expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.

One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.

Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT no
longer require that option.

The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D33451
2021-12-30 12:20:32 +01:00
Colin Percival
33812d60b9 vfs_mountroot: Check for root dev before waiting
If GEOM is idle but the root device is not yet present when we enter
vfs_mountroot_wait_if_necessary, we call vfs_mountroot_wait to wait
for root holds (e.g. CAM or USB initialization).  Upon returning from
vfs_mountroot_wait, we wait 100 ms at a time until the root device
shows up.

Since the root device most likely appeared during vfs_mountroot_wait
-- waiting for subsystems which may be responsible for the root
device is the whole purpose of that function -- it makes sense to
check if the device is now present rather than printing a warning
and pausing for 100 ms before checking.

Reviewed by:	trasz
Fixes:		a3ba3d09c2 ("Make root mount wait mechanism smarter")
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D33593
2021-12-29 12:41:09 -08:00
Colin Percival
19a172158c vfs_mountroot: Wait for GEOM idle post root holds
In the case of a root hold related to the initialization of a disk
device, a flurry of GEOM tasting is likely to take place as soon as
the device is initialized and the root hold is released.  If we
don't wait for GEOM idle it's easy for vfs_mountroot to "win" the
race and proceed before the root filesystem GEOM is ready.

Reviewed by:	imp
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D33592
2021-12-29 12:41:08 -08:00
Colin Percival
e6db5eb9ec vfs_mountroot: Skip 'Root mount waiting' < 1 s
While the message is technically correct, it's not particularly
helpful in the case where we're only waiting a few ms; this case
occurs frequently on EC2 arm64 instances with CAM initialization
racing to release its root hold before vfs_mountroot reaches this
point.  Only print the message if we end up waiting for more than
one second.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D33591
2021-12-29 12:41:08 -08:00
Edward Tomasz Napierala
626d6992ca Move fork_rfppwait() check into ast()
This will always sleep at least once, so it's a slow path by definition.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D33387
2021-12-26 17:22:21 +00:00
Roger Pau Monné
60e749da3c mbuf_tags: use explicitly sized type for 'type' parameter
Functions manipulating mbuf tags are using an int type for passing the
'type' parameter, but the internal tag storage is using a 16bit
integer to store it. This leads to the following code:

t = m_tag_alloc(..., 0xffffffff, ..., ...);
m_tag_prepend(m, t);
r = m_tag_locate(m, ..., 0xffffffff, NULL);

Returning r == NULL because m_tag_locate doesn't truncate the type
parameter when doing the match. This is unexpected because the type of
the 'type' parameter is int, and the caller doesn't need to know about
the internal truncations.

Fix this by making the 'type' parameter of type uint16_t in order to
match the size of its internal storage and make it obvious to the
caller the actual size of the parameter.
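
The mismatch can be reproduced outside the kernel with a simplified,
hypothetical tag structure; this is not struct m_tag, and the function
names are made up for illustration.  The value 0x12345 is used instead of
0xffffffff only so that every conversion in the sketch is well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tag with 16-bit internal storage. */
struct tag {
	uint16_t	id;
};

/* Old-style match: 'type' is an int and is compared without truncation. */
int
tag_matches_int(const struct tag *t, int type)
{
	assert(t != NULL);
	return (t->id == type);
}

/* New-style match: the parameter has the same width as the storage. */
int
tag_matches_u16(const struct tag *t, uint16_t type)
{
	assert(t != NULL);
	return (t->id == type);
}
```

Storing 0x12345 keeps only 0x2345, so the int-typed lookup for 0x12345
fails, while the uint16_t interface makes the truncation visible at the call
site and matches.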

While there, also use uint uniformly, replacing the existing u_int
instances.

Reviewed by: kp, donner, glebius
Differential revision: https://reviews.freebsd.org/D33680
2021-12-29 09:23:52 +01:00
John Baldwin
254e4e5b77 Simplify swi for bus_dma.
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages.  For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm).  However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi.  I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux.  However, in the current scheme there's no need for the mux.

Instead, remove swi_vm and vm_ih.  Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly.  This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.

One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.

Reviewed by:	imp, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33447
2021-12-28 13:51:25 -08:00
John Baldwin
2cee586189 sys/kern: Use C99 fixed-width integer types.
No functional change.

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33630
2021-12-28 09:41:08 -08:00
Konstantin Belousov
23ba59fbfb itimers: strip unused bits from struct itimer and struct itimers
Reviewed by:	imp, markj, mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33670
2021-12-28 03:02:53 +02:00
Konstantin Belousov
3f15708478 itimers_alloc: no need to initialize its_timers array
struct itimers is allocated with M_ZERO, setting all members to NULL
is tautological.

Reviewed by:	imp, markj, mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33670
2021-12-28 03:02:53 +02:00
Alexander Motin
c6c52d8e39 kern: Remove CTLFLAG_NEEDGIANT from some more sysctls.
MFC after:	2 weeks
2021-12-26 23:07:33 -05:00
Gleb Smirnoff
eb8dcdeac2 jail: network epoch protection for IP address lists
Now struct prison has two pointers (IPv4 and IPv6) of struct
prison_ip type.  Each points to a structure holding an epoch context,
an address count and a variable-size array of addresses.  These
structures are freed with network epoch deferred free and are not
edited in place; instead a new structure is allocated and set.

While here, the change also generalizes a lot (but not enough)
of IPv4 and IPv6 processing. E.g. address family agnostic helpers
for kern_jail_set() are provided, that reduce v4-v6 copy-paste.

The fast-path prison_check_ip[46]_locked() is also generalized
into prison_ip_check() that can be executed with network epoch
protection only.

Reviewed by:		jamie
Differential revision:	https://reviews.freebsd.org/D33339
2021-12-26 10:45:50 -08:00
Alexander Motin
fe27f1db5f kern: Remove CTLFLAG_NEEDGIANT from some sysctls.
MFC after:	2 weeks
2021-12-26 12:03:33 -05:00
Jessica Clarke
d2ef377430 Fix buffer overread in preloaded hostuuid parsing
Commit b6be9566d2 stopped prison0_init writing outside of the
preloaded hostuuid's bounds. However, the preloaded data will not
(normally) have a NUL in it, and so validate_uuid will walk off the end
of the buffer in its call to sscanf. Previously if there was any
whitespace in the string we'd at least know there's a NUL one past the
end due to the off-by-one error, but now no such byte is guaranteed.

Fix this by copying to a temporary buffer and explicitly adding a NUL.
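
A minimal sketch of the copy-and-terminate pattern, assuming the preloaded
data arrives as a pointer plus length with no trailing NUL.  The function
name and the UUID_STR_LEN constant are hypothetical; this is not the code
from prison0_init.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	UUID_STR_LEN	36	/* textual UUID length, without the NUL */

/*
 * Copy at most UUID_STR_LEN bytes of possibly unterminated data into
 * dst and NUL-terminate it, so that string functions such as sscanf()
 * cannot read past the end.  dst must hold UUID_STR_LEN + 1 bytes.
 * Returns the number of bytes copied.
 */
size_t
copy_terminated(char *dst, const char *src, size_t srclen)
{
	size_t n;

	assert(dst != NULL && src != NULL);
	n = srclen < UUID_STR_LEN ? srclen : UUID_STR_LEN;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return (n);
}
```

Any parsing is then done on the temporary buffer, never on the original
preloaded bytes.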

Whilst here, change the strlcpy call to use a far less suspicious
argument for dstsize; in practice it's fine, but it's an unusual pattern
and not necessary.

Found by:	CHERI
Reviewed by:	emaste, kevans, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33616
2021-12-22 16:47:23 +00:00