Since TD_IS_RUNNING() and TS_ON_RUNQ() are defined as logical
expressions involving '==', clang 14 warns about them being checked with
a bitwise operator instead of a logical one:
```
sys/kern/tty_info.c:124:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
runa = TD_IS_RUNNING(td) | TD_ON_RUNQ(td);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
^
sys/kern/tty_info.c:124:9: note: cast one or both operands to int to silence this warning
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
^
sys/kern/tty_info.c:129:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
runb = TD_IS_RUNNING(td2) | TD_ON_RUNQ(td2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
^
sys/kern/tty_info.c:129:9: note: cast one or both operands to int to silence this warning
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
^
```
Fix this by using logical operators instead. No functional change
intended.
Reviewed by: cem, emaste, kevans, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34186
There was nothing preventing one from sending an empty fragment on an
arbitrary KTLS TX-enabled socket, but ktls_frame() asserts that this
could not happen. Though the transmit path handles this case for TLS
1.0 with AES-CBC, we should be strict and allow empty fragments only in
modes where it is explicitly allowed.
Modify sosend_generic() to reject writes to a KTLS-enabled socket if the
number of data bytes is zero, so that userspace cannot trigger the
aforementioned assertion.
Add regression tests to exercise this case.
Reported by: syzkaller
Reviewed by: gallatin, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34195
Most fget*() functions initialize the output parameter to NULL. Make
the externally visible interface behave consistently, and make
fget_unlocked_seq() private to kern_descrip.c.
This fixes at least one bug in a consumer, _filemon_wrapper_openat(),
which assumes that getvnode() sets the output file pointer to NULL upon
an error.
Reported by: syzbot+01c0459408f896a5933a@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34190
The ntp_init() function did set a couple of global objects to zero. These
objects are in the .bss section and already initialized to zero during kernel
or module loading.
Since 59f256ec35d3 dmesg(8) will always skip first line of the message
buffer, cause it might be incomplete. The problem is that in most cases
it is complete, valid and contains the "---<<BOOT>>---" marker. This
skip can be disabled with '-a', but that would also unhide all non-kernel
messages. Move this functionality from dmesg(8) to kernel, since kernel
actually knows if wrap has happened or not.
The main motivation for the change is not actually the value of the
"---<<BOOT>>---" marker. The problem breaks unit tests, that clear
message buffer, perform a test and then check the message buffer for
a result. Example of such test is sys/kern/sonewconn_overflow.
74cf7cae4d22 ("softclock: Use dedicated ithreads for running callouts.")
switched callouts away from the swi infrastructure. It turns out that
this was a major source of entropy in early boot, which we've now lost.
As a result, first boot on hardware without a 'fast' entropy source
would block waiting for fortuna to be seeded with little hope of
progressing without manual intervention.
Let's resolve it by explicitly harvesting entropy in callout_process()
if we've handled any callouts. cc/curthread/now seem to be reasonable
sources of entropy, so use those.
Discussed with: jhb (also proposed initial patch)
Reported by: many
Reviewed by: cem, markm (both csprng)
Differential Revision: https://reviews.freebsd.org/D34150
It prevents WITNESS from recording the lock order for the buffer lock
acquired by getblkx().
Reviewed by: mckusick
Discussed with: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34073
At the moment this is mostly a no-op but in the future there will be
in-flight encrypted data which requires software decryption. This
same setup is also needed for NIC TLS RX.
Note that this does break TOE TLS RX for AES-CBC ciphers since there
is no software fallback for AES-CBC receive. This will be resolved
one way or another before 14.0 is released.
Reviewed by: hselasky
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34082
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D34071
The lock is unneccessary since the mount point is busied, which prevents
unmount and syncer vnode deallocation. Having the vnode locked causes
innocent LoRs and complicates debugging.
Also stop starting write accounting around it. Any caller of
VOP_FSYNC() must do it already, and sync_vnode() does.
Reported and tested by: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34072
The buffer must not be accessed by any other thread, it is freshly
allocated. As such, LK_NOWAIT should be nop but also it prevents
recording the order between the buffer lock and any other locks we might
own in the call to getnewbuf(). In particular, if we own FFS snap lock,
it should avoid triggering false positive warning.
Reviewed by: markj, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34072
Also remove a pointer to array variable, use array address directly.
Reviewed by: markj, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34072
When we remove an interface it is first removed from the interface list
V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for
any possible references to stop being used (i.e.
epoch_wait/epoch_drain_callbacks) before we tear it fully down.
However, the index in ifindex_table is not removed, so m_rcvif_restore()
can still find the (now dying) interface.
This results in panics, for example when dummynet restores the rcvif
pointer and passes a packet to ip6_input() we can panic because the
AF_INET6 domain has already been removed (so we end up dereferencing a
NULL pointer there).
Check that the interface is not dying before we restore it, which is
equivalent to checking its presence in V_ifnet, and thus ensures that
future accesses (while in NET_EPOCH) are safe.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34076
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.
Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.
Noted by: kib
Reported by: pho
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33266
Fun note: git show e6abef09187a
The manpage has contained the following verbiage on the matter for just
under 31 years:
"At least one argument must be present in the array"
Previous to this version, it had been prefaced with the weakening phrase
"By convention."
Carry through and document it the rest of the way. Allowing argc == 0
has been a source of security issues in the past, and it's hard to
imagine a valid use-case for allowing it. Toss back EINVAL if we ended
up not copying in any args for *execve().
The manpage change can be considered "Obtained from: OpenBSD"
Reviewed by: emaste, kib, markj (all previous version)
Differential Revision: https://reviews.freebsd.org/D34045
This function will be used by coming TLS hardware receive offload support.
Differential Revision: https://reviews.freebsd.org/D32356
Discussed with: jhb@
MFC after: 1 week
Sponsored by: NVIDIA Networking
This matches user_getsockname.
Reviewed by: brooks, kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33987
that disables any access to ptrace(2) for all processes.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33986
Privilege checks in both functions should allow the current process to
infer information about itself, as well as use the interfaces that are
proclaimed 'debugging', for instance, procctl(2).
Note that in p_cansee() case, explicit comparision of curproc and p
avoids a race where the process might change credentials and cause
thread to compare its cached stale credentials against updated process
creds, effectively disallowing the process to observe itself.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33986
Otherwise we end up copying one uninitialized byte into the socket
buffer.
Reported by: KMSAN
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33953
tc_counter_mask is an unsigned int and in the TSC timecounter is equal
to UINT_MAX, so the addition tc->tc_counter_mask + 1 can overflow to 0,
resulting in a hang during boot.
Fixes: c2705ceaeb09 ("x86: Speed up clock calibration")
Reviewed by: cperciva
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33956
Before this change bufdaemon and bufspacedaemon threads used
kthread_shutdown() to stop activity on system shutdown. The problem is
that kthread_shutdown() has no idea about the wait channel and lock used
by specific thread to wake them up reliably. As result, up to 9 threads
could consume up to 9 seconds to shutdown for no good reason.
This change introduces specific shutdown functions, knowing how to
properly wake up specific threads, reducing wait for those threads on
shutdown/reboot from average 4 seconds to effectively zero.
MFC after: 2 weeks
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D33936
This provides information about fixed regions of the target process'
user memory map.
Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33708
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack. This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings. In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.
There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose. Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.
The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.
PR: 260303
Reviewed by: kib
Discussed with: emaste, mw
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
ASLR stack randomization will reappear in a forthcoming commit. Rather
than inserting a random gap into the stack mapping, the entire stack
mapping itself will be randomized in the same way that other mappings
are when ASLR is enabled.
No functional change intended, as the stack gap implementation is
currently disabled by default.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro. With stack address randomization the
ps_strings address is no longer fixed.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
This will not be required with a forthcoming reimplementation of ASLR
stack randomization. Moreover, this change was not sufficient to enable
the use of a stack size limit smaller than the stack gap itself.
PR: 260303
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
The current ASLR stack gap feature will be removed, and with that the
need for the kern.stacktop sysctl is gone. All consumers have been
removed.
This reverts commit a97d697122da2bfb0baae5f0939d118d119dae33.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Before commit 3cec5c77d617 buf_daemon() went to longer 1s sleep if
numdirtybuffers <= lodirtybuffers. After that commit new condition
!BIT_EMPTY(BUF_DOMAINS, &bdlodirty) got opposite -- true when one
or more more domains is above lodirtybuffers. As result, on freshly
booted system with no dirty buffers buf_daemon() wakes up 10 times
per second and probably only 1 time per second when there is actual
work to do.
MFC after: 1 week
Reviewed by: kib, markj
Tested by: pho
Differential revision: https://reviews.freebsd.org/D33890
VM's have little control over CPU speed, don't make matters worse
by constantly spaming console.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33902
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.
This reverts commit 3889fb8af0b611e3126dc250ebffb01805152104.
This reverts commit 1544e0f5d1f1e3b8c10a64cb899a936976ca7ea4.
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).
Obtained from: CheriBSD
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D33780
Prior to this commit, the TSC and local APIC frequencies were calibrated
at boot time by measuring the clocks before and after a one-second sleep.
This was simple and effective, but had the disadvantage of *requiring a
one-second sleep*.
Rather than making two clock measurements (before and after sleeping) we
now perform many measurements; and rather than simply subtracting the
starting count from the ending count, we calculate a best-fit regression
between the target clock and the reference clock (for which the current
best available timecounter is used). While we do this, we keep track
of an estimate of the uncertainty in the regression slope (aka. the ratio
of clock speeds), and stop measuring when we believe the uncertainty is
less than 1 PPM.
In order to avoid the risk of aliasing resulting from the data-gathering
loop synchronizing with (a multiple of) the frequency of the reference
clock, we add some additional spinning depending upon the iteration number.
For numerical stability and simplicity of implementation, we make use of
floating-point arithmetic for the statistical calculations.
On the author's Dell laptop, this reduces the time spent in calibration
from 2000 ms to 29 ms; on an EC2 c5.xlarge instance, it is reduced from
2000 ms to 2.5 ms.
Reviewed by: bde (previous version), kib
MFC after: 1 month
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D33802
a helper to remount filesystem from rw to ro.
Tested by: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
- Move busdma_lock_mutex to subr_bus_dma.c.
- Move _busdma_lock_dflt to subr_bus_dma.c. This function was named a
couple of different things previously. It is not a public API but
an internal helper used in place of a NULL pointer. The prototype
is in <sys/bus_dma.h> as not all backends include
<sys/bus_dma_internal.h>.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33694
Move mostly duplicated code in various MD bus_dma backends to support
bounce pages into sys/kern/subr_busdma_bounce.c. This file is
currently #include'd into the backends rather than compiled standalone
since it requires access to internal members of opaque bus_dma
structures such as bus_dmamap_t and bus_dma_tag_t.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33684
Now that each module handles its global and VNET initialization
itself, there is no VNET related stuff left to do in domain_init().
Differential revision: https://reviews.freebsd.org/D33541