18664 Commits

Author SHA1 Message Date
Konstantin Belousov
69a456c0b6 elf_note_prpsinfo: handle more failures from proc_getargv()
The sbuf_len() result after proc_getargv() might be 0 if the user mangled
ps_strings enough. Also, the sbuf_len() API contract is to return -1 if the
buffer overflowed. The latter should not occur because get_ps_strings()
checks the catenated length, but check for this subtle detail explicitly
as well to be more resilient.

The end result is that p_comm is used in these situations.
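
The fallback described above can be sketched roughly as follows. This is
a userland model, not the kernel code: pick_psargs() and its parameters
are hypothetical names, and `len` stands in for the sbuf_len() result.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the kernel function: choose the string to
 * report as the process arguments.  A length of -1 models an overflowed
 * buffer and 0 models an empty argument string; both fall back to the
 * command name.
 */
static const char *
pick_psargs(const char *args, long len, const char *comm)
{
	if (len <= 0)
		return (comm);	/* fall back to the command name */
	return (args);
}
```

With this model, pick_psargs("", 0, "ls") and pick_psargs("", -1, "ls")
both yield "ls", while a positive length keeps the argument string.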

Approved by:	so
Security:	FreeBSD-SA-22:09.elf
Reported by:	Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by:	delphij, markj
admbugs:	988
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35391

(cherry picked from commit 00d17cf342)
(cherry picked from commit 8a44a2c644)
2022-08-09 16:00:43 -04:00
Mark Johnston
c48048ebdb kevent: Fix an off-by-one in filt_timerexpire_l()
Suppose a periodic kevent timer fires close to its deadline, so that
now - kc->next is small.  Then delta ends up being 1, and the next timer
deadline is set to (delta + 1) * kc->to, where kc->to is the timer
period.  This means that the timer fires at half of the requested rate,
and the value returned in kn_data is similarly inaccurate.
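
The arithmetic can be illustrated with a small model. This is only a
sketch of the near-deadline case described above: struct timer_model and
the rearm_*() helpers are hypothetical names, and the corrected variant
is one plausible reading of the fix rather than the committed code.

```c
#include <stdint.h>

/* Hypothetical model of a periodic timer; the names are illustrative. */
struct timer_model {
	int64_t next;	/* next deadline */
	int64_t to;	/* timer period */
};

/* Periods elapsed since the deadline, counting at least one. */
static int64_t
expired_periods(const struct timer_model *t, int64_t now)
{
	int64_t delta = (now - t->next) / t->to;

	return (delta == 0 ? 1 : delta);
}

/* Off-by-one variant: skips one period too many. */
static void
rearm_buggy(struct timer_model *t, int64_t now)
{
	t->next += (expired_periods(t, now) + 1) * t->to;
}

/* Assumed correction: advance by the number of expired periods only. */
static void
rearm_fixed(struct timer_model *t, int64_t now)
{
	t->next += expired_periods(t, now) * t->to;
}
```

For a timer with next = 100 and to = 10 that fires at now = 101, the
first variant moves the deadline to 120 and the second to 110.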

Approved by:	so
Security:	FreeBSD-EN-22:16.kqueue
PR:		264131
Fixes:		7cb40543e9 ("filt_timerexpire: do not iterate over the interval")
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 524dadf7a8)
(cherry picked from commit 129112f80d)
2022-07-25 17:01:16 -04:00
Mark Johnston
9470a2f7da clockcalib: Fix an overflow bug
tc_counter_mask is an unsigned int and in the TSC timecounter is equal
to UINT_MAX, so the addition tc->tc_counter_mask + 1 can overflow to 0,
resulting in a hang during boot.
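
A minimal userland illustration of the wraparound, assuming only that
the mask is an unsigned int as stated above (the helper names are
hypothetical and this is not the calibration code itself):

```c
#include <limits.h>
#include <stdint.h>

/* The sum is computed in unsigned int, so UINT_MAX + 1 wraps to 0. */
static unsigned int
period_narrow(unsigned int mask)
{
	return (mask + 1);
}

/* Widening the operand first preserves the full value. */
static uint64_t
period_wide(unsigned int mask)
{
	return ((uint64_t)mask + 1);
}
```

A loop that waits for a counter to advance by period_narrow(UINT_MAX)
ticks would be working with a period of 0, which is consistent with the
hang described above.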

Approved by:	re (gjb)
Fixes:		c2705ceaeb ("x86: Speed up clock calibration")
Reviewed by:	cperciva
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit c3196306f0)
(cherry picked from commit 58f49b7da7)
2022-05-05 14:37:48 -04:00
Konstantin Belousov
821467b5a0 Mostly revert a5970a529c: Make files opened with O_PATH to not block non-forced unmount
Approved by:	re (gjb)

(cherry picked from commit bf13db086b)
(cherry picked from commit 6daddc54de)
2022-04-20 00:03:16 +03:00
Mark Johnston
5444e90da1 linker: Permit CTFv3 containers
Approved by:	re (gjb)
Reviewed by:	Domagoj Stolfa
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 8dbae4ce32)
(cherry picked from commit 8409eb0251)
2022-04-18 12:38:34 -04:00
Mark Johnston
30700bc80e linker: Simplify CTF container handling
Use sys/ctf.h to provide various definitions required to parse the CTF
header.  No functional change intended.

Approved by:	re (gjb)
Reviewed by:	Domagoj Stolfa, emaste
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit cab9382a2c)
(cherry picked from commit 24597a09b9)
2022-04-18 12:38:14 -04:00
Mateusz Guzik
1929c80c53 vfs: fixup WANTIOCTLCAPS on open
In some cases vn_open_cred overwrites cn_flags, effectively nullifying
initialisation done in NDINIT. This will have to be fixed.

In the meantime make sure the flag is passed.

Reported by:	jenkins
Noted by:	Mathieu <sigsys@gmail.com>
Approved by:	re (gjb)

(cherry picked from commit b7262756e2)
(cherry picked from commit 792ebbb155)
2022-04-06 23:26:14 +00:00
Mateusz Guzik
fffe016c81 vfs: fix memory leak on lookup with fds with ioctl caps
Reviewed by:	markj
PR:		262515
Noted by:	firk@cantconnect.ru
Differential Revision:	https://reviews.freebsd.org/D34667
Approved by:	re (gjb)

(cherry picked from commit 0c805718cb)
(cherry picked from commit 838d8e6fb6)
2022-04-06 23:25:50 +00:00
Mateusz Guzik
4e8fa94f33 vfs: add missing bits to vdropl_impl
This completes the patch which was originally meant to go in.

Spotted by:	mhorne
Fixes: c35ec1efdc ("vfs: [1/2] fix stalls in vnode reclaim by not
requeueing from vnlru")
Approved by:	re (implicit)

(cherry picked from commit 2533b5dc82)
2022-03-27 14:39:45 +00:00
Mateusz Guzik
115479c27d vfs: [2/2] fix stalls in vnode reclaim by only counting attempts
... and ignoring if they succeeded, which matches historical behavior.

Reported by:	pho
Approved by:	re (gjb)

(cherry picked from commit 3a4c5dab92)
2022-03-25 14:04:00 +00:00
Mateusz Guzik
b2516d2f7d vfs: [1/2] fix stalls in vnode reclaim by not requeueing from vnlru
Reported by:	pho
Approved by:	re (gjb)

(cherry picked from commit c35ec1efdc)
2022-03-25 14:03:46 +00:00
Allan Jude
26714a5fa2 Allow kern.ipc.maxsockets to be set to current value without error
Normally setting kern.ipc.maxsockets returns EINVAL if the new value
is not greater than the previous value. This can cause spurious
error messages when sysctl.conf is processed multiple times, or when
automation systems try to ensure the sysctl is set to the correct
value. If the value is unchanged, then just do nothing.
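
The resulting rule can be sketched as below. This is a hypothetical
model rather than the sysctl handler itself: set_maxsockets() is an
illustrative name, and the treatment of smaller values follows the
description above.

```c
#include <errno.h>

/* Hypothetical model of the update rule; returns 0 or an errno value. */
static int
set_maxsockets(int *current, int requested)
{
	if (requested == *current)
		return (0);		/* unchanged: nothing to do */
	if (requested < *current)
		return (EINVAL);	/* the limit can only grow */
	*current = requested;
	return (0);
}
```

Re-applying the same value, as a second pass over sysctl.conf would, now
returns 0 instead of EINVAL.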

Approved by:	re (gjb)
PR:	243532
Reviewed by:	markj
Sponsored by:	Modirum MDPay
Sponsored by:	Klara Inc.

(cherry picked from commit c441592a0e)
(cherry picked from commit 4f69c57599)
2022-03-24 13:24:07 -04:00
Konstantin Belousov
61f45ca2da buf_alloc(): Stop using LK_NOWAIT, use LK_NOWITNESS
Even though the buffer is taken from the cache or free list, it can still
be locked, because the 'lockless lookup' in getblkx() may operate on
freed buffers.  The lock is transient, but it prevents the use of
LK_NOWAIT there for the goal of neutralizing WITNESS.

Just use LK_NOWITNESS.

Reported and tested by:	pho
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 1fb00c8f10)
(cherry picked from commit dd54e44a27)
2022-03-14 11:47:12 -04:00
Mark Johnston
4a80132dcd rmlock: Temporarily revert commit c84bb8cd77
It appears to have introduced a regression on arm64, possibly due to the
fact that the pcpu pointer is reloaded outside of the critical section
in _rm_rlock().  Until this is resolved one way or another, let's
revert.

Reported by:	Ronald Klop <ronald-lists@klop.ws>
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit afb44cb010)
2022-03-07 10:45:24 -05:00
Mateusz Guzik
8891979494 fd: add close_range(..., CLOSE_RANGE_CLOEXEC)
For compatibility with Linux.

MFC after:	3 days
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D34424

(cherry picked from commit f3f3e3c44d)
2022-03-07 12:15:47 +00:00
Mateusz Guzik
d6132e2117 cache: improve vnode vs name assertion in cache_enter_time
(cherry picked from commit 1d65a9b47e)
2022-03-05 19:53:46 +00:00
Mateusz Guzik
807c4914a5 cache: remove NOCACHE handling from cache_fplookup_noentry
It was copy-pasted from locked lookup. As LOOKUP operation cannot have
the flag set it was always ending up setting MAKEENTRY.

(cherry picked from commit 611470a515)
2022-03-05 19:53:46 +00:00
Mateusz Guzik
df7ebff33c cache: whack "set but not used" warnings
(cherry picked from commit 7e9680d3be)
2022-03-05 19:53:20 +00:00
Mateusz Guzik
54c0eac7c1 cache: only let non-dir descriptors through when doing EMPTYPATH lookups
Otherwise things like realpath against a file and '.' end up with an
illegal state of having a regular vnode for the parent.

Reported by:	syzbot+9aa5439dd9c708aeb1a8@syzkaller.appspotmail.com

(cherry picked from commit 628c3b307f)
2022-03-05 19:52:57 +00:00
Mateusz Guzik
bac79d8e16 cache: only assert on flags when dealing with EMPTYPATH
Reported by:	syzbot+bd48ee0843206a09e6b8@syzkaller.appspotmail.com
Fixes:		7dd419cabc ("cache: add empty path support")

(cherry picked from commit 1045352f15)
2022-03-05 19:52:33 +00:00
Mateusz Guzik
3343c0afbf cache: add empty path support
This avoids spurious drop offs as EMPTY is passed regardless of the
actual path name.

Pushing the work inside the lookup instead of just ignoring the flag
avoids checking for an empty pathname in all other lookups.

(cherry picked from commit 7dd419cabc)
2022-03-05 19:52:29 +00:00
Mateusz Guzik
f85d71e72e cache: retire cache_fast_revlookup sysctl
Sponsored by:	Rubicon Communications, LLC ("Netgate")

(cherry picked from commit b65ad70195)
2022-03-05 19:49:57 +00:00
Mark Johnston
543157870d rmlock: Micro-optimize read locking
Use get_pcpu() instead of an open-coded pcpu_find(td->td_oncpu).  This
eliminates some memory accesses and results in a shorter instruction
sequence.  Note that get_pcpu() didn't exist when rmlocks were added.

Reviewed by:	jah, mjg
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit c84bb8cd77)
2022-03-04 11:32:14 -05:00
Marvin Ma
769f1e79f5 vfs_unregister: fix error handling
Due to misplaced braces, an error from vfs_uninit() in the VFCF_SBDRY
case was ignored.

Reported by:	Anton Rang <rang@acm.org>
Reviewed by:	Anton Rang <rang@acm.org>, markj
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D34375

(cherry picked from commit 1517b8d5a7)
2022-03-03 08:20:08 -06:00
Jamie Gritton
803d7f4ccd posixshm: Allow jails to use kern.ipc.posix_shm_list
PR:		257554
Reported by:	grembo@

(cherry picked from commit d7c4ea7d72)
2022-03-02 15:08:00 -08:00
Eric van Gyzen
c15fc3249f Fix lockstat:::thread-spin dtrace probe with LOCK_PROFILING
The spinning start time is missing from the calculation due to a
misplaced #endif.  Return the #endif where it's supposed to be.

Submitted by:	Alexander Alexeev <aalexeev@isilon.com>
Reviewed by:	bdrewery, mjg
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D31384

(cherry picked from commit 428624130a)
2022-03-02 15:56:30 -06:00
Mark Johnston
dbba19453e sleepqueue: Annotate sleepq_max_depth as static
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 852ff943b9)
2022-02-21 09:57:46 -05:00
Edward Tomasz Napierala
d3f0d2c0ee linux: Add additional ptracestop only if the debugger is Linux
In 6e66030c4c, an additional ptracestop was added in order
to implement PTRACE_EVENT_EXEC.  Make it only apply to cases
where the debugger is a Linux process; native FreeBSD
debuggers can trace Linux processes too, but they don't
expect that additional ptracestop.

Fixes:		6e66030c4c
Reported By:	kib
Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32726

(cherry picked from commit 8bbc0600cc)
2022-02-21 14:31:22 +00:00
Edward Tomasz Napierala
fc36cd43fd linux: implement PTRACE_EVENT_EXEC
This fixes strace(1) from Ubuntu Focal.

Reviewed By:	jhb
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32367

(cherry picked from commit 6e66030c4c)
2022-02-21 13:35:51 +00:00
Konstantin Belousov
4cae9d803a Remove PT_GET_SC_ARGS_ALL
Reimplement bdf0f24bb1 by checking for the caller's ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.

Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on a
thread reused after another process, it allows reading some of that
thread's last syscall arguments. Clear td_sa.args in thread_alloc().

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31968

(cherry picked from commit f575573ca5)
2022-02-21 13:34:16 +00:00
Edward Tomasz Napierala
8371bf67d6 linux: implement PTRACE_GET_SYSCALL_INFO
This is one of the pieces required to make modern (i.e. Focal)
strace(1) work.

Reviewed By:	jhb (earlier version)
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D28212

(cherry picked from commit bdf0f24bb1)
2022-02-21 13:23:50 +00:00
Edward Tomasz Napierala
460b4b550d Implement unprivileged chroot
This builds on the recently introduced NO_NEW_PRIVS flag to implement
unprivileged chroot, enabled by `security.bsd.unprivileged_chroot`.
It allows non-root processes to chroot(2), provided they have the
NO_NEW_PRIVS flag set.

The chroot(8) utility gets a new flag, -n, which sets NO_NEW_PRIVS
before chrooting.

Reviewed By:	kib
Sponsored By:	EPSRC
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D30130

(cherry picked from commit a40cf4175c)
2022-02-14 18:42:21 +00:00
Mark Johnston
2a454b54bf Fix the build after commit 5fa005e915
Fixes:	5fa005e915 ("exec: Reimplement stack address randomization")
2022-02-16 13:32:18 -05:00
John Baldwin
1a9f14cfa5 Use vmspace->vm_stacktop in place of sv_usrstack in more places.
Reviewed by:	markj
Obtained from:	CheriBSD

(cherry picked from commit becaf6433b)
2022-02-16 11:55:37 -05:00
Mark Johnston
5fa005e915 exec: Reimplement stack address randomization
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack.  This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings.  In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.

There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose.  Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.

The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.

PR:		260303
Reviewed by:	kib
Discussed with:	emaste, mw
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 1811c1e957)
2022-02-16 11:55:03 -05:00
Mark Johnston
e3b852f99b ktls: Disallow transmitting empty frames outside of TLS 1.0/CBC mode
There was nothing preventing one from sending an empty fragment on an
arbitrary KTLS TX-enabled socket, but ktls_frame() asserts that this
could not happen.  Though the transmit path handles this case for TLS
1.0 with AES-CBC, we should be strict and allow empty fragments only in
modes where it is explicitly allowed.

Modify sosend_generic() to reject writes to a KTLS-enabled socket if the
number of data bytes is zero, so that userspace cannot trigger the
aforementioned assertion.

Add regression tests to exercise this case.

Reported by:	syzkaller
Reviewed by:	gallatin, jhb
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 5de79eeddb)
2022-02-16 11:52:31 -05:00
Mark Johnston
7ac2a6354f file: Make fget*() and getvnode*() consistent about initializing *fpp
Most fget*() functions initialize the output parameter to NULL.  Make
the externally visible interface behave consistently, and make
fget_unlocked_seq() private to kern_descrip.c.

This fixes at least one bug in a consumer, _filemon_wrapper_openat(),
which assumes that getvnode() sets the output file pointer to NULL upon
an error.
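
The convention can be sketched with a stand-alone model. The types and
the lookup_file() helper below are hypothetical and only illustrate
clearing the output parameter before any error path can return.

```c
#include <errno.h>
#include <stddef.h>

struct file_model { int fd; };	/* hypothetical stand-in for struct file */

/*
 * Look up `fd` in a table of `n` entries.  The output pointer is cleared
 * before any check, so every error path leaves *fpp == NULL.
 */
static int
lookup_file(struct file_model *table, int n, int fd, struct file_model **fpp)
{
	*fpp = NULL;
	if (fd < 0 || fd >= n)
		return (EBADF);
	*fpp = &table[fd];
	return (0);
}
```

A caller that inspects the pointer after a failed lookup then sees NULL
rather than whatever value the variable held before the call.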

Reported by:	syzbot+01c0459408f896a5933a@syzkaller.appspotmail.com
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 300cfb96fc)
2022-02-16 11:52:31 -05:00
Justin Hibbits
2053dee56a Fix gzip compressed core dumps on big endian architectures
The gzip trailer words (size and CRC) are both little-endian per the spec.
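
A host-independent way to emit such a trailer is to store each word one
byte at a time, as in this sketch. The helper names are hypothetical and
this is not the kernel's core dump code; the field order (CRC-32, then
input size) follows the gzip format.

```c
#include <stdint.h>

/* Store a 32-bit value least-significant byte first, on any host. */
static void
put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Fill the 8-byte trailer: CRC-32, then the input size modulo 2^32. */
static void
gzip_trailer(uint8_t out[8], uint32_t crc, uint32_t isize)
{
	put_le32(out, crc);
	put_le32(out + 4, isize);
}
```

Because the bytes are produced by shifts rather than by copying the
in-memory representation, the output is the same on little-endian and
big-endian machines.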

MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.

(cherry picked from commit 6db44b0158)
2022-02-14 13:30:52 -06:00
Dimitry Andric
ae76550171 tty_info: Avoid warning by using logical instead of bitwise operators
Since TD_IS_RUNNING() and TD_ON_RUNQ() are defined as logical
expressions involving '==', clang 14 warns about them being checked with
a bitwise operator instead of a logical one:

```
sys/kern/tty_info.c:124:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
        runa = TD_IS_RUNNING(td) | TD_ON_RUNQ(td);
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                 ||
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
                                ^
sys/kern/tty_info.c:124:9: note: cast one or both operands to int to silence this warning
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
                                ^
sys/kern/tty_info.c:129:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
        runb = TD_IS_RUNNING(td2) | TD_ON_RUNQ(td2);
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                  ||
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
                                ^
sys/kern/tty_info.c:129:9: note: cast one or both operands to int to silence this warning
sys/sys/proc.h:562:27: note: expanded from macro 'TD_IS_RUNNING'
                                ^
```

Fix this by using logical operators instead. No functional change
intended.
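
The difference between the two forms can be shown with a small
stand-alone sketch. The IS_RUNNING() and ON_RUNQ() macros and the state
values are hypothetical stand-ins for the thread-state macros, not the
definitions from sys/sys/proc.h.

```c
#include <stdbool.h>

/* Hypothetical predicates standing in for the thread-state macros. */
#define IS_RUNNING(state)	((state) == 1)
#define ON_RUNQ(state)		((state) == 2)

/* Bitwise form: always evaluates both operands. */
static bool
runnable_bitwise(int state)
{
	return (IS_RUNNING(state) | ON_RUNQ(state));
}

/* Logical form: states the intent and short-circuits. */
static bool
runnable_logical(int state)
{
	return (IS_RUNNING(state) || ON_RUNQ(state));
}
```

Both functions return the same value for every state, which matches the
"no functional change" note: the operands are already 0 or 1, so only
the evaluation order and the compiler diagnostics differ.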

Reviewed by:	cem, emaste, kevans, markj
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D34186

(cherry picked from commit 7d8a4eb943)
2022-02-11 17:43:03 +01:00
Colin Percival
baee6cc181 x86: Speed up clock calibration
Prior to this commit, the TSC and local APIC frequencies were calibrated
at boot time by measuring the clocks before and after a one-second sleep.
This was simple and effective, but had the disadvantage of *requiring a
one-second sleep*.

Rather than making two clock measurements (before and after sleeping) we
now perform many measurements; and rather than simply subtracting the
starting count from the ending count, we calculate a best-fit regression
between the target clock and the reference clock (for which the current
best available timecounter is used). While we do this, we keep track
of an estimate of the uncertainty in the regression slope (aka. the ratio
of clock speeds), and stop measuring when we believe the uncertainty is
less than 1 PPM.
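
The idea of fitting a slope rather than subtracting two endpoints can be
sketched with an ordinary two-pass least-squares fit. This is an
illustration under simplifying assumptions, not the committed code: the
function name is hypothetical, and the uncertainty estimate and stopping
rule described above are omitted.

```c
#include <stddef.h>

/*
 * Least-squares slope of y against x for n >= 2 samples with distinct x:
 * slope = sum((x - mx) * (y - my)) / sum((x - mx)^2).
 * With x as reference-clock readings and y as target-clock readings, the
 * slope estimates the ratio of the two clock frequencies.
 */
static double
regression_slope(const double *x, const double *y, size_t n)
{
	double mx = 0.0, my = 0.0, sxx = 0.0, sxy = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		mx += x[i];
		my += y[i];
	}
	mx /= (double)n;
	my /= (double)n;
	for (i = 0; i < n; i++) {
		sxx += (x[i] - mx) * (x[i] - mx);
		sxy += (x[i] - mx) * (y[i] - my);
	}
	return (sxy / sxx);
}
```

For samples that lie exactly on y = 3x + 5 the fitted slope is 3
regardless of the offset, which is why a constant measurement latency
does not bias the estimated frequency ratio.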

In order to avoid the risk of aliasing resulting from the data-gathering
loop synchronizing with (a multiple of) the frequency of the reference
clock, we add some additional spinning depending upon the iteration number.

For numerical stability and simplicity of implementation, we make use of
floating-point arithmetic for the statistical calculations.

On the author's Dell laptop, this reduces the time spent in calibration
from 2000 ms to 29 ms; on an EC2 c5.xlarge instance, it is reduced from
2000 ms to 2.5 ms.

Reviewed by:	bde (previous version), kib
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D33802

(cherry picked from commit c2705ceaeb)
2022-02-10 22:52:00 -08:00
Kyle Evans
00bc7bbde5 sched: separate out schedinit_ap()
schedinit_ap() sets up an AP for a later call to sched_throw(NULL).

Currently, ULE sets up some pcpu bits and fixes the idlethread lock with
a call to sched_throw(NULL); this results in a window where curthread is
set up in platforms' init_secondary(), but it has the wrong td_lock.
A typical platform AP startup procedure looks something like:

- Set up curthread
- ... other stuff, including cpu_initclocks_ap()
- Signal smp_started
- sched_throw(NULL) to enter the scheduler

cpu_initclocks_ap() may have callouts to process (e.g., nvme) and
attempt to sched_add() for this AP, but this attempt fails because
of the noted violated assumption leading to locking heartburn in
sched_setpreempt().

Interrupts are still disabled until cpu_throw() so we're not really at
risk of being preempted -- just let the scheduler in on it a little
earlier as part of setting up curthread.

(cherry picked from commit 589aed00e3)
2022-02-10 14:55:29 -06:00
Kyle Evans
7393eedb03 execve: disallow argc == 0
The manpage has contained the following verbiage on the matter for just
under 31 years:

"At least one argument must be present in the array"

Previous to this version, it had been prefaced with the weakening phrase
"By convention."

Carry through and document it the rest of the way.  Allowing argc == 0
has been a source of security issues in the past, and it's hard to
imagine a valid use-case for allowing it.  Toss back EINVAL if we ended
up not copying in any args for *execve().
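
The check amounts to rejecting an argument vector with no entries. A
minimal userland model follows; check_argv() is a hypothetical name and
this is not the kernel's argument copy-in code.

```c
#include <errno.h>
#include <stddef.h>

/* Count the argv entries and reject an empty (or missing) vector. */
static int
check_argv(char *const argv[])
{
	size_t argc = 0;

	if (argv != NULL)
		while (argv[argc] != NULL)
			argc++;
	return (argc == 0 ? EINVAL : 0);
}
```

A vector holding only the terminating NULL pointer is rejected with
EINVAL, while { "prog", NULL } passes.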

The manpage change can be considered "Obtained from: OpenBSD"

(cherry picked from commit 773fa8cd13)
(cherry picked from commit c9afc7680f)
2022-02-10 14:21:59 -06:00
Hans Petter Selasky
22ba297076 mbuf(9): Assert receive mbufs don't carry a send tag.
Else we would start leaking reference counts.

Discussed with:	jhb@
Sponsored by:	NVIDIA Networking

(cherry picked from commit 17cbcf33c3)
2022-02-10 16:11:22 +01:00
Gordon Bergling
6a3607622e kern_racct: Fix a typo in a source code comment
- s/maxumum/maximum/

(cherry picked from commit a9bee9c77a)
2022-02-09 07:19:50 +01:00
Gordon Bergling
b9c307bc77 kern_fflock: Fix a typo in a source code comment
- s/foward/forward/

(cherry picked from commit 5a78ec9e7c)
2022-02-09 07:18:00 +01:00
Ed Maste
94e6d14488 Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 9feff969a0)
2022-02-08 15:00:55 -05:00
Konstantin Belousov
15def34bd8 Add GB_NOWITNESS flag
(cherry picked from commit c02780b78c)
2022-02-07 11:38:50 +02:00
Konstantin Belousov
7782d71671 syncer VOP_FSYNC(): unlock syncer vnode around call to VFS_SYNC()
(cherry picked from commit 3d68c4e175)
2022-02-07 11:38:50 +02:00
Konstantin Belousov
4116ae3ece buf_alloc(): lock the buffer with LK_NOWAIT
(cherry picked from commit 5875b94c74)
2022-02-07 11:38:49 +02:00
Konstantin Belousov
78d27f25c7 Use dedicated lock name for pbufs
(cherry picked from commit 531f8cfea0)
2022-02-07 11:38:49 +02:00