18850 Commits

Author SHA1 Message Date
Konstantin Belousov
ee92c8a842 sysctl kern.proc.procname: report right hardlink name
PR:	248184
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:50:02 +03:00
Konstantin Belousov
351d5f7fc5 exec: store parent directory and hardlink name of the binary in struct proc
While doing it, also move all the code to resolve pathnames and obtain
text vp and dvp, into single place.   Besides simplifying the code, it
avoids spurious vnode relocks and validates the explanation why
a transient text reference on the script vnode is not harmful.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:56 +03:00
Konstantin Belousov
0c10648fbb exec: provide right hardlink name in AT_EXECPATH
For this, use vn_fullpath_hardlink() to resolve executable name for
execve(2).

This should provide the right hardlink name, used for execution, instead
of random hardlink pointing to this binary.  Also this should make the
AT_EXECNAME reliable for execve(2), since kernel only needs to resolve
parent directory path, which should always succeed (except pathological
cases like unlinking a directory).

PR:	248184
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:31 +03:00
Konstantin Belousov
9a0bee9f6a Make vn_fullpath_hardlink() externally callable
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:26 +03:00
Konstantin Belousov
15bf81f354 struct image_params: use bool type for boolean members
Also re-align comments, and group booleans and char members together.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:21 +03:00
Konstantin Belousov
9d58243fbc do_execve(): switch boolean locals to use bool type
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:16 +03:00
Konstantin Belousov
143dba3a91 kern_exec.c: style
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:10 +03:00
Mateusz Guzik
628c3b307f cache: only let non-dir descriptors through when doing EMPTYPATH lookups
Otherwise things like realpath against a file and '.' end up with an
illegal state of having a regular vnode for the parent.

Reported by:	syzbot+9aa5439dd9c708aeb1a8@syzkaller.appspotmail.com
2021-10-27 18:27:47 +00:00
Mark Johnston
71f31d784e rmslock: Update td_locks during lock and unlock operations
Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32692
2021-10-27 11:18:13 -04:00
Gordon Bergling
70de1003da jail(8): Fix a few common typos in source code comments
- s/phyiscal/physical/

MFC after:	3 days
2021-10-27 06:16:06 +02:00
Kirk McKusick
dfd704b7fb Allow biodone() to be used as a completion routine.
An ordered series of BIO_READ and BIO_WRITE operations are
typically done as:

	while (work to do) {
		setup bp for I/O
		g_io_request(bp, consumer);
		biowait(bp);
	}

Here you need to have biodone() called at the completion of
the I/O to set the BIO_DONE flag and awaken the biowait(). The
obvious way to do this would be to set bio_done = biodone, but
biodone() will only take the desired action if bio_done == NULL.
The relevant code at the end of biodone() is:

	done = bp->bio_done;
	if (done == NULL) {
		mtxp = mtx_pool_find(mtxpool_sleep, bp);
		mtx_lock(mtxp);
		bp->bio_flags |= BIO_DONE;
		wakeup(bp);
		mtx_unlock(mtxp);
	} else
		done(bp);

This code would infinitely recurse if biodone() is specified as the
routine to use at completion. So before this change, a wrapper done
function had to be written:

static void
g_io_done(struct bio *bp)
{

	bp->bio_done = NULL;
	biodone(bp);
	bp->bio_done = g_io_done;
}

This commit changes

	if (done == NULL)

to

	if (done == NULL || done == biodone)

which eliminates the need for the wrapper function.

Reviewed by:  kib
Sponsored by: Netflix
2021-10-23 14:11:57 -07:00
Edward Tomasz Napierala
6e66030c4c linux: implement PTRACE_EVENT_EXEC
This fixes strace(1) from Ubuntu Focal.

Reviewed By:	jhb
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32367
2021-10-23 19:46:26 +01:00
Konstantin Belousov
3b5331dd8d uipc_shm: silent warnings about write-only variables in largepage code
In shm_largepage_phys_populate(), the result from vm_page_grab() is only
needed for assertion.

In shm_dotruncate_largepage(), there is a commented-out prototype code
for managed largepages.   The oldobjsz is saved for its sake, so mark
the variable as __unused directly.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
3d2778515a sig_ast_checksusp(): mark the local p as __diagused
It is only used to assert that the (current) process is locked

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
6776747a0e subr_firmware.c::unloadentry(): remote write-only variable
The function ignores result returned by linker_release_module().
The FW_UNLOAD flag on the file is cleared, so even on error it would
not be tried again.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Konstantin Belousov
993446638c alq_open_flags(): mark local td variable as unused
It is passed to the NDINIT() macro which ignores the thread argument
for some time.

Sponsored by:	The FreeBSD Foundation
2021-10-21 21:40:46 +03:00
Konstantin Belousov
bded8fa300 umtxq_requeue: remove write-only variable uh2
umtxq_queue_lookup() does not change state.  It is redone inside
umtxq_insert() later, anyway.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
John Baldwin
96668a81ae ktls: Always create a software backend for receive sessions.
A future change to TOE TLS will require a software fallback for the
first few TLS records received.  Future support for NIC TLS on receive
will also require a software fallback for certain cases.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32566
2021-10-21 09:37:17 -07:00
John Baldwin
c57dbec69a ktls: Add a routine to query information in a receive socket buffer.
In particular, ktls_pending_rx_info() determines which TLS record is
at the end of the current receive socket buffer (including
not-yet-decrypted data) along with how much data in that TLS record is
not yet present in the socket buffer.

This is useful for future changes to support NIC TLS receive offload
and enhancements to TOE TLS receive offload.  Those use cases need a
way to synchronize a state machine on the NIC with the TLS record
boundaries in the TCP stream.

Reviewed by:	gallatin, hselasky
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32564
2021-10-21 09:36:29 -07:00
Mark Johnston
84c3922243 Convert consumers to vm_page_alloc_noobj_contig()
Remove now-unneeded page zeroing.  No functional change intended.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32006
2021-10-19 21:22:56 -04:00
Mark Johnston
a4667e09e6 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31986
2021-10-19 21:22:56 -04:00
Konstantin Belousov
c7f38a2df1 procctl: stop using SA_*LOCKED, define local enum
Using SA_*LOCKED constants breaks !INVARIANT builds

Reported by:	cy
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-20 00:25:19 +03:00
Konstantin Belousov
49db81aa05 kern_procctl: skip zombies for process group operations
When iterating over the process group members, skip zombies same as it
is done by pfind() for single-process operation.

Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
3692877a6c kern_procctl.c: use td->td_proc instead of curproc
Suggested by:	markj
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
f5bb6e5a6d procctl: actually require debug privileges over target
for state control over TRACE, TRAPCAP, ASLR, PROTMAX, STACKGAP,
NO_NEWPRIVS, and WXMAP.

Reported by:	emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
1c4dbee5dd procctl: make it possible to specify that some operations require debug privilege over the target
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
32026f5983 sys_procctl(): zero the data buffer once, on syscall entry
and remove zeroing of it from specific functions.  This way it is
guaranteed that we do not leak kernel data.

Suggested by:	markj
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
56d5323b4d sys_procctl(): use table data to do copyin/copyout
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
68dc5b381a kern_procctl_single(): convert to use table data
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
34f39a8c0e procctl: convert PDEATHSIG_CTL/STATUS to regular kern_procctl_single() cases
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
f833ab9dd1 procctl(2): add consistent shortcut P_ID:0 as curproc
Reported by:	bdrewery, emaste
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
7ae879b14a kern_procctl(): convert the function to be table-driven
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:34 +03:00
Konstantin Belousov
31faa565ed sys_procctl(2): remove sysproto and argused
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32513
2021-10-19 23:04:33 +03:00
Mark Johnston
621fd9dcb2 timecounter: Lock the timecounter list
Timecounter registration is dynamic, i.e., there is no requirement that
timecounters must be registered during single-threaded boot.  Loadable
drivers may in principle register timecounters (which can be switched to
automatically).  Timecounters cannot be unregistered, though this could
be implemented.

Registered timecounters belong to a global linked list.  Add a mutex to
synchronize insertions and the traversals done by (mpsafe) sysctl
handlers.  No functional change intended.

Reviewed by:	imp, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32511
2021-10-18 09:56:59 -04:00
Mark Johnston
81f2e9063d signal: Add SIG_FOREACH and refactor issignal()
Add a SIG_FOREACH macro that can be used to iterate over a signal set.
This is a bit cleaner and more efficient than calling sig_ffs() in a
loop.  The implementation is based on BIT_FOREACH_ISSET(), except
that the bitset limbs are always 32 bits wide, and signal sets are
1-indexed rather than 0-indexed like bitset(9) sets.

issignal() cannot really be modified to use SIG_FOREACH() directly.
Take this opportunity to split the function into two explicit loops.
I've always found this function hard to read and think that this change
is an improvement.

Remove sig_ffs(), nothing uses it now.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32473
2021-10-18 09:56:58 -04:00
Colin Percival
52e125c2bd TSLOG: Report final execname, not first
In cases such as daemons launched via limits(1), a process may call
exec multiple times; the last name of the last binary executed is
usually (always?) more informative.

Fixes:	46dd801acb23 Add userland boot profiling to TSLOG
Sponsored by:	https://www.patreon.com/cperciva
2021-10-17 13:36:38 -07:00
Jessica Clarke
682c00a6ce riscv: Implement pmap_mapdev_attr
This is needed for LinuxKPI's _ioremap_attr. This reuses the generic
implementation introduced for aarch64, and itself requires implementing
pmap_kenter, which is trivial to do given riscv currently treats all
mapping attributes the same due to the Svpbmt extension not yet being
ratified and in hardware.

Reviewed by:	markj, mhorne
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D32445
2021-10-17 15:31:35 +01:00
Mateusz Guzik
1045352f15 cache: only assert on flags when dealing with EMPTYPATH
Reported by:	syzbot+bd48ee0843206a09e6b8@syzkaller.appspotmail.com
Fixes:		7dd419cabc6bb9e0 ("cache: add empty path support")
2021-10-17 08:42:47 +00:00
Mateusz Guzik
7dd419cabc cache: add empty path support
This avoids spurious drop offs as EMPTY is passed regardless of the
actual path name.

Pushign the work inside the lookup instead of just ignorign the flag
allows avoid checking for empty pathname for all other lookups.
2021-10-16 20:08:37 +00:00
Colin Percival
46dd801acb Add userland boot profiling to TSLOG
On kernels compiled with 'options TSLOG', record for each process ID:
* The timestamp of the fork() which creates it and the parent
process ID,
* The first path passed to execve(), if any,
* The first path resolved by namei, if any, and
* The timestamp of the exit() which terminates the process.

Expose this information via a new sysctl, debug.tslog_user.

On kernels lacking 'options TSLOG' (the default), no information is
recorded and the sysctl does not exist.

Note that recording namei is needed in order to obtain the names of
rc.d scripts being launched, as the rc system sources them in a
subshell rather than execing the scripts.

With this commit it is now possible to generate flamecharts of the
entire boot process from the start of the loader to the end of
/etc/rc.  The code needed to perform this processing is currently
found in github: https://github.com/cperciva/freebsd-boot-profiling

Reviewed by:	mhorne
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32493
2021-10-16 11:47:34 -07:00
Dawid Gorecki
a97d697122 kern_exec: Add kern.stacktop sysctl.
With stack gap enabled top of the stack is moved down by a random
amount of bytes. Because of that some multithreaded applications
which use kern.usrstack sysctl to calculate address of stacks for
their threads can fail. Add kern.stacktop sysctl, which can be used
to retrieve address of the stack after stack gap is applied to it.
Returns value identical to kern.usrstack for processes which have
no stack gap.

Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31897
2021-10-15 10:21:55 +02:00
Dawid Gorecki
889b56c8cd setrlimit: Take stack gap into account.
Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.

Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.

PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516
2021-10-15 10:21:47 +02:00
John Baldwin
a72ee35564 ktls: Defer creation of threads and zones until first use.
Run ktls_init() when the first KTLS session is created rather than
unconditionally during boot.  This avoids creating unused threads and
allocating unused resources on systems which do not use KTLS.

Reviewed by:	gallatin, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32487
2021-10-14 15:48:34 -07:00
Konstantin Belousov
1adebca1fc Style
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-10-14 23:07:32 +03:00
Brooks Davis
04c91ac48a selsocket: handle sopoll() errors correctly
Without this change, unmounting smbfs filesystems with an INVARIANTS
kernel would panic after 10e64782ed59727e8c9fe4a5c7e17f497903c8eb.

Found by:	markj
Reviewed by:	markj, jhb
Obtained from:	CheriBSD
MFC after:	3 days
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D32492
2021-10-14 00:43:48 +01:00
John Baldwin
9f03d2c001 ktls: Ensure FIFO encryption order for TLS 1.0.
TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record.  As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.

If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order.  For TLS 1.1
and later this is fine, but this can break for TLS 1.0.

To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.

- In ktls_enqueue(), check if a TLS record being queued is the next
  record expected for a TLS 1.0 session.  If not, it is placed in
  sorted order in the pending_records queue in the TLS session.

  If it is the next expected record, queue it for SW encryption like
  normal.  In addition, check if this new record (really a potential
  batch of records) was holding up any previously queued records in
  the pending_records queue.  Any of those records that are now in
  order are also placed on the queue for SW encryption.

- In ktls_destroy(), free any TLS records on the pending_records
  queue.  These mbufs are marked M_NOTREADY so were not freed when the
  socket buffer was purged in sbdestroy().  Instead, they must be
  freed explicitly.

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32381
2021-10-13 12:30:15 -07:00
John Baldwin
a63752cce6 ktls: Reject attempts to enable AES-CBC with TLS 1.3.
AES-CBC cipher suites are not supported in TLS 1.3.

Reported by:	syzbot+ab501c50033ec01d53c6@syzkaller.appspotmail.com
Reviewed by:	tuexen, markj
Differential Revision:	https://reviews.freebsd.org/D32404
2021-10-13 12:12:58 -07:00
Mark Johnston
03d5820f73 mount: Check for !VDIR mount points before handling -o emptydir
To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty.  This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.

Reported by:	syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by:	kib, imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32475
2021-10-13 09:33:35 -04:00
John Baldwin
d1b6fef075 Stop creating socket aio kprocs during boot.
Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead.  The pool of kprocs used for other AIO
requests is similarly created on first use.

Reviewed by:	asomers
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32468
2021-10-12 14:03:07 -07:00
Kyle Evans
7259ca3104 fifos: delegate unhandled kqueue filters to underlying filesystem
This gives the vfs layer a chance to provide handling for EVFILT_VNODE,
for instance.  Change pipe_specops to use the default vop_kqfilter to
accommodate fifoops that don't specify the method (i.e. all in-tree).

Based on a patch by Jan Kokemüller.

PR:		225934
Reviewed by:	kib, markj (both pre-KASSERT)
Differential Revision:	https://reviews.freebsd.org/D32271
2021-10-12 02:43:07 -05:00