18739 Commits

Author SHA1 Message Date
Mark Johnston
42188bb5c1 unix: Remove a write-only local variable
Reported by:	clang
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-11-16 13:30:22 -05:00
Kirk McKusick
f10a8d0971 Allow the MNT_FORCE flag to be passed through to an initial mount.
When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE
flag is not passed through to the underlying filesystem mount routine.
MNT_FORCE is only passed through on later updates to an existing
mount. With this commit the MNT_FORCE flag is now passed through on the
initial mount.

Sanity check: kib
Sponsored by: Netflix
2021-11-15 15:45:56 -08:00
Mark Johnston
2287ced2f5 clock: Group the "clocks" SYSINIT with the function definition
This is how most SYSINITs are defined.  Also annotate the dummy
parameter with __unused.  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-11-15 16:13:24 -05:00
John Baldwin
900a28fe33 ktls: Reject some invalid cipher suites.
- Reject AES-CBC cipher suites for TLS 1.0 and TLS 1.1 using auth
  algorithms other than SHA1-HMAC.

- Reject AES-GCM cipher suites for TLS versions older than 1.2.
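A minimal sketch of the two rules above. It assumes hypothetical enum names and a plain integer for the TLS minor version (1, 2, 3 for TLS 1.0, 1.1, 1.2, as on the wire). The real KTLS code uses its own constants and session structures.

```c
#include <errno.h>

/* Hypothetical identifiers; the real KTLS code uses its own constants. */
enum tls_cipher { CIPHER_AES_CBC, CIPHER_AES_GCM };
enum tls_auth { AUTH_NONE, AUTH_SHA1_HMAC, AUTH_SHA2_256_HMAC };

/* TLS minor versions as they appear on the wire. */
#define TLS_MINOR_1_0	1
#define TLS_MINOR_1_1	2
#define TLS_MINOR_1_2	3

/* Returns 0 if the cipher suite is acceptable, EINVAL otherwise. */
int
ktls_check_suite(enum tls_cipher cipher, enum tls_auth auth, int minor)
{
	switch (cipher) {
	case CIPHER_AES_CBC:
		/* TLS 1.0 and 1.1 CBC suites must use SHA1-HMAC. */
		if ((minor == TLS_MINOR_1_0 || minor == TLS_MINOR_1_1) &&
		    auth != AUTH_SHA1_HMAC)
			return (EINVAL);
		return (0);
	case CIPHER_AES_GCM:
		/* GCM suites are rejected before TLS 1.2. */
		if (minor < TLS_MINOR_1_2)
			return (EINVAL);
		return (0);
	}
	return (EINVAL);
}
```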

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32842
2021-11-15 11:30:12 -08:00
Brooks Davis
3f8ced5bce syscalls: regen
2021-11-15 18:34:27 +00:00
Brooks Davis
8e4a3add99 struct kevent_freebsd11 -> struct freebsd11_kevent
Rename to match the naming of syscalls and allow 32 to be appended
without making an ugly name like kevent_freebsd1132.

While here, make the kevent changelist argument const.

Reviewed by:	kib
2021-11-15 18:34:27 +00:00
Brooks Davis
f0da2a1467 syscalls: unwrap a long line
Style dictates that each variable is on a single line

Reviewed by:	kib
2021-11-15 18:34:27 +00:00
Mark Johnston
d28af1abf0 vm: Add a mode to vm_object_page_remove() which skips invalid pages
This will be used to break a deadlock in ZFS between the per-mountpoint
teardown lock and page busy locks.  In particular, when purging data
from the page cache during dataset rollback, we want to avoid blocking
on the busy state of invalid pages since the busying thread may be
blocked on the teardown lock in zfs_getpages().

Add a helper, vn_pages_remove_valid(), for use by filesystems.  Bump
__FreeBSD_version so that the OpenZFS port can make use of the new
helper.
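A rough model of the new mode. It assumes a toy page array and a hypothetical SKETCH_REMOVE_VALIDONLY flag. The real option name and the vm_page_t busy protocol are not shown here. The sketch only illustrates that invalid pages are skipped before their busy state is examined, so they can never cause a wait.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model; real vm_page_t state is far richer. */
struct sketch_page {
	bool valid;	/* page contents are valid */
	bool busy;	/* another thread holds the busy lock */
	bool removed;	/* set once the page has been removed */
};

#define SKETCH_REMOVE_VALIDONLY	0x1	/* hypothetical flag name */

/*
 * Remove pages from a toy object.  Returns the number of pages on which
 * a real implementation would have slept waiting for busy to clear; in
 * the skip-invalid mode, invalid pages never cause such a wait.
 */
size_t
sketch_page_remove(struct sketch_page *pages, size_t n, int options)
{
	size_t i, waits = 0;

	for (i = 0; i < n; i++) {
		if ((options & SKETCH_REMOVE_VALIDONLY) != 0 &&
		    !pages[i].valid)
			continue;	/* skip without looking at busy */
		if (pages[i].busy)
			waits++;	/* would sleep here until unbusied */
		pages[i].removed = true;
	}
	return (waits);
}
```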

PR:		258208
Reviewed by:	avg, kib, sef
Tested by:	pho (part of a larger patch)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32931
2021-11-15 13:01:30 -05:00
Mark Johnston
1fb99e97e9 bus: Make BUS_TRANSLATE_RESOURCE behave more like other bus methods
- Return an errno value upon failure, instead of 1.
- Provide a bus_translate_resource() wrapper.
- Implement the generic version, which traverses the hierarchy until a
  bus driver with a non-trivial implementation is found, in subr_bus.c
  like other similar default implementations.
- Make ofw_pcib_translate_resource() return an error if a matching PCI
  address range is not found.
- Make generic_pcie_translate_resource_common() return an int instead of
  a bool.  Fix up callers.
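The generic default described above can be pictured as a walk up the device tree. This is a userland sketch with a hypothetical `sketch_dev` type, not the newbus API. In particular, returning EINVAL when no ancestor handles the request is an assumption about the fallback, and the fixed-offset translator is purely illustrative.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device node; only some buses implement translation. */
struct sketch_dev {
	struct sketch_dev *parent;
	/* NULL means "no non-trivial implementation at this level". */
	int (*translate)(struct sketch_dev *bus, int type, uint64_t start,
	    uint64_t *newstart);
};

/* Example translator: maps child addresses up by a fixed window. */
int
sketch_offset_translate(struct sketch_dev *bus, int type, uint64_t start,
    uint64_t *newstart)
{
	(void)bus;
	(void)type;
	*newstart = start + 0x40000000;
	return (0);
}

/*
 * Generic default: walk up the hierarchy until some bus provides a
 * translation and return its errno-style result.
 */
int
sketch_bus_translate_resource(struct sketch_dev *dev, int type,
    uint64_t start, uint64_t *newstart)
{
	struct sketch_dev *bus;

	for (bus = dev->parent; bus != NULL; bus = bus->parent) {
		if (bus->translate != NULL)
			return (bus->translate(bus, type, start, newstart));
	}
	return (EINVAL);	/* assumption: nobody could translate */
}
```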

No functional change intended.

Reviewed by:	imp, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32855
2021-11-15 13:01:30 -05:00
Konstantin Belousov
8660813153 start_init: use 'p'
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-15 02:33:01 +02:00
Mateusz Guzik
e9c7ec2287 aio: whack "set but not used" warnings
2021-11-14 16:59:53 +00:00
Mateusz Guzik
7e9680d3be cache: whack "set but not used" warnings
2021-11-14 16:57:43 +00:00
Konstantin Belousov
d032cda0d0 DEBUG_VFS_LOCKS: stop excluding devfs and doomed vnode from asserts
We do not require the devvp vnode to be locked for metadata I/O.  This is
typically not needed, since the correctness of the file system using the
corresponding block device ensures that there are no incorrect or racy
manipulations.

But right now the DEBUG_VFS_LOCKS option excludes both character device
vnodes and completely destroyed (VBAD) vnodes from asserts.  This is not
too bad, since WITNESS still ensures that we do not leak locks.  On the
other hand, the asserts do not mean to the reader what they should, and
relying on them being enforced might result in wrong code.

Note that ASSERT_VOP_LOCKED() still silently accepts NULLVP; I think that
is worth fixing as well, in the next round.

In collaboration with:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D32761
2021-11-13 01:02:42 +02:00
Konstantin Belousov
47b248ac65 Make locking assertions for VOP_FSYNC() and VOP_FDATASYNC() more correct
For devfs vnodes, it is fine not to lock the vnode for VOP_FSYNC().
Otherwise the vnode must be locked exclusively, except for
MNT_SHARED_WRITES() mounts, where a shared lock is enough.
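The assertion rule above reduces to a small predicate. This sketch uses hypothetical names and a three-state lock enum in place of the kernel's lockmgr state and the real ASSERT_VOP_* macros.

```c
#include <stdbool.h>

/* Hypothetical lock states standing in for the vnode lock status. */
enum sketch_lock { SKETCH_UNLOCKED, SKETCH_SHARED, SKETCH_EXCLUSIVE };

/*
 * devfs vnodes may be unlocked; otherwise an exclusive lock is needed,
 * unless the mount allows shared writes, where a shared lock suffices.
 */
bool
sketch_fsync_lock_ok(bool is_devfs, bool mnt_shared_writes,
    enum sketch_lock held)
{
	if (is_devfs)
		return (true);
	if (held == SKETCH_EXCLUSIVE)
		return (true);
	return (mnt_shared_writes && held == SKETCH_SHARED);
}
```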

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32761
2021-11-13 01:02:13 +02:00
Konstantin Belousov
d1d675cb30 freevnode(): lock the freeing vnode around destroy_vpollinfo()
to satisfy the locking requirements of knlist manipulations.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32761
2021-11-13 01:01:02 +02:00
Konstantin Belousov
a7b4a54d2c getblk(): do not require devvp vnodes to be locked
Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32761
2021-11-13 01:00:24 +02:00
Mark Johnston
ac2b544417 mbuf: Fix an offset calculation in m_apply_extpg_one()
We were not including the requested starting offset in the page offset.

Reviewed by:	jhb
Fixes:		3c7a01d773ac ("Extend m_apply() to support unmapped mbufs.")
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32922
2021-11-10 16:57:12 -05:00
Konstantin Belousov
439c3d9563 Regen
2021-11-10 21:18:54 +02:00
Konstantin Belousov
77b2c2f814 Add sched_getcpu()
for compatibility with Linux.
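Since the function exists for Linux compatibility, the same userland call should build on both systems. A small usage sketch, assuming a FreeBSD release that includes this commit, or glibc on Linux (which needs _GNU_SOURCE to declare it). The helper name is hypothetical.

```c
#define _GNU_SOURCE	/* glibc declares sched_getcpu() only with this */
#include <sched.h>
#include <stdio.h>

/* Print and return the CPU the calling thread is running on right now. */
int
print_current_cpu(void)
{
	int cpu = sched_getcpu();

	if (cpu == -1) {
		perror("sched_getcpu");
		return (-1);
	}
	printf("running on CPU %d\n", cpu);
	return (cpu);
}
```

The result is only a snapshot, since the thread may migrate to another CPU immediately after the call returns.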

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32901
2021-11-10 21:18:54 +02:00
John Baldwin
e3ba94d4f3 Don't require the socket lock for sorele().
Previously, sorele() always required the socket lock and dropped the
lock if the released reference was not the last reference.  Many
callers locked the socket lock just before calling sorele() resulting
in a wasted lock/unlock when not dropping the last reference.

Move the previous implementation of sorele() into a new
sorele_locked() function and use it instead of sorele() for various
places in uipc_socket.c that called sorele() while already holding the
socket lock.

The sorele() macro now uses refcount_release_if_not_last() to try to drop
the socket reference without locking the socket.  If that shortcut
fails, it locks the socket and calls sorele_locked().
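The fast path can be illustrated in userland with C11 atomics and a pthread mutex. Everything here is hypothetical: `sketch_release_if_not_last()` only mimics the semantics of refcount_release_if_not_last(), and setting `freed` stands in for the real socket teardown.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical socket stand-in: a refcount plus the lock for teardown. */
struct sketch_sock {
	atomic_uint refs;
	pthread_mutex_t lock;
	bool freed;
};

/*
 * Drop a reference unless it is the last one.  Returns true if the
 * count was decremented; false means the caller holds the last ref.
 */
static bool
sketch_release_if_not_last(atomic_uint *refs)
{
	unsigned int old = atomic_load(refs);

	do {
		if (old == 1)
			return (false);
	} while (!atomic_compare_exchange_weak(refs, &old, old - 1));
	return (true);
}

/* Slow path: called with the lock held; releases it before returning. */
static void
sketch_sorele_locked(struct sketch_sock *so)
{
	if (atomic_fetch_sub(&so->refs, 1) == 1)
		so->freed = true;	/* stands in for freeing the socket */
	pthread_mutex_unlock(&so->lock);
}

void
sketch_sorele(struct sketch_sock *so)
{
	/* Fast path: no lock/unlock when other references remain. */
	if (sketch_release_if_not_last(&so->refs))
		return;
	pthread_mutex_lock(&so->lock);
	sketch_sorele_locked(so);
}
```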

Reviewed by:	kib, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32741
2021-11-09 10:50:12 -08:00
John Baldwin
57093f9366 vfs: Consistently validate AT_* flags in kern_* functions.
Some syscalls checked for invalid AT_* flags in sys_* and others in
kern_*.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D32864
2021-11-09 09:42:12 -08:00
Rick Macklem
f0c9847a6c vfs: Add "ioflag" and "cred" arguments to VOP_ALLOCATE
When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.

This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.

The VOP_ALLOCATE.9 man page will be patched separately.

Reviewed by:	khng, kib
Differential Revision:	https://reviews.freebsd.org/D32865
2021-11-06 13:26:43 -07:00
Kyle Evans
6a8ea6d174 sched: split sched_ap_entry() out of sched_throw()
sched_throw() can no longer take a NULL thread; APs enter through
sched_ap_entry() instead.  This completely removes branching in the
common case and cleans up both paths.  No functional change intended.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D32829
2021-11-05 15:45:51 -05:00
Hans Petter Selasky
dd31400c3c Factor out flags preserved during mbuf demote into a separate define.
This define will later on be used by coming TLS RX hardware offload patches.

No functional change intended.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-11-04 18:53:49 +01:00
Allan Jude
c441592a0e Allow kern.ipc.maxsockets to be set to current value without error
Normally setting kern.ipc.maxsockets returns EINVAL if the new value
is not greater than the previous value. This can cause spurious
error messages when sysctl.conf is processed multiple times, or when
automation systems try to ensure the sysctl is set to the correct
value. If the value is unchanged, then just do nothing.
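The new acceptance rule, as a hypothetical helper separated from the sysctl plumbing (the real handler's request processing is omitted here):

```c
#include <errno.h>

/*
 * Raising the limit is allowed, writing the current value again is a
 * no-op, and lowering it is still rejected.
 */
int
sketch_set_maxsockets(int *maxsockets, int newval)
{
	if (newval == *maxsockets)
		return (0);		/* unchanged: succeed quietly */
	if (newval < *maxsockets)
		return (EINVAL);	/* the limit can only grow */
	*maxsockets = newval;
	return (0);
}
```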

PR:	243532
Reviewed by:	markj
MFC after:	3 days
Sponsored by:	Modirum MDPay
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D32775
2021-11-04 12:56:09 +00:00
Konstantin Belousov
7ac82c96fe proc_get_binpath(): provide syntactically correct value for unused NDINIT arg
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-04 02:55:33 +02:00
Warner Losh
072d5b98c4 sysbeep: Adjust interface to take a duration as a sbt
Change the 'period' argument to 'duration' and change its type to
sbintime_t so we can more easily express different durations.

Reviewed by:	tsoome, glebius
Differential Revision:	https://reviews.freebsd.org/D32619
2021-11-03 16:03:51 -06:00
Kyle Evans
589aed00e3 sched: separate out schedinit_ap()
schedinit_ap() sets up an AP for a later call to sched_throw(NULL).

Currently, ULE sets up some pcpu bits and fixes the idlethread lock with
a call to sched_throw(NULL); this results in a window where curthread is
set up in each platform's init_secondary() but still has the wrong td_lock.
Typical platform AP startup procedure looks something like:

- Setup curthread
- ... other stuff, including cpu_initclocks_ap()
- Signal smp_started
- sched_throw(NULL) to enter the scheduler

cpu_initclocks_ap() may have callouts to process (e.g., nvme) and
attempt to sched_add() for this AP, but this attempt fails because
of the violated assumption noted above, leading to locking heartburn in
sched_setpreempt().

Interrupts are still disabled until cpu_throw() so we're not really at
risk of being preempted -- just let the scheduler in on it a little
earlier as part of setting up curthread.

Reviewed by:	alfredo, kib, markj
Triage help from:	andrew, markj
Smoke-tested by:	alfredo (ppc), kevans (arm64, x86), mhorne (arm)
Differential Revision:	https://reviews.freebsd.org/D32797
2021-11-03 15:54:59 -05:00
Mark Johnston
175d3380a3 amd64: Deduplicate routines for expanding KASAN/KMSAN shadow maps
When working on the ports, these functions were slightly different, but
now there is no reason for them to be separate.

No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-11-03 12:36:02 -04:00
Konstantin Belousov
be10c0a910 fexecve(2): allow O_PATH file descriptors opened without O_EXEC
This improves compatibility with Linux.
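A userland sketch of what this enables: opening a binary with O_PATH only (no O_EXEC) and running it through fexecve(2) in a child. It assumes a FreeBSD release with this change, or Linux, where glibc needs _GNU_SOURCE for O_PATH. The helper name is hypothetical.

```c
#define _GNU_SOURCE	/* glibc exposes O_PATH only with this */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Open an executable with O_PATH (no O_EXEC) and run it via fexecve()
 * in a child process.  Returns the child's exit status, or -1 on error.
 */
int
run_via_o_path(const char *path, char *const argv[])
{
	int fd, status;
	pid_t pid;

	fd = open(path, O_PATH);
	if (fd == -1)
		return (-1);
	pid = fork();
	if (pid == -1) {
		close(fd);
		return (-1);
	}
	if (pid == 0) {
		fexecve(fd, argv, environ);
		_exit(127);	/* reached only if fexecve() failed */
	}
	close(fd);
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```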

Noted by:	Drew DeVault <sir@cmpwn.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32821
2021-11-03 18:00:42 +02:00
Konstantin Belousov
02de91d740 proc_get_binpath(): return empty string instead of NULL
for the strange case where the queried process does not have text.

Reported by:	Michael Butler <imb@protected-networks.net>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-03 17:30:10 +02:00
Konstantin Belousov
e4ce23b238 fexecve(2): restore the attempts to calculate the executable path
The vn_fullpath() call was not converted to pass newtextvp; instead it
used imgp->vp, which is still NULL there.  As a result, vn_fullpath()
always returned EINVAL and execpath was recorded from the value of arg0.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-03 15:10:22 +02:00
Kyle Evans
7771f2a0c9 kern: physmem: improve region coalescing logic
The existing logic didn't take into account newly inserted mappings
wholly contained by an existing region (or vice versa), nor did it
account for weird overlap scenarios.  The latter is probably unlikely
to happen, but the former may happen in UEFI: BootServicesData allocated
within a large chunk of ConventionalMemory.  This situation blows up vm
initialization.

While we're here, remove the "exact match" logic as it's likely wrong;
if an exact match exists with conflicting flags, for instance, then we
should probably be doing something else.  The new logic takes into
account exact matches as part of the overlapping efforts.
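A simplified sketch of the overlap test, assuming hypothetical `sketch_region` records. It covers containment in both directions, partial overlap, exact matches, and adjacency. Refusing to merge regions with different flags is this sketch's own assumption, not necessarily what physmem does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region record: [addr, addr + size) with attribute flags. */
struct sketch_region {
	uint64_t addr;
	uint64_t size;
	uint32_t flags;
};

/*
 * Try to fold `add` into `cur`.  Any overlap shape works: `add` inside
 * `cur`, `cur` inside `add`, partial overlap on either side, an exact
 * match, or simple adjacency.  Returns false if nothing was merged.
 */
bool
sketch_coalesce(struct sketch_region *cur, const struct sketch_region *add)
{
	uint64_t cur_end = cur->addr + cur->size;
	uint64_t add_end = add->addr + add->size;
	uint64_t lo, hi;

	if (cur->flags != add->flags)
		return (false);	/* assumption: never merge mixed flags */
	/* Disjoint with a gap between them: nothing to merge. */
	if (add_end < cur->addr || add->addr > cur_end)
		return (false);
	lo = add->addr < cur->addr ? add->addr : cur->addr;
	hi = add_end > cur_end ? add_end : cur_end;
	cur->addr = lo;
	cur->size = hi - lo;
	return (true);
}
```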

Reviewed by:	kib, mhorne (both earlier version)
Differential Revision:	https://reviews.freebsd.org/D32701
2021-11-03 02:32:46 -05:00
Konstantin Belousov
f34fc6ba06 Extract proc_get_binpath() from sysctl_kern_proc_pathname()
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32738
2021-10-31 03:05:14 +02:00
Edward Tomasz Napierala
8bbc0600cc linux: Add additional ptracestop only if the debugger is Linux
In 6e66030c4c0, an additional ptracestop was added in order
to implement PTRACE_EVENT_EXEC.  Make it apply only to cases
where the debugger is a Linux process; native FreeBSD
debuggers can trace Linux processes too, but they don't
expect that additional ptracestop.

Fixes:		6e66030c4c0
Reported By:	kib
Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32726
2021-10-30 09:54:17 +01:00
Mark Johnston
26f76aea2d timecounter: Load the currently selected tc once in tc_windup()
Reported by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32729
2021-10-29 14:30:15 -04:00
Sebastian Huber
ae750fbac7 kern_tc.c: Scaling/large delta recalculation
This change is a slight performance optimization for systems with a slow
64-bit division.

The th->th_scale and th->th_large_delta values only depend on the
timecounter frequency and the th->th_adjustment. The timecounter
frequency of a timehand only changes when a new timecounter is activated
for the timehand. The th->th_adjustment is only changed by the NTP
second update. The NTP second update is not done for every call of
tc_windup().

Move the code block to recalculate the scaling factor and
the large delta of a timehand to the new helper function
recalculate_scaling_factor_and_large_delta().

Call recalculate_scaling_factor_and_large_delta() when a new timecounter
is activated and when an NTP second update occurs.
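A sketch of the caching idea, assuming a trimmed-down, hypothetical timehands structure. The scale formula here is simplified (the real code folds th_adjustment in with its own fixed-point conversion); what matters is that the two 64-bit divisions run only when the counter or the adjustment changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down timehands; field names echo the commit. */
struct sketch_timehands {
	uint64_t frequency;	/* counter ticks per second */
	int64_t adjustment;	/* NTP correction, pre-scaled (assumption) */
	uint64_t scale;		/* seconds per tick, in units of 2^-64 s */
	uint64_t large_delta;	/* deltas above this overflow scale * delta */
	unsigned int recalcs;	/* how many times the divisions ran */
};

/* The expensive part: two 64-bit divisions. */
static void
sketch_recalculate(struct sketch_timehands *th)
{
	uint64_t scale = (uint64_t)1 << 63;

	scale += (uint64_t)th->adjustment;	/* wraps correctly if negative */
	scale /= th->frequency;
	th->scale = scale * 2;
	th->large_delta = UINT64_MAX / th->scale;
	th->recalcs++;
}

/* Windup: redo the divisions only when one of their inputs changed. */
void
sketch_windup(struct sketch_timehands *th, bool new_counter, bool ntp_update)
{
	if (new_counter || ntp_update)
		sketch_recalculate(th);
}
```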

MFC after:	1 week
2021-10-29 00:31:14 +03:00
Konstantin Belousov
1c69690319 Unmap shared page manually before doing vm_map_remove() on exit or exec
This allows the pmap_remove(min, max) call to see an empty pmap and exploit
the empty pmap optimization.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32569
2021-10-28 22:01:59 +03:00
Konstantin Belousov
4d675b80f0 i386: fix struct proc layout asserts after 351d5f7fc5161ede
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-28 21:56:21 +03:00
Konstantin Belousov
ee92c8a842 sysctl kern.proc.procname: report right hardlink name
PR:	248184
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:50:02 +03:00
Konstantin Belousov
351d5f7fc5 exec: store parent directory and hardlink name of the binary in struct proc
While doing it, also move all the code to resolve pathnames and obtain
text vp and dvp into a single place.  Besides simplifying the code, it
avoids spurious vnode relocks and validates the explanation of why
a transient text reference on the script vnode is not harmful.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:56 +03:00
Konstantin Belousov
0c10648fbb exec: provide right hardlink name in AT_EXECPATH
For this, use vn_fullpath_hardlink() to resolve executable name for
execve(2).

This should provide the right hardlink name, used for execution, instead
of a random hardlink pointing to this binary.  Also, this should make
AT_EXECPATH reliable for execve(2), since the kernel only needs to resolve
the parent directory path, which should always succeed (except in
pathological cases like unlinking a directory).

PR:	248184
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:31 +03:00
Konstantin Belousov
9a0bee9f6a Make vn_fullpath_hardlink() externally callable
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:26 +03:00
Konstantin Belousov
15bf81f354 struct image_params: use bool type for boolean members
Also re-align comments, and group booleans and char members together.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:21 +03:00
Konstantin Belousov
9d58243fbc do_execve(): switch boolean locals to use bool type
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:16 +03:00
Konstantin Belousov
143dba3a91 kern_exec.c: style
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32611
2021-10-28 20:49:10 +03:00
Mateusz Guzik
628c3b307f cache: only let non-dir descriptors through when doing EMPTYPATH lookups
Otherwise things like realpath against a file and '.' end up with an
illegal state of having a regular vnode for the parent.

Reported by:	syzbot+9aa5439dd9c708aeb1a8@syzkaller.appspotmail.com
2021-10-27 18:27:47 +00:00
Mark Johnston
71f31d784e rmslock: Update td_locks during lock and unlock operations
Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32692
2021-10-27 11:18:13 -04:00
Gordon Bergling
70de1003da jail(8): Fix a few common typos in source code comments
- s/phyiscal/physical/

MFC after:	3 days
2021-10-27 06:16:06 +02:00
Kirk McKusick
dfd704b7fb Allow biodone() to be used as a completion routine.
An ordered series of BIO_READ and BIO_WRITE operations are
typically done as:

	while (work to do) {
		setup bp for I/O
		g_io_request(bp, consumer);
		biowait(bp);
	}

Here you need to have biodone() called at the completion of
the I/O to set the BIO_DONE flag and awaken the biowait(). The
obvious way to do this would be to set bio_done = biodone, but
biodone() will only take the desired action if bio_done == NULL.
The relevant code at the end of biodone() is:

	done = bp->bio_done;
	if (done == NULL) {
		mtxp = mtx_pool_find(mtxpool_sleep, bp);
		mtx_lock(mtxp);
		bp->bio_flags |= BIO_DONE;
		wakeup(bp);
		mtx_unlock(mtxp);
	} else
		done(bp);

This code would infinitely recurse if biodone() is specified as the
routine to use at completion. So before this change, a wrapper done
function had to be written:

	static void
	g_io_done(struct bio *bp)
	{

		bp->bio_done = NULL;
		biodone(bp);
		bp->bio_done = g_io_done;
	}

This commit changes

	if (done == NULL)

to

	if (done == NULL || done == biodone)

which eliminates the need for the wrapper function.
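The recursion guard can be checked in isolation with a toy model. `sketch_bio` and `sketch_biodone()` are hypothetical stand-ins; the pool mutex and wakeup() in the real biodone() are reduced to setting the done flag.

```c
#include <stddef.h>

#define SKETCH_BIO_DONE	0x01

/* Hypothetical, minimal stand-in for struct bio. */
struct sketch_bio {
	int flags;
	void (*done)(struct sketch_bio *);
};

void
sketch_biodone(struct sketch_bio *bp)
{
	void (*done)(struct sketch_bio *) = bp->done;

	/*
	 * Treat "no callback" and "callback is biodone itself" alike and
	 * mark completion.  Without the second test, a bio whose done
	 * routine is sketch_biodone would recurse forever.
	 */
	if (done == NULL || done == sketch_biodone)
		bp->flags |= SKETCH_BIO_DONE;
	else
		done(bp);
}
```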

Reviewed by:  kib
Sponsored by: Netflix
2021-10-23 14:11:57 -07:00