When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE
flag is not passed through to the underlying filesystem mount routine.
MNT_FORCE is only passed through on later updates to an existing
mount. With this commit the MNT_FORCE flag is now passed through on the
initial mount.
Sanity check: kib
Sponsored by: Netflix
This is how most SYSINITs are defined. Also annotate the dummy
parameter with __unused. No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Rename to match the naming of syscalls and allow 32 to be appended
without making an ugly name like kevent_freebsd1132.
While here, make the kevent changelist argument const.
Reviewed by: kib
This will be used to break a deadlock in ZFS between the per-mountpoint
teardown lock and page busy locks. In particular, when purging data
from the page cache during dataset rollback, we want to avoid blocking
on the busy state of invalid pages since the busying thread may be
blocked on the teardown lock in zfs_getpages().
Add a helper, vn_pages_remove_valid(), for use by filesystems. Bump
__FreeBSD_version so that the OpenZFS port can make use of the new
helper.
PR: 258208
Reviewed by: avg, kib, sef
Tested by: pho (part of a larger patch)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32931
- Return an errno value upon failure, instead of 1.
- Provide a bus_translate_resource() wrapper.
- Implement the generic version, which traverses the hierarchy until a
bus driver with a non-trivial implementation is found, in subr_bus.c
like other similar default implementations.
- Make ofw_pcib_translate_resource() return an error if a matching PCI
address range is not found.
- Make generic_pcie_translate_resource_common() return an int instead of
a bool. Fix up callers.
No functional change intended.
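
The generic traversal described in the list above can be sketched in
userspace as follows. This is a minimal model, not the subr_bus.c code:
struct fake_dev, fake_translate_resource() and fake_pcib_translate()
are hypothetical names, the address window is invented, and the use of
ENXIO for "no matching range" is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device node; not the kernel's device_t. */
struct fake_dev {
	struct fake_dev *parent;
	/* NULL models a bus with only the trivial implementation. */
	int (*translate)(struct fake_dev *, uint64_t, uint64_t *);
};

/*
 * Walk up the hierarchy until a bus with a non-trivial implementation
 * is found.  Returns 0 or an errno value, never a bare 1.
 */
int
fake_translate_resource(struct fake_dev *dev, uint64_t start,
    uint64_t *newstart)
{
	struct fake_dev *d;

	for (d = dev->parent; d != NULL; d = d->parent) {
		if (d->translate != NULL)
			return (d->translate(d, start, newstart));
	}
	/* Nobody translates: the address is passed through unchanged. */
	*newstart = start;
	return (0);
}

/* Hypothetical bridge with one window [0x1000, 0x2000). */
int
fake_pcib_translate(struct fake_dev *d, uint64_t start, uint64_t *newstart)
{
	(void)d;
	if (start < 0x1000 || start >= 0x2000)
		return (ENXIO);		/* assumed errno for "no range" */
	*newstart = start - 0x1000 + 0x80000000ULL;
	return (0);
}
```

Callers can then test the result against 0 and propagate the errno
value directly.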
Reviewed by: imp, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32855
We do not require the devvp vnode to be locked for metadata I/O.
Locking is typically not needed, since the correctness of the file
system using the corresponding block device ensures that there are no
incorrect or racy manipulations.
But right now DEBUG_VFS_LOCKS option excludes both character device
vnodes and completely destroyed (VBAD) vnodes from asserts. This is not
too bad since WITNESS still ensures that we do not leak locks. On the
other hand, asserts do not mean what they should, to the reader, and
reliance on them being enforced might result in wrong code.
Note that ASSERT_VOP_LOCKED() still silently accepts NULLVP; I think
this is worth fixing as well, in the next round.
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D32761
For devfs vnodes, it is fine to not lock vnodes for VOP_FSYNC().
Otherwise the vnode must be locked exclusively, except when
MNT_SHARED_WRITES() is true, in which case a shared lock is enough.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32761
We were not including the requested starting offset in the page offset.
Reviewed by: jhb
Fixes: 3c7a01d773ac ("Extend m_apply() to support unmapped mbufs.")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32922
for compatibility with Linux.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32901
Previously, sorele() always required the socket lock and dropped the
lock if the released reference was not the last reference. Many
callers locked the socket lock just before calling sorele() resulting
in a wasted lock/unlock when not dropping the last reference.
Move the previous implementation of sorele() into a new
sorele_locked() function and use it instead of sorele() for various
places in uipc_socket.c that called sorele() while already holding the
socket lock.
The sorele() macro now uses refcount_release_if_not_last() to try to
drop the socket reference without locking the socket. If that shortcut
fails, it locks the socket and calls sorele_locked().
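
The fast path can be modeled in userspace with C11 atomics. This is a
sketch under stated assumptions: struct fake_socket, its boolean
"lock", and all fake_* functions are hypothetical stand-ins, and
release_if_not_last() only imitates the idea behind
refcount_release_if_not_last().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct socket. */
struct fake_socket {
	atomic_uint	refs;
	bool		locked;		/* models the socket lock */
	bool		freed;
	unsigned	lock_count;	/* times the lock was taken */
};

/* Drop one reference unless it is the last; true if one was dropped. */
bool
release_if_not_last(atomic_uint *refs)
{
	unsigned old = atomic_load(refs);

	while (old > 1) {
		/* On failure, old is reloaded and re-checked. */
		if (atomic_compare_exchange_weak(refs, &old, old - 1))
			return (true);
	}
	return (false);
}

/* Caller holds the lock; it is released before returning. */
void
fake_sorele_locked(struct fake_socket *so)
{
	assert(so->locked);
	if (atomic_fetch_sub(&so->refs, 1) == 1)
		so->freed = true;	/* last reference went away */
	so->locked = false;
}

void
fake_sorele(struct fake_socket *so)
{
	if (release_if_not_last(&so->refs))
		return;			/* fast path: no lock taken */
	so->locked = true;
	so->lock_count++;
	fake_sorele_locked(so);
}
```

With two references held, the first fake_sorele() call returns without
touching the lock; only the final release takes the slow path.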
Reviewed by: kib, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32741
Some syscalls checked for invalid AT_* flags in sys_* and others in
kern_*.
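
A check for invalid flags usually reduces to a mask test like the one
below. This is only an illustration: the FAKE_AT_* constants and
check_at_flags() are hypothetical, and the values are not the real
AT_* definitions.

```c
#include <errno.h>

/* Hypothetical flag bits; the values are made up for the example. */
#define	FAKE_AT_SYMLINK_NOFOLLOW	0x0200
#define	FAKE_AT_REMOVEDIR		0x0800
#define	FAKE_AT_EMPTY_PATH		0x4000

/* Reject any bit that the calling syscall does not accept. */
int
check_at_flags(int flag, int allowed)
{
	if ((flag & ~allowed) != 0)
		return (EINVAL);
	return (0);
}
```

Doing this test at one consistent layer means every entry point that
reaches the shared code gets the same validation.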
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D32864
When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.
This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.
The VOP_ALLOCATE.9 man page will be patched separately.
Reviewed by: khng, kib
Differential Revision: https://reviews.freebsd.org/D32865
sched_throw() can no longer take a NULL thread; APs enter through
sched_ap_entry() instead. This completely removes branching in the
common case and cleans up both paths. No functional change intended.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D32829
This define will be used by upcoming TLS RX hardware offload patches.
No functional change intended.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: NVIDIA Networking
Normally setting kern.ipc.maxsockets returns EINVAL if the new value
is not greater than the previous value. This can cause spurious
error messages when sysctl.conf is processed multiple times, or when
automation systems try to ensure the sysctl is set to the correct
value. If the value is unchanged, then just do nothing.
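
The new behavior can be sketched as follows. This is a simplified
userspace model: fake_maxsockets and set_maxsockets() are hypothetical
names, and any upper bound the real handler enforces is left out.

```c
#include <errno.h>

int fake_maxsockets = 1000;	/* hypothetical current limit */

int
set_maxsockets(int newval)
{
	if (newval == fake_maxsockets)
		return (0);		/* unchanged: do nothing */
	if (newval < fake_maxsockets)
		return (EINVAL);	/* the limit can only grow */
	fake_maxsockets = newval;
	return (0);
}
```

Setting the same value twice in a row now succeeds both times, which is
what repeated sysctl.conf processing needs.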
PR: 243532
Reviewed by: markj
MFC after: 3 days
Sponsored by: Modirum MDPay
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D32775
Change the 'period' argument to 'duration' and change its type to
sbintime_t so we can more easily express different durations.
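
As an illustration of how durations read in this representation, here
is a userspace sketch. The demo_* names are hypothetical; they imitate
the 32.32 fixed-point layout of sbintime_t and the SBT_1S and SBT_1MS
constants rather than including the kernel header.

```c
#include <stdint.h>

/* Hypothetical mirror of sbintime_t: seconds in 32.32 fixed point. */
typedef int64_t demo_sbintime_t;
#define	DEMO_SBT_1S	((demo_sbintime_t)1 << 32)
#define	DEMO_SBT_1MS	(DEMO_SBT_1S / 1000)

struct demo_tone {
	int		pitch;
	demo_sbintime_t	duration;
};

/* Hypothetical consumer that takes a duration argument. */
struct demo_tone
demo_make_tone(int pitch, demo_sbintime_t duration)
{
	struct demo_tone t = { pitch, duration };

	return (t);
}

/* The integer part of the duration is the upper 32 bits. */
int64_t
demo_sbt_whole_seconds(demo_sbintime_t sbt)
{
	return (sbt >> 32);
}
```

A caller can then write demo_make_tone(440, 250 * DEMO_SBT_1MS) or
demo_make_tone(440, 2 * DEMO_SBT_1S) without converting units by hand.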
Reviewed by: tsoome, glebius
Differential Revision: https://reviews.freebsd.org/D32619
schedinit_ap() sets up an AP for a later call to sched_throw(NULL).
Currently, ULE sets up some pcpu bits and fixes the idlethread lock with
a call to sched_throw(NULL); this results in a window where curthread is
set up in platforms' init_secondary(), but it has the wrong td_lock.
Typical platform AP startup procedure looks something like:
- Set up curthread
- ... other stuff, including cpu_initclocks_ap()
- Signal smp_started
- sched_throw(NULL) to enter the scheduler
cpu_initclocks_ap() may have callouts to process (e.g., nvme) and
attempt to sched_add() for this AP, but this attempt fails because
of the noted violated assumption leading to locking heartburn in
sched_setpreempt().
Interrupts are still disabled until cpu_throw() so we're not really at
risk of being preempted -- just let the scheduler in on it a little
earlier as part of setting up curthread.
Reviewed by: alfredo, kib, markj
Triage help from: andrew, markj
Smoke-tested by: alfredo (ppc), kevans (arm64, x86), mhorne (arm)
Differential Revision: https://reviews.freebsd.org/D32797
When working on the ports these functions were slightly different, but
now there's no reason for them to be separate.
No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
for strange case where queried process does not have text.
Reported by: Michael Butler <imb@protected-networks.net>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
The vn_fullpath() call was not converted to pass newtextvp; instead it
used imgp->vp, which is still NULL there. As a result, vn_fullpath()
always returned EINVAL and execpath was recorded from the value of arg0.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
The existing logic didn't take into account newly inserted mappings
wholly contained by an existing region (or vice versa), nor did it
account for weird overlap scenarios. The latter is probably unlikely
to happen, but the former may happen in UEFI: BootServicesData allocated
within a large chunk of ConventionalMemory. This situation blows up vm
initialization.
While we're here, remove the "exact match" logic as it's likely wrong;
if an exact match exists with conflicting flags, for instance, then we
should probably be doing something else. The new logic takes into
account exact matches as part of the overlapping efforts.
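
The cases named above can be told apart with a small classifier over
half-open ranges. This is an illustrative sketch only, not the logic of
the commit: enum fake_overlap and classify_overlap() are hypothetical,
and what to do in each case (split, merge, reject) is left open.

```c
#include <stdint.h>

enum fake_overlap {
	OVL_DISJOINT,	/* no bytes in common */
	OVL_EXACT,	/* same start and end */
	OVL_NEW_INSIDE,	/* new region wholly inside the existing one */
	OVL_OLD_INSIDE,	/* existing region wholly inside the new one */
	OVL_PARTIAL	/* regions straddle each other */
};

/* Compare new [ns, ne) against existing [os, oe). */
enum fake_overlap
classify_overlap(uint64_t ns, uint64_t ne, uint64_t os, uint64_t oe)
{
	if (ne <= os || oe <= ns)
		return (OVL_DISJOINT);
	if (ns == os && ne == oe)
		return (OVL_EXACT);
	if (ns >= os && ne <= oe)
		return (OVL_NEW_INSIDE);
	if (os >= ns && oe <= ne)
		return (OVL_OLD_INSIDE);
	return (OVL_PARTIAL);
}
```

The UEFI situation described above corresponds to OVL_NEW_INSIDE: a
small allocation landing in the middle of a large existing region.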
Reviewed by: kib, mhorne (both earlier version)
Differential Revision: https://reviews.freebsd.org/D32701
In 6e66030c4c0, an additional ptracestop was added in order
to implement PTRACE_EVENT_EXEC. Make it only apply to cases
where the debugger is a Linux process; native FreeBSD
debuggers can trace Linux processes too, but they don't
expect that additional ptracestop.
Fixes: 6e66030c4c0
Reported By: kib
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32726
This change is a slight performance optimization for systems with a slow
64-bit division.
The th->th_scale and th->th_large_delta values only depend on the
timecounter frequency and the th->th_adjustment. The timecounter
frequency of a timehand only changes when a new timecounter is activated
for the timehand. The th->th_adjustment is only changed by the NTP
second update. The NTP second update is not done for every call of
tc_windup().
Move the code block to recalculate the scaling factor and
the large delta of a timehand to the new helper function
recalculate_scaling_factor_and_large_delta().
Call recalculate_scaling_factor_and_large_delta() when a new timecounter
is activated and when an NTP second update occurred.
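
The idea of dividing only when an input changed can be sketched like
this. It is a simplified userspace model, not the kern_tc.c code:
struct fake_timehands and the fake_* functions are hypothetical, the
NTP adjustment term is omitted from the formula, and th_large_delta is
not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct timehands. */
struct fake_timehands {
	uint64_t	freq;		/* timecounter frequency, Hz */
	uint64_t	scale;		/* cached approx. of 2^64 / freq */
	unsigned	recalcs;	/* 64-bit divisions performed */
};

/* The expensive part: one 64-bit division. */
void
fake_recalculate(struct fake_timehands *th)
{
	th->scale = (((uint64_t)1 << 63) / th->freq) * 2;
	th->recalcs++;
}

/* Called on every windup; divides only when an input changed. */
void
fake_windup(struct fake_timehands *th, uint64_t freq, bool ntp_update)
{
	bool new_counter = (freq != th->freq);

	if (new_counter)
		th->freq = freq;
	if (new_counter || ntp_update)
		fake_recalculate(th);
}
```

Repeated fake_windup() calls with an unchanged frequency and no NTP
second update leave recalcs untouched.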
MFC after: 1 week
This allows the pmap_remove(min, max) call to see an empty pmap and
exploit the empty pmap optimization.
Reviewed by: markj
Tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32569
While doing it, also move all the code that resolves pathnames and
obtains the text vp and dvp into a single place. Besides simplifying
the code, this avoids spurious vnode relocks and validates the
explanation of why a transient text reference on the script vnode is
not harmful.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32611
For this, use vn_fullpath_hardlink() to resolve the executable name
for execve(2).
This should provide the right hardlink name, the one used for
execution, instead of a random hardlink pointing to this binary. It
should also make AT_EXECNAME reliable for execve(2), since the kernel
only needs to resolve the parent directory path, which should always
succeed (except in pathological cases like unlinking a directory).
PR: 248184
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32611
Also re-align comments, and group booleans and char members together.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32611
An ordered series of BIO_READ and BIO_WRITE operations is
typically done as:
    while (work to do) {
        setup bp for I/O
        g_io_request(bp, consumer);
        biowait(bp);
    }
Here you need to have biodone() called at the completion of
the I/O to set the BIO_DONE flag and awaken the biowait(). The
obvious way to do this would be to set bio_done = biodone, but
biodone() will only take the desired action if bio_done == NULL.
The relevant code at the end of biodone() is:
    done = bp->bio_done;
    if (done == NULL) {
        mtxp = mtx_pool_find(mtxpool_sleep, bp);
        mtx_lock(mtxp);
        bp->bio_flags |= BIO_DONE;
        wakeup(bp);
        mtx_unlock(mtxp);
    } else
        done(bp);
This code would infinitely recurse if biodone() is specified as the
routine to use at completion. So before this change, a wrapper done
function had to be written:
    static void
    g_io_done(struct bio *bp)
    {
        bp->bio_done = NULL;
        biodone(bp);
        bp->bio_done = g_io_done;
    }
This commit changes
    if (done == NULL)
to
    if (done == NULL || done == biodone)
which eliminates the need for the wrapper function.
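
The effect of the new condition can be demonstrated with a small
userspace model. struct fake_bio and fake_biodone() are hypothetical
stand-ins; the mutex pool and wakeup() are replaced by a plain flag,
and a depth counter records how many times the function is entered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct bio. */
struct fake_bio {
	void	(*bio_done)(struct fake_bio *);
	bool	done_flag;	/* stands in for BIO_DONE */
	int	depth;		/* entries into fake_biodone() */
};

void
fake_biodone(struct fake_bio *bp)
{
	void (*done)(struct fake_bio *) = bp->bio_done;

	bp->depth++;
	if (done == NULL || done == fake_biodone) {
		/* The kernel locks, sets BIO_DONE and calls wakeup(). */
		bp->done_flag = true;
	} else
		done(bp);
}
```

With bio_done pointing at fake_biodone itself, the function is entered
exactly once and marks the request done instead of recursing.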
Reviewed by: kib
Sponsored by: Netflix