When rolling back a dataset, ZFS has to purge file data resident in the
system page cache. To do this, it loops over all vnodes for the
mountpoint and calls vn_pages_remove() to purge pages associated with
the vnode's VM object. Each page is thus exclusively busied while the
dataset's teardown write lock is held.
When handling a page fault on a mapped ZFS file, FreeBSD's page fault
handler busies newly allocated pages and then uses VOP_GETPAGES to fill
them. The ZFS getpages VOP acquires the teardown read lock with vnode
pages already busied. This represents a lock order reversal which can
lead to deadlock.
To break the deadlock, observe that zfs_rezget() need only purge those
pages marked valid, and that pages busied by the page fault handler are,
by definition, invalid. Furthermore, ZFS pages always transition from
invalid to valid with the teardown lock held, and ZFS never creates
partially valid pages. Thus, zfs_rezget() can use the new
vn_pages_remove_valid() to skip over pages busied by the fault handler.
PR: 258208
Tested by: pho
Reviewed by: avg, sef, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32931
This reverts commit 9ef7df022a46 ("hyperv: Register hyperv_timecounter
later during boot") and adds a comment explaining why the timecounter
needs to be registered as early as it is.
PR: 259878
Fixes: 9ef7df022a46 ("hyperv: Register hyperv_timecounter later during boot")
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Hyper-V wants to register its MSR-based timecounter during
SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be
available for DELAY(). So we cannot use MTX_SYSINIT to initialize the
timecounter lock.
PR: 259878
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Add a boolean parameter to minidumpsys(), to indicate a live dump. When
requested, take a snapshot of important global state, and pass this to
the machine-dependent minidump function. For now this includes the
kernel message buffer, and the bitset of pages to be dumped. Beyond
this, we don't take much action to protect the integrity of the dump
from changes in the running system.
A new function msgbuf_duplicate() is added for snapshotting the message
buffer. msgbuf_copy() is insufficient for this purpose since it marks
any new characters it finds as read.
For now, nothing can actually trigger a live minidump. A future patch
will add the mechanism for this. For simplicity and safety, live dumps
are disallowed for mips.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31993
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31991
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us. For example,
after checking the valid bit, but before reading the physical address.
Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.
Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31990
It is useful for quickly checking an address against the DMAP region.
These definitions exist already on arm64 and riscv.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D32962
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
Some of the functions in msun can be implemented using a compiler
builtin function to generate a small number of instructions. Implement
this support in fma, fmax, fmin, and sqrt on arm64.
Care must be taken as the builtin can be implemented as a function
call on some architectures that lack direct support. In these cases
we need to use the original code path.
As we don't set errno on failure build with -fno-math-errno so the
toolchain doesn't convert a builtin into a function call when it
detects a failure, e.g. gcc will add a call to sqrt when the input
is negative leading to an infinite loop.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32801
These are the updated version of the older Cortex Strings Library we
previously used. The Arm Optimized Routines also support CPU features
that are currently in development on FreeBSD, e.g. Branch Target
Identification (BTI). Rather than add BTI support to the old code it's
easier to just use the maintained version.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32774
iflib_stop modifies iflib data structures that are used by _task_fn_rx,
most prominently the free lists. So, iflib_stop has to ensure that the
rx task threads are not active.
This should help to fix a crash seen when iflib_if_ioctl (e.g.,
SIOCSIFCAP) is called while there is already traffic flowing.
The crash has been seen on VMWare guests with vmxnet3 driver.
My guess is that on physical hardware the couple of 1ms delays that
iflib_stop has after disabling interrupts are enough for the queued work
to be completed before any iflib state is touched.
But on busy hypervisors the guests might not get enough CPU time to
complete the work, thus there can be a race between the taskqueue
threads and the work done to handle an ioctl, specifically in iflib_stop
and iflib_init_locked.
PR: 259458
Reviewed by: markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D32926
We don't need to bootstrap lex or md4 anymore.
Cat doesn't need to be bootstrapped (but is needed for buildkernel)
cruncgen doesn't need to be bootstrapped at all.
kbdcontrol isn't needed
Sponsored by: Netflix
Pull Request: https://github.com/freebsd/freebsd-src/pull/554
Until this change there were two places where we would free tcpcb -
tcp_discardcb() in case if all timers are drained and tcp_timer_discard()
otherwise. They were pretty much copy-n-paste, except that in the
default case we would run tcp_hc_update(). Merge this into single
function tcp_freecb() and move new short version of tcp_timer_discard()
to tcp_timer.c and make it static.
Reviewed by: rrs, hselasky
Differential revision: https://reviews.freebsd.org/D32965
In case we failed to uma_zalloc() and also failed to reuse with
tcp_tw_2msl_scan(), then just use on stack tcptw. This will allow
to run through tcp_twrespond() and standard tcpcb discard routine.
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D32965
Rather than having code to re-define bcd2bin() for the LinuxKPI
make sure libkern.h is always included before the LinuxKPI version.
Then only re-define our local LinuxKPI implementation. [1]
From the argument truncating wrapper call the libkern version.
If we change our libkern implementation in the future we can save
us the remainder of the hassle. [2] Given I need this to MFC,
which I am not sure we can with libkern, commit this intermediate
step.
Suggested by: Johannes Berg (johannes sipsolutions.net) [1]
Suggested by: ian [2]
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
X-MFC with: 548ada00e54a9e7745d041b1ec7f68f3bd493365
Differential Revision: https://reviews.freebsd.org/D32695
yocalebo_gmail.com submitted a patch for mountd.c that
fixes the case where a username starts with a digit.
Without this patch, the username that starts with a
digit is misinterpreted as a numeric uid.
With this patch, any string that does not entirely
convert to a decimal number via strtoul() is considered
a user/group name.
Submitted by: yocalebo_gmail.com
Reviewed by: rmacklem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32993