Since the loader already includes zlib code, we should also support gzip
compression in ZFS.
PR: 153173
Submitted by: Mikhail Zakharov <zmey20000@yahoo.com>
Reviewed by: imp, markj, delphij
Differential Revision: https://reviews.freebsd.org/D35320
MFC after: 1 month
This is slightly more optimized than checking panicstr directly. For
most of these instances performance doesn't matter, but let's make
KERNEL_PANICKED() the common idiom.
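
A minimal userland sketch of the idiom, assuming the macro boils down to
testing a boolean flag with a branch-prediction hint. All sketch_ and
SKETCH_ names are hypothetical stand-ins, not the kernel's definitions,
and __builtin_expect assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's panic state. */
static const char *sketch_panicstr = NULL;
static bool sketch_panicked = false;

/* Assumed shape of the idiom: test a flag and hint that it is rarely set. */
#define	SKETCH_KERNEL_PANICKED()	__builtin_expect(sketch_panicked, 0)

/* Record a panic: keep the message and raise the flag. */
static void
sketch_panic(const char *msg)
{
	sketch_panicstr = msg;
	sketch_panicked = true;
}

/* A caller that used to test "sketch_panicstr != NULL" directly. */
static int
sketch_should_skip_locking(void)
{
	return (SKETCH_KERNEL_PANICKED() ? 1 : 0);
}
```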
Reviewed by: mjg
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D35373
Some makefs(8) patches make use of zfsimpl.h (not zfsimpl.c though) to
provide definitions for various on-disk structures. Most of this diff
simply adds new definitions that are useful.
Also reduce dependencies of the header:
- remove an unused list_node_t field to drop the sys/list.h dependency
- replace CTASSERT with _Static_assert
And fix the declaration of decode_embedded_bp_compressed().
No functional change intended.
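
As an illustration of the CTASSERT replacement, a sketch using a
hypothetical structure (sketch_ondisk_hdr is not a real ZFS type):

```c
#include <stdint.h>

/* Hypothetical on-disk structure, for illustration only. */
struct sketch_ondisk_hdr {
	uint64_t magic;
	uint64_t version;
	uint8_t  pad[112];
};

/*
 * Before (depends on a header that provides CTASSERT):
 *   CTASSERT(sizeof(struct sketch_ondisk_hdr) == 128);
 * After (plain C11, no extra header needed):
 */
_Static_assert(sizeof(struct sketch_ondisk_hdr) == 128,
    "sketch_ondisk_hdr must be exactly 128 bytes");
```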
Reviewed by: tsoome
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35278
Move declarations into a new nvlist.h rather than putting everything in
libzfs.h. This makes this nvlist code easier to reuse elsewhere. In
particular, the nvlist implementation in sys/contrib/libnv does not
provide XDR encoding, but this is needed when reading from or writing to
ZFS pools.
Also:
- Remove references to boolean_t. It has to be a 32-bit int here, so
just reference the underlying type.
- Add includes needed when compiling the nvlist code outside of stand/.
No functional change intended.
Reviewed by: tsoome
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35255
Notable upstream pull request merges:
#9078: log xattr=sa create/remove/update to ZIL
#11919: Cross-platform xattr user namespace compatibility
#13014: Report dnodes with faulty bonuslen
#13016: FreeBSD: Fix zvol_cdev_open locking
#13019: spl: Don't check FreeBSD rwlocks for double initialization
#13027: Fix clearing set-uid and set-gid bits on a file when
replaying a write
#13031: Add enumerated vdev names to 'zpool iostat -v' and
'zpool list -v'
#13074: Enable encrypted raw sending to pools with greater ashift
#13076: Receive checks should allow unencrypted child datasets
#13098: Avoid dirtying the final TXGs when exporting a pool
#13172: Fix ENOSPC when unlinking multiple files from full pool
Obtained from: OpenZFS
OpenZFS commit: a86e089415
The general aim in this and subsequent patches is to minimize the
amount of code that directly references CTF types such as ctf_type_t,
ctf_array_t, etc. To that end, introduce some routines similar to the
existing fbt_get_ctt_size() (which exists to deal with differences
between v1 and v2) and change ctf_lookup_by_id() to return a void
pointer.
Support for v2 containers is preserved.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34361
Use it instead of the existing ctf.h from OpenSolaris. This makes it
easier to use CTF in the core kernel, and to extend the CTF format to
support wider type IDs.
The imported ctf.h is modified to depend only on _types.h, and also to
provide macros which use the "parent" bit of a type ID to refer to types
in a parent CTF container.
No functional change intended.
Reviewed by: Domagoj Stolfa, emaste
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34358
fasttrap instruments certain instructions by overwriting them and
copying the original instruction to some per-thread scratch space which
is executed after the probe fires. This trampoline jumps back to the
tracepoint after executing the original instruction.
The created mapping has both write and execute permissions, and so this
mechanism doesn't work when allow_wx is disabled. Work around the
restriction by using proc_rwmem() to write to the trampoline.
Reviewed by: vangyzen
Tested by: Amit <akamit91@hotmail.com>
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34304
This gets rid of the error-prone naming where fget_unlocked returns with
a ref held, while fget_locked requires a lock but provides nothing in
terms of making sure the file lives past unlock.
No functional changes.
The Branch Target Identification (BTI) Armv8-A extension adds new
instructions that can be placed where we may indirectly branch to,
e.g. at the start of a function called via a function pointer. We can't
emulate these in DTrace as the kernel will have raised a different
exception before the DTrace handler has run.
Skip over the BTI instruction if it's used as the first instruction in
a function.
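
A sketch of the skip, assuming the usual A64 HINT encoding of BTI, where
bits 7:6 select the variant (none, c, j, jc). The SKETCH_ constants and
the sketch_skip_bti() helper are illustrative names, not the ones used in
the fbt code.

```c
#include <stdint.h>

/* BTI is encoded as a HINT; masking bits 7:6 matches all four variants. */
#define	SKETCH_BTI_MASK		0xffffff3fu
#define	SKETCH_BTI_INSTR	0xd503241fu

/*
 * Return a pointer to the first instruction worth inspecting for a
 * patch point, stepping over a leading BTI if one is present.
 */
static const uint32_t *
sketch_skip_bti(const uint32_t *instr, const uint32_t *limit)
{
	if (instr < limit && (*instr & SKETCH_BTI_MASK) == SKETCH_BTI_INSTR)
		instr++;
	return (instr);
}
```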
Sponsored by: The FreeBSD Foundation
We had a hardcoded limit of 1/128-th of physical memory that was further
subdivided between all CPUs as principal buffers are allocated on the
per-CPU basis. Actually, the buffers could use up 1/64-th of the
memory because with the default switch policy there are two buffers per
CPU.
This commit allows that limit to be changed.
Note that the discussed limit is per dtrace command invocation.
The idea is to limit the size of a single malloc(9) call, not the total
memory size used by DTrace buffers.
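
The arithmetic behind the 1/128 and 1/64 figures can be sketched as
follows. The function names and the divisor parameter are hypothetical,
chosen only to show the calculation.

```c
#include <stdint.h>

/*
 * Largest single per-CPU buffer: a fraction of physical memory
 * (divisor 128 for the old hardcoded limit) split across all CPUs.
 */
static uint64_t
sketch_per_cpu_buf_limit(uint64_t physmem_bytes, uint64_t divisor,
    unsigned ncpus)
{
	return (physmem_bytes / divisor / ncpus);
}

/* Total memory used when every CPU holds bufs_per_cpu full-size buffers. */
static uint64_t
sketch_total_buf_bytes(uint64_t physmem_bytes, uint64_t divisor,
    unsigned ncpus, unsigned bufs_per_cpu)
{
	return (sketch_per_cpu_buf_limit(physmem_bytes, divisor, ncpus) *
	    ncpus * bufs_per_cpu);
}
```

With 16 GiB of memory, 8 CPUs and a divisor of 128, each buffer is capped
at 16 MiB, and two buffers per CPU add up to 256 MiB, i.e. 1/64 of memory.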
Reviewed by: markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D33648
getf() on FreeBSD calls _sx_slock(), _sx_sunlock() and fget_locked().
Furthermore, it does not set the per-core fault flag, meaning it
usually ends up in a double fault panic once getf() does get called,
especially from fbt.
Reviewing the DTrace Toolkit + a number of other scripts scattered
around FreeBSD, I have not been able to find one use of getf(). Given
how broken the implementation currently is, we disable it until it
can be implemented properly.
Also comment out a test in aggs/tst.subr.d for getf().
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33378
pcount is unused in the sense that it's set but never used except in an
assert. But asserts are always compiled out, so just mark it as unused.
Sponsored by: Netflix
As with arm and riscv, fix return fbt probes on arm64. arg0 should be
the offset within the function of the return instruction and arg1
should be the return value.
Reviewed by: kp, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33440
When writing to memory on arm64 we may be trying to access a
read-only page. In this case try to access via the DMAP region to
get a writable location.
While here simplify writing data in DDB and stop trashing the size as
it is passed into the cache handling functions.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32053
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preparation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
This was ported from illumos but not completely done. Currently we do
not perform type deduplication between KLDs and the kernel, i.e., kernel
modules have a complete type graph. So, remove it for now since it's
not functional and complicates the task of modifying various CTF type
definitions, and we are hitting some limits in the current format which
necessitate an update.
No functional change intended.
MFC after: 2 weeks
In both cases, too few frames were trimmed, leading to exception handling
or DTrace internals appearing in stack traces returned by D's stack()
primitive.
MFC after: 3 days
Reviewed by: emaste, andrew
To trace leaf asm functions we can insert a single nop instruction as
the first instruction in a function and trigger off this.
Reviewed by: gnn
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28132
high-resolution nanosecond timestamp used for the DTrace 'timestamp'
built-in variable. The new implementation uses the EL0 cycle
counter and frequency registers in ARMv8-A. This replaces a
previous implementation that relied on an instrumentation-safe
implementation of getnanotime(), which provided only timer
resolution.
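
A sketch of the counter-to-nanoseconds conversion such an implementation
needs. It takes the counter value and frequency as plain arguments rather
than reading any register, and sketch_ticks_to_ns() is a hypothetical
name. Splitting the division keeps the intermediate product within 64
bits for frequencies up to 1 GHz.

```c
#include <stdint.h>

#define	SKETCH_NANOSEC	1000000000ULL

/*
 * Convert a free-running counter value to nanoseconds.  Whole seconds
 * and the sub-second remainder are scaled separately so that
 * rem * SKETCH_NANOSEC does not overflow (rem < freq_hz <= 1e9).
 */
static uint64_t
sketch_ticks_to_ns(uint64_t count, uint64_t freq_hz)
{
	uint64_t secs = count / freq_hz;
	uint64_t rem = count % freq_hz;

	return (secs * SKETCH_NANOSEC + rem * SKETCH_NANOSEC / freq_hz);
}
```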
MFC after: 3 days
Reviewed by: andrew, bsdimp (older version)
Useful comments appreciated: jrtc27, emaste
The existing implementation relies on each trap handler saving a normal
stack frame record, which is a waste of time and space when we're
already saving a trapframe to the stack. It's also wrong as it currently
saves LR not ELR.
Instead of patching it up, rewrite it based on the RISC-V implementation
with inspiration from the amd64 implementation for how to handle
vectored traps to provide an improved implementation. This includes
compressing the information down to one line like other architectures
rather than the highly-verbose old form that repeats itself by printing
LR and FP in one frame only to print them as PC and SP in the next. It
also includes printing out actually useful information about the traps
that occurred, though FAR is not saved in the trapframe so we cannot
print it (in general it can be clobbered between when the trap happened
and now), only ESR.
The AAPCS also allows the stack frame record to be located anywhere in
the frame, not just the top, so the caller's SP is not at a fixed offset
from the callee's FP like on almost all other architectures in
existence. This means there is no way to derive the caller's SP in the
unwinder, and so we have to drop that bit of (unused) state everywhere.
Reviewed by: jhb, markj
Differential Revision: https://reviews.freebsd.org/D28026
A more complete fix for this function is being worked on in D28054. Fix
the uninitialized variable error so that builds can at least proceed.
Reported by: several
Some stack frames are too large for a store pair instruction we already
detect in the arm64 fbt code. Add support for handling a direct
subtraction from the stack pointer.
Sponsored by: Innovate UK
When searching for an instruction to patch out in the arm64 function
boundary trace we search for a store pair with a write back. This
instruction is commonly used to store two registers to the stack
and update the stack pointer to hold space for more.
This works in many cases, however not all functions use this, e.g.
when the stack frame is too large. In these cases we may find another
instruction of the same type that doesn't store through the stack
pointer. Filter these instructions out and assume if we see one we
are past the function prologue.
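
The filter can be sketched like this, assuming the standard A64 encoding
of the 64-bit pre-indexed store pair, with the base register in bits 9:5
and register 31 meaning SP. The SKETCH_ constants and helper names are
illustrative, not the ones used in the fbt code.

```c
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_LDP_STP_MASK	0xffc00000u
#define	SKETCH_STP_64_PREIND	0xa9800000u	/* stp xA, xB, [xN, #imm]! */
#define	SKETCH_RN_SHIFT		5
#define	SKETCH_RN_MASK		0x1fu
#define	SKETCH_REG_SP		31u

/* True for a 64-bit store pair with write back (pre-index). */
static bool
sketch_is_stp_preindex(uint32_t instr)
{
	return ((instr & SKETCH_LDP_STP_MASK) == SKETCH_STP_64_PREIND);
}

/* True if the base register of the store pair is the stack pointer. */
static bool
sketch_stp_uses_sp(uint32_t instr)
{
	return (((instr >> SKETCH_RN_SHIFT) & SKETCH_RN_MASK) ==
	    SKETCH_REG_SP);
}

/*
 * Return the index of the instruction to patch, or -1 if none is found.
 * A store pair that does not go through SP is taken to mean we are
 * already past the prologue, so the search stops there.
 */
static int
sketch_find_patch_point(const uint32_t *instr, int n)
{
	for (int i = 0; i < n; i++) {
		if (!sketch_is_stp_preindex(instr[i]))
			continue;
		if (sketch_stp_uses_sp(instr[i]))
			return (i);
		return (-1);
	}
	return (-1);
}
```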
Reported by: rwatson
Sponsored by: Innovate UK
- Implement a dtrace_getnanouptime(), matching the existing
dtrace_getnanotime(), to avoid DTrace calling out to a potentially
instrumentable function.
(These should probably both be under KDTRACE_HOOKS. Also, it's not clear
to me that they are correct implementations for the DTrace thread time
functions they are used in; fixes are left for another commit.)
- Don't allow FBT to instrument the EL1 exception handling functions that
are involved in FBT trap processing: handle_el1h_sync() and
do_el1h_sync().
- Don't allow FBT to instrument DDB and KDB functions, as that makes it
rather harder to debug FBT problems.
Prior to these changes, use of FBT on FreeBSD/arm64 rapidly led to kernel
panics due to recursion in DTrace.
Reliable FBT on FreeBSD/arm64 depends on another change from @andrew to
have the aarch64 instrumentor more carefully check that instructions it
replaces are against the stack pointer, which can otherwise lead to memory
corruption. That change remains under review.
MFC after: 2 weeks
Reviewed by: andrew, kp, markj (earlier version), jrtc27 (earlier version)
Differential revision: https://reviews.freebsd.org/D27766
A test of this is funcs/tst.strtok.d which has this filter:
BEGIN
/(this->field = strtok(this->str, ",")) == NULL/
{
exit(1);
}
The test will randomly fail with exit status of 1 indicating that this->field
was NULL even though printing it out shows it is not.
This is compiled to the DTrace instruction set:
// Pushed arguments not shown here
// call strtok() and set result into %r1
07: 2f001f01 call DIF_SUBR(31), %r1 ! strtok
// set thread local scalar this->field from %r1
08: 39050101 stls %r1, DT_VAR(1281) ! DT_VAR(1281) = "field"
// Prepare for the == comparison
// Set right side of %r2 to NULL
09: 25000102 setx DT_INTEGER[1], %r2 ! 0x0
// string compare %r1 (strtok result) to %r2
10: 27010200 scmp %r1, %r2
In this case only %r1 is loaded with a string limit set to lim1. %r2 being
NULL does not get loaded and does not set lim2. Then we call dtrace_strncmp()
with MIN(lim1, lim2) resulting in passing 0 and comparing neither side.
dtrace_strncmp() handles this case fine and it already has been while
being lucky with what lim2 was [un]initialized as.
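
To illustrate why a zero limit makes the comparison report equality, here
is a simplified userland sketch of a bounded compare. It is not the
kernel's dtrace_strncmp(); treating a NULL pointer as an empty string is
an assumption made for the illustration, and sketch_strncmp() is a
hypothetical name.

```c
#include <stddef.h>

#define	SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Bounded string compare.  With limit == 0 no bytes are examined and
 * the result is 0 ("equal"), whatever the two strings contain.
 */
static int
sketch_strncmp(const char *s1, const char *s2, size_t limit)
{
	unsigned char c1, c2;

	if (s1 == s2 || limit == 0)
		return (0);
	do {
		/* Assumption: a NULL pointer compares like an empty string. */
		c1 = (s1 == NULL) ? '\0' : (unsigned char)*s1++;
		c2 = (s2 == NULL) ? '\0' : (unsigned char)*s2++;
		if (c1 != c2)
			return (c1 - c2);
	} while (--limit != 0 && c1 != '\0');
	return (0);
}
```

Calling sketch_strncmp("a", NULL, SKETCH_MIN(64, 0)) yields 0, so a
non-NULL token would look equal to NULL, while any nonzero limit yields a
nonzero result.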
Reviewed by: markj, Don Morris <dgmorris AT earthlink.net>
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D27671
This makes the minimum number of changes to allow inclusion of dtrace.h
without all the solaris compatibility headers. Installing dtrace.h allows
compiling consumers of libdtrace (e.g. https://github.com/tmetsch/python-dtrace)
without requiring a copy of the source tree.
For python-dtrace I worked around this in 58019c9a12
but being able to build the library without installed sources would be
extremely useful.
Reviewed By: gnn
Differential Revision: https://reviews.freebsd.org/D27884
This same check is used on other architectures. Previously this would
permit a stack frame to unwind into any arbitrary kernel address
(including unmapped addresses).
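
A userland sketch of this kind of bounds check, assuming a frame record
of two pointer-sized words (saved frame pointer, then return address).
The structure and function names are hypothetical and only mirror the
idea, not the kernel's unwinder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_unwind_state {
	uintptr_t fp;	/* current frame pointer */
	uintptr_t pc;	/* return address of the current frame */
};

/* True if [va, va + len) lies entirely inside the stack region. */
static bool
sketch_kstack_contains(uintptr_t base, size_t size, uintptr_t va,
    size_t len)
{
	return (va >= base && va + len >= va && va + len <= base + size);
}

/*
 * Step to the caller's frame.  The frame record is only dereferenced
 * after checking that it lies within the stack, so a corrupt frame
 * pointer ends the walk instead of touching an arbitrary address.
 */
static bool
sketch_unwind_frame(uintptr_t base, size_t size,
    struct sketch_unwind_state *frame)
{
	uintptr_t fp = frame->fp;

	if (!sketch_kstack_contains(base, size, fp, sizeof(uintptr_t) * 2))
		return (false);
	frame->fp = ((uintptr_t *)fp)[0];
	frame->pc = ((uintptr_t *)fp)[1];
	return (true);
}
```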
Reviewed by: andrew, markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27362
- Push the kstack_contains check down into unwind_frame() so that it
is honored by DDB and DTrace.
- Check that the trapframe for an exception frame is contained in the
traced thread's kernel stack for DDB traces.
Reviewed by: markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27357
The sdt module's load handler iterates over SDT linker sets for the
kernel and all loaded modules to create probes and providers defined by
SDT(9). Probes in one module may belong to a provider in a different
module, but when a probe is created we assume that the provider is
already defined. To maintain this invariant, modify the load handler to
perform two separate passes over loaded modules: one to define providers
and the other to define probes.
The problem manifests when loading linux.ko, which depends on
linux_common.ko, which defines providers used by probes defined in
linux.ko.
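
The two-pass idea can be sketched as follows. The data structures are
hypothetical simplifications (plain name arrays instead of linker sets),
and sketch_load() is not the sdt module's actual load handler.

```c
#include <stddef.h>
#include <string.h>

#define	SKETCH_MAXPROV	16
#define	SKETCH_MAXENT	4

/* Hypothetical module: NULL-terminated lists of names. */
struct sketch_module {
	const char *providers[SKETCH_MAXENT];	/* providers defined here */
	const char *probe_provs[SKETCH_MAXENT];	/* provider of each probe */
};

static const char *sketch_known[SKETCH_MAXPROV];
static size_t sketch_nknown;

static int
sketch_provider_defined(const char *name)
{
	for (size_t i = 0; i < sketch_nknown; i++)
		if (strcmp(sketch_known[i], name) == 0)
			return (1);
	return (0);
}

/*
 * Returns the number of probes created, or -1 if a probe refers to a
 * provider that was never defined.
 */
static int
sketch_load(const struct sketch_module *mods, size_t nmods)
{
	int created = 0;

	sketch_nknown = 0;
	/* Pass 1: define every provider from every module. */
	for (size_t m = 0; m < nmods; m++)
		for (size_t i = 0; i < SKETCH_MAXENT &&
		    mods[m].providers[i] != NULL; i++)
			if (sketch_nknown < SKETCH_MAXPROV)
				sketch_known[sketch_nknown++] =
				    mods[m].providers[i];
	/* Pass 2: create probes; their providers now exist. */
	for (size_t m = 0; m < nmods; m++)
		for (size_t i = 0; i < SKETCH_MAXENT &&
		    mods[m].probe_provs[i] != NULL; i++) {
			if (!sketch_provider_defined(mods[m].probe_provs[i]))
				return (-1);
			created++;
		}
	return (created);
}
```

With a single pass, a module listed before the one that defines its
provider would fail; with two passes the module order no longer matters.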
Reported by: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
This catches up to the changes made to struct unwind_state in r364180.
Reviewed by: mhorne
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27360