dtrace_trap() consumes page and protection faults triggered by code running
in DTrace probe context. Such faults occur with interrupts disabled and are
detected using a per-CPU flag. Regular faults cause dtrace_trap() to be
called with interrupts enabled, and nothing was ensuring that the flag was
read from the correct CPU. This may result in dtrace_trap() consuming
unrelated page and protection faults when DTrace is enabled, causing the
fault handler to return without actually having handled the fault.
Diagnosed by: Ryan Libby <rlibby@gmail.com>
MFC after: 3 days
Sponsored by: Dell EMC Isilon
When recording probe site addresses in the output DOF file, dtrace -G
needs to emit relocations for the .SUNW_dof section in order to obtain
the addresses of functions containing probe sites. DTrace expects the
addresses to be relative to the base address of the final ELF file,
and the amd64 USDT implementation was relying on some unspecified and
incorrect behaviour in the base system GNU ld to achieve this.
This change reimplements the probe site relocation handling to allow
USDT to be used with lld and newer GNU binutils. Specifically, it
makes use of R_X86_64_PC64/R_386_PC32 relocations to obtain the
probe site address relative to the DOF file address, and adds and uses a
new DOF relocation type which computes the final probe site address using
these relative offsets.
Reported by and discussed with: Rafael Espíndola
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9374
This corresponds to the following illumos issues:
5755 want support for Intel FMA instrs
5756 want support for Intel BMI1 instrs
5757 want support for Intel BMI2 instrs
5758 want support for Intel AVX2 instrs
7204 Want broadwell rdseed and adx support
7208 Want stac/clac disasm support
7733 Need SHA Instruction dis support
7756 dis can't handle x86 SSE 3 instructions
7757 want avx2 disasm tests
7758 want SSE 4.1 disasm tests
MFC after: 2 weeks
This ioctl has been considered legacy by upstream since the DTrace code
was first imported, and is unused. The removal also allows some
simplification of dtrace_helper_slurp().
Also remove a bogus copyout in the DTRACEHIOC_ADDDOF handler. Due to a
bug, it would overwrite an in-memory copy of the DOF header rather than
the passed-in DOF helper. Moreover, DTRACEHIOC_ADDDOF already copies the
helper back out automatically since its argument has the IOC_OUT attribute.
the fifth argument to functions being traced, however there was an error
where the userspace stack was being used. This may be invalid leading to
a kernel panic if this address is unmapped.
Submitted by: Graeme Jenkinson <graeme.jenkinson@cl.cam.ac.uk>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D9229
These functions may be called in DTrace probe context, so they cannot be
safely traced. Moreover, they are currently only used by DTrace, so their
corresponding FBT probes are not particularly useful.
MFC after: 2 weeks
This restriction was inherited from upstream but is not relevant on FreeBSD.
Furthermore, it hindered the tracing of locking primitive subroutines.
MFC after: 1 week
Otherwise there exists a narrow window during which a syscall probe can be
disabled and cause a concurrently-running thread to call dtrace_probe()
with an invalid probe ID.
Reported by: ngie
MFC after: 1 week
Sponsored by: Dell EMC Isilon
* Use the right incantation to get the next stack pointer. Since powerpc uses
special frames for traps, dereferencing the stack pointer straight up won't
get us the next stack pointer in every case.
* Clear EE using the correct instruction sequence. The PowerISA states that
'andi.' ANDs the register with 0||<imm>, instead of sign extending or filling
out the unavailable bits with 1. Even if it did sign extend, PSL_EE is
0x8000, so ~PSL_EE is 0x7fff, and the upper bits would be cleared. Use rlwinm
in the 32-bit case, and a two-rotate sequence in the 64-bit case, the latter
chosen to follow the output generated by gcc.
MFC after: 1 week
Also reduce the diff between us and upstream: the input data model will
always be DATAMODEL_NATIVE because of a bug (p_model is never set but is
always initialized to 0), so we don't need to override the caller anyway.
This change is also necessary to support the pid provider for 32-bit
processes on amd64.
MFC after: 2 weeks
former return the current status for the latter to use. Without this we
could enable interrupts when they shouldn't be.
It's still not quite right as it should only update the bits we care about,
bit should be good enough until the correct fix can be tested.
PR: 204270
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.
This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed). This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP. It also permits all CPUs to be available for
handling interrupts before any devices are probed.
This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot. Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.
However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system. In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU. Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.
Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code. This removes the need to treat the single-CPU boot environment
as a special case.
As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP). This will allow the option to be turned off
if need be during initial testing. I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0. Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.
These changes have only been tested on x86. Other platform maintainers
are encouraged to port their architectures over as well. The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).
PR: kern/199321
Reviewed by: markj, gnn, kib
Sponsored by: Netflix
When this flag is turned on, DOF and DIF validation errors are printed to
the kernel message buffer. This is useful for debugging.
Also remove the debug.dtrace.debug sysctl, which has no effect.
While the instructions were not included into the original instruction
set, their support can be indicated by a special feature bit.
For example:
CPU: AMD Phenom(tm) II X4 955 Processor (3214.71-MHz K8-class CPU)
...
AMD Features2=0x37ff<LAHF, ...>
Clang 3.8 uses lahf/sahf as a faster alternative to pushf/popf where
possible.
MFC after: 2 weeks
Currently this argument is a pointer into the stack which is used by FBT
to fetch the first five probe arguments. On all non-x86 architectures it's
simply the trapframe address, so this change has no functional impact. On
amd64 it's a pointer into the trapframe such that stack[1 .. 5] gives the
first five argument registers, which are deliberately grouped together in
the amd64 trapframe definition.
A trapframe argument simplifies the invop handlers on !x86 and makes the
x86 FBT invop handler easier to understand. Moreover, it allows for invop
handlers that may want to modify the register set of the interrupted thread.
This allows the hrtimer to be used earlier during boot. This is required
for boot-time DTrace: anonymous enablings are created during
SI_SUB_DTRACE_ANON, which runs before APs are started. In particular,
the DTrace deadman timer requires that the hrtimer be functional.
MFC after: 2 weeks
Allow using DTRACE for performance analysis of userspace
applications - the function call stack can be captured.
This is almost an exact copy of AMD64 solution.
Obtained from: Semihalf
Sponsored by: Cavium
Reviewed by: emaste, gnn, jhibbits
Differential Revision: https://reviews.freebsd.org/D5779
- Handle the case where no DOF helper is provided. This occurs with the
currently-unused DTRACEHIOC_ADD ioctl.
- Fix some checks that prevented the loading DOF in the (non-default)
lazyload mode.
need to include it explicitly when <vm/vm_param.h> is already included.
Suggested by: alc
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D5379
first instruction to see if it's either a pushm with lr, or a sub with sp.
The former is the common case, with the latter used with va_args.
This removes 12 probes. These are all hand-written assembly, with a few C
functions with no stack usage.
Submitted by: Howard Su <howard0su@gmail.com>
Differential Revision: https://reviews.freebsd.org/D4419
Rather than pushing all eight possible arguments into dtrace_probe()'s
stack frame, make the syscall_args struct for the current syscall available
via the current thread. Using a custom getargval method for the systrace
provider, this allows any syscall argument to be fetched, even in kernels
that have modified the maximum number of system call arguments.
Sponsored by: EMC / Isilon Storage Division
r281257 added support for lazyload mode by allowing dtrace(1) to register
a DOF section on behalf of a traced process. This was implemented by
having libdtrace copy the DOF section into a heap-allocated buffer and
passing its address to the ioctl handler. However, DTrace uses the DOF
section address as a lookup key in certain cases, so the ioctl handler
should be given the target process' DOF section address instead. This
change modifies the ADDDOF handler to copy the DOF section in from the
target process, rather than from dtrace(1).
While here update for armv6 to a tested value.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: stat
Differential Revision: https://reviews.freebsd.org/D4315
Boundary Trace to assembly to reduce the overhead of these checks.
Submitted by: Howard Su <howard0su@gmail.com>
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D4266
stack, take into account the copy of rsi pushed between the breakpoint
trapframe and the dtrace_invop frame. Prior to r287644, this was covered
by the fact that sizeof(struct amd64_frame) was 24 rather than 16.
Reported by: smh
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko. While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko.
module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
For i386 it builds the existing systrace_linux.ko. For amd64 it
builds a systrace_linux.ko for 64-bit binaries.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D3954
otherwise DTRACE_ANCHORED() returns false and that makes stack()
insert a bogus frame at the top.
For example:
dtrace -n 'test:dtrace_test::sdttest { stack(); }
This change is not really a solution, but just a work-around.
The real solution is to record the probe's call site and to use
that for resolving a function name.
PR: 195222
MFC after: 22 days