Commit Graph

41 Commits

Author SHA1 Message Date
John Baldwin
c18ca74916 Don't pass error from syscallenter() to syscallret().
syscallret() doesn't use error anymore.  Fix a few other places to permit
removing the return value from syscallenter() entirely.
- Remove a duplicated assertion from arm's syscall().
- Use td_errno for amd64_syscall_ret_flush_l1d.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D2090
2019-07-15 21:25:16 +00:00
John Baldwin
1af9474b26 Always set td_errno to the error value of a system call.
Early errors prior to a system call did not set td_errno.  This commit
sets td_errno for all errors during syscallenter().  As a result,
syscallret() can now always use td_errno without checking TDP_NERRNO.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D20898
2019-07-15 21:16:01 +00:00
John Baldwin
c26541e315 Use 'retval' label for first error in syscallenter().
This is more consistent with the rest of the function and lets us
unindent most of the function.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D20897
2019-07-09 23:58:12 +00:00
Mateusz Guzik
7d065d876e Deinline vfork handling out of the syscall return path.
vfork is rarely called (comparatively to other syscalls) and it avoidably
pollutes the fast path.

Sponsored by:	The FreeBSD Foundation
2018-12-19 20:27:26 +00:00
Mateusz Guzik
e272bf479b Annotate td_cowgen check as unlikely.
Sponsored by:	The FreeBSD Foundation
2018-11-29 04:48:22 +00:00
Mateusz Guzik
e3d3e8289b Revert "fork: fix use-after-free with vfork"
This unreliably breaks libc handling of vfork where forking succeded,
but execve did not.

vfork code in libc performs waitpid with WNOHANG in case of failed exec.
With the fix exit codepath was waking up the parent before the child
fully transitioned to a zombie. Woken up parent would waitpid, which
could find a not-yet-zombie child and fail to reap it due to the WNOHANG
flag.

While removing the flag fixes the problem, it is not an option due to older
releases which would still suffer from the kernel change.

Revert the fix until a solution can be worked out.

Note that while use-after-free which gets back due to the revert is a real
bug, it's side-effects are limited due to the fact that struct proc memory
is never released by UMA.
2018-11-23 04:38:50 +00:00
Mateusz Guzik
adce241981 Annotate TDP_RFPPWAIT as unlikely.
The flag is only set on vfork, but is tested for *all* syscalls.
On amd64 this shortens common-case (not vfork) code.
2018-11-22 21:38:24 +00:00
Mateusz Guzik
b00b27e925 fork: fix use-after-free with vfork
The pointer to the child is stored without any reference held. Then it is
blindly used to wait until P_PPWAIT is cleared. However, if the child is
autoreaped it could have exited and get freed before the parent started
waiting.

Use the existing hold mechanism to mitigate the problem. Most common case
of doing exec remains unchanged. The corner case of doing exit performs
wake up before waiting for holds to clear.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18295
2018-11-22 21:08:37 +00:00
Mateusz Guzik
9d68f7741f systrace: track it like sdt probes
While here predict false.

Note the code is wrong (regardless of this change). Dereference of the
pointer can race with module unload. A fix would set the probe to a
nop stub instead of NULL.
2018-04-27 15:16:34 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Konstantin Belousov
2d88da2f06 Move struct syscall_args syscall arguments parameters container into
struct thread.

For all architectures, the syscall trap handlers have to allocate the
structure on the stack.  The structure takes 88 bytes on 64bit arches
which is not negligible.  Also, it cannot be easily found by other
code, which e.g. caused duplication of some members of the structure
to struct thread already.  The change removes td_dbg_sc_code and
td_dbg_sc_nargs which were directly copied from syscall_args.

The structure is put into the copied on fork part of the struct thread
to make the syscall arguments information correct in the child after
fork.

This move will also allow several more uses shortly.

Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
X-Differential revision:	https://reviews.freebsd.org/D11080
2017-06-12 21:03:23 +00:00
Gleb Smirnoff
83c9dea1ba - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place.  To do per-cpu stats, convert all fields that previously were
  maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
  before we have set up UMA and we can do counter_u64_alloc(), provide an
  early counter mechanism:
  o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
  o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
    so that at early stages of boot, before counters are allocated we already
    point to a counter that can be safely written to.
  o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
  to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by:	kib, gallatin, marius, lidl
Differential Revision:	https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
Gleb Smirnoff
fef0991322 Typo! 2017-04-17 17:07:51 +00:00
Gleb Smirnoff
9ed01c32e0 All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.
2017-04-17 17:07:00 +00:00
Eric Badger
82a4538f31 Defer ptracestop() signals that cannot be delivered immediately
When a thread is stopped in ptracestop(), the ptrace(2) user may request
a signal be delivered upon resumption of the thread. Heretofore, those signals
were discarded unless ptracestop()'s caller was issignal(). Fix this by
modifying ptracestop() to queue up signals requested by the ptrace user that
will be delivered when possible. Take special care when the signal is SIGKILL
(usually generated from a PT_KILL request); no new stop events should be
triggered after a PT_KILL.

Add a number of tests for the new functionality. Several tests were authored
by jhb.

PR:		212607
Reviewed by:	kib
Approved by:	kib (mentor)
MFC after:	2 weeks
Sponsored by:	Dell EMC
In collaboration with:	jhb
Differential Revision:	https://reviews.freebsd.org/D9260
2017-02-20 15:53:16 +00:00
Konstantin Belousov
643f6f47fd Add PROC_TRAPCAP procctl(2) controls and global sysctl kern.trap_enocap.
Both can be used to cause processes in capability mode to receive
SIGTRAP when ENOTCAPABLE or ECAPMODE errors are returned from
syscalls.

Idea by:	emaste
Reviewed by:	oshogbo (previous version), emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D7965
2016-09-21 08:23:33 +00:00
John Baldwin
fc4f075a1a Add PTRACE_VFORK to trace vfork events.
First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when
the new child was created via vfork() rather than fork().  Second, a
new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK
event mask.  This new stop is reported after the vfork parent resumes
due to the child calling exit or exec.  Debuggers can use this stop to
reinsert breakpoints in the vfork parent process before it resumes.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7045
2016-07-18 14:53:55 +00:00
John Baldwin
8d570f64aa Add a mask of optional ptrace() events.
ptrace() now stores a mask of optional events in p_ptevents.  Currently
this mask is a single integer, but it can be expanded into an array of
integers in the future.

Two new ptrace requests can be used to manipulate the event mask:
PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK
sets the current event mask.

The current set of events include:
- PTRACE_EXEC: trace calls to execve().
- PTRACE_SCE: trace system call entries.
- PTRACE_SCX: trace syscam call exits.
- PTRACE_FORK: trace forks and auto-attach to new child processes.
- PTRACE_LWP: trace LWP events.

The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have
been replaced by PTRACE_SCE and PTRACE_SCX.  PTRACE_FORK replaces
P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS.

The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for
compatibility but now simply toggle corresponding flags in the
event mask.

While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both
modify the event mask and continue the traced process.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7044
2016-07-15 15:32:09 +00:00
Mark Johnston
8ff6d9dd22 Support an arbitrary number of arguments to DTrace syscall probes.
Rather than pushing all eight possible arguments into dtrace_probe()'s
stack frame, make the syscall_args struct for the current syscall available
via the current thread. Using a custom getargval method for the systrace
provider, this allows any syscall argument to be fetched, even in kernels
that have modified the maximum number of system call arguments.

Sponsored by:	EMC / Isilon Storage Division
2015-12-17 00:00:27 +00:00
Ed Schouten
aff5735784 Add a way to distinguish between forking and thread creation in schedtail.
For CloudABI we need to initialize the registers of new threads
differently based on whether the thread got created through a fork or
through simple thread creation.

Add a flag, TDP_FORKING, that is set by do_fork() and cleared by
fork_exit(). This can be tested against in schedtail.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3973
2015-10-22 09:33:34 +00:00
John Baldwin
189ac973de Fix various edge cases related to system call tracing.
- Always set td_dbg_sc_* when P_TRACED is set on system call entry
  even if the debugger is not tracing system call entries.  This
  ensures the fields are valid when reporting other stops that
  occur at system call boundaries such as for PT_FOLLOW_FORKS or
  when only tracing system call exits.
- Set TDB_SCX when reporting the stop for a new child process in
  fork_return().  This causes the event to be reported as a system
  call exit.
- Report a system call exit event in fork_return() for new threads in
  a traced process.
- Copy td_dbg_sc_* to new threads instead of zeroing.  This ensures
  that td_dbg_sc_code in particular will report the system call that
  created the new thread or process when it reports a system call
  exit event in fork_return().
- Add new ptrace tests to verify that new child processes and threads
  report system call exit events with a valid pl_syscall_code via
  PT_LWPINFO.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3822
2015-10-06 19:29:05 +00:00
John Baldwin
bdd64116b0 Always clear TDB_USERWR before fetching system call arguments. The
TDB_USERWR flag may still be set after a debugger detaches from a
process via PT_DETACH.  Previously the flag would never be cleared
forcing a double fetch of the system call arguments for each system
call.  Note that the flag cannot be cleared at PT_DETACH time in case
one of the threads in the process is currently stopped in
syscallenter() and the debugger has modified the arguments for that
pending system call before detaching.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3678
2015-09-16 20:55:00 +00:00
John Baldwin
ded3e7f08e The 'sa' argument to syscallret() is not unused. 2015-09-01 22:28:23 +00:00
John Baldwin
183b68f74f Export current system call code and argument count for system call entry
and exit events. procfs stop events for system call tracing report these
values (argument count for system call entry and code for system call exit),
but ptrace() does not provide this information. (Note that while the system
call code can be determined in an ABI-specific manner during system call
entry, it is not generally available during system call exit.)

The values are exported via new fields at the end of struct ptrace_lwpinfo
available via PT_LWPINFO.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3536
2015-09-01 22:24:54 +00:00
Mateusz Guzik
4ea6a9a28f Generalised support for copy-on-write structures shared by threads.
Thread credentials are maintained as follows: each thread has a pointer to
creds and a reference on them. The pointer is compared with proc's creds on
userspace<->kernel boundary and updated if needed.

This patch introduces a counter which can be compared instead, so that more
structures can use this scheme without adding more comparisons on the boundary.
2015-06-10 10:43:59 +00:00
Konstantin Belousov
8638fe7bea Thread waiting for the vfork(2)-ed child to exec or exit, must allow
for the suspension.

Currently, the loop performs uninterruptible cv_wait(9) call, which
prevents suspension until child allows further execution of parent.
If child is stopped, suspension or single-threading is delayed
indefinitely.

Create a helper thread_suspend_check_needed() to identify the need for
a call to thread_suspend_check().  It is required since call to the
thread_suspend_check() cannot be safely done while owning the child
(p2) process lock.  Only when suspension is needed, drop p2 lock and
call thread_suspend_check().  Perform wait for cv with timeout, in
case suspend is requested after wait started; I do not see a better
way to interrupt the wait.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:18:05 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Oleksandr Tymoshenko
7fc3ae51f3 Fix build on ARM (and probably other platforms) 2012-12-28 06:52:53 +00:00
Jeff Roberson
4c44811c9d - Add new machine parsable KTR macros for timing events.
- Use this new format to automatically handle syscalls and VOPs.  This
   changes the earlier format but is still human readable.

Sponsored by:	EMC / Isilon Storage Division
2012-12-19 20:10:00 +00:00
Attilio Rao
16cbf13b53 Move the checks for td_pinned, td_critnest, TDP_NOFAULTING and
TDP_NOSLEEPING leaking from syscallret() to userret() so that also
trap handling is covered. Also, the check on td_locks is not duplicated
between the two functions.

Reported by:	avg
Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:35:15 +00:00
John Baldwin
7e690c1f79 Assert that system calls do not leak a pinned thread (via sched_pin()) to
userland.
2012-08-22 20:02:42 +00:00
Konstantin Belousov
6c5d7af158 Assert that TDP_NOFAULTING and TDP_NOSPEEPING thread flags do not leak
when thread returns from a syscall to usermode.

Tested by:	pho
MFC after:	1 week
2012-05-30 13:44:42 +00:00
Konstantin Belousov
2dd9ea6f70 Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.

Use the flag to fix sigsuspend(2) error return ktrace records.

Requested by:	bde
MFC after:	1 week
2012-04-12 10:48:43 +00:00
Konstantin Belousov
1d7ca9bb8e Currently, the debugger attached to the process executing vfork() does
not get syscall exit notification until the child performed exec of
exit.  Swap the order of doing ptracestop() and waiting for P_PPWAIT
clearing, by postponing the wait into syscallret after ptracestop()
notification is done.

Reported, tested and reviewed by:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	 2 weeks
2012-02-27 21:10:10 +00:00
Konstantin Belousov
343b391f20 The PTRACESTOP() macro is used only once. Inline the only use and remove
the macro.

MFC after:	1 week
2012-02-11 14:49:25 +00:00
Konstantin Belousov
6ad1ff09cc A debugger which requested PT_FOLLOW_FORK should get the notification
about new child not only when doing PT_TO_SCX, but also for PT_CONTINUE.
If TDB_FORK flag is set, always issue a stop, the same as is done for
TDB_EXEC.

Reported by:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	1 week
2012-01-30 20:00:29 +00:00
Marcel Moolenaar
b2f1a8f2b3 Revert rev. 226893: subr_syscall.c is being included from C files and
on amd64 with FREEBSD32 enabled, this means that systrace_probe_func
gets defined twice.
2011-10-30 02:19:39 +00:00
Marcel Moolenaar
056f0ec755 Define systrace_probe_func in subr_syscall.c where it's used, instead
of defining it in MD code. This eliminates porting to other architectures.
2011-10-29 01:26:36 +00:00
Konstantin Belousov
ce8bd78b2a Do not deliver SIGTRAP on exec as the normal signal, use ptracestop() on
syscall exit path. Otherwise, if SIGTRAP is ignored, that tdsendsignal()
do not want to deliver the signal, and debugger never get a notification
of exec.

Found and tested by:	Anton Yuzhaninov <citrin citrin ru>
Discussed with:	jhb
MFC after:	2 weeks
2011-09-27 13:17:02 +00:00
Konstantin Belousov
26ccf4f10f Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by:	jhb
Approved by:	re (bz)
MFC after:	2 weeks
2011-09-11 16:05:09 +00:00