It allows iteration over processes belonging to given jail instead of
having to walk the entire allproc list.
Note the iteration can miss processes which remains bug-compatible
with previous code.
Reviewed by: jamie (previous version), markj (previous version)
Differential Revision: https://reviews.freebsd.org/D34522
It does not serve any purpose after we stopped doing
thread_single(SINGLE_ALLPROC) from stoppable user processes.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36207
We do not check single-threading conditions in trap, or when sleeping
uninterruptible.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D36207
Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.
The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.
Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Since both self single-threading and remote single-threading rely on
suspending the thread doing thread_single(), it cannot be mixed: thread
doing thread_suspend_switch() might be subject to thread_suspend_one()
and vice versa.
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310
before the process itself does thread_single(SINGLE_EXIT). We cannot
single-thread such process in ALLPROC (external) mode, and properly
detect and report the failure to do so due to the process becoming
zombie is easier to prevent than handle.
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310
p_reapsibling linkage is protected by proctree_lock, and it is modified
there.
Suggested and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35014
We use the p_itcallout callout, interlocked by the proc lock, to
schedule timeouts for the setitimer(2) system call. When a process
exits, the callout must be stopped before the process struct is
recycled.
Currently we attempt to stop the callout in exit1() with the call
_callout_stop_safe(&p->p_itcallout, CS_EXECUTING). If this call returns
0, then we sleep in order to drain the callout. However, this happens
only if the callout is not scheduled at all. If the callout thread is
blocked on the proc lock, then exit1() will not block and the callout
may execute after the process has fully exited, typically resulting in a
panic.
I cannot see a reason to use the CS_EXECUTING flag here. Instead, use
the regular callout_stop()/callout_drain() dance to halt the callout.
Reported by: ler
Tested by: ler, pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34625
Declare the exit system call normally. This results in the
implementation being named sys_exit rather than sys_sys_exit and
being decalred as returning an int. Infact it does not return
at all because exit1 does not, so add an __unreachable() to let the
compiler know that.
Reviewed by: kib, imp
Previously, the code would copy twice as many pointers as specified
and print pairs of them a single 64-bit pointer.
abort2 doesn't return so make the return type void
freebsd32_abort2 is in it's own file with a 2-clause BSD license
based on a discussion with Wojciech many years ago.
Reviewed by: kevans
This allows the pmap_remove(min, max) call to see empty pmap and exploit
empty pmap optimization.
Reviewed by: markj
Tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32569
While doing it, also move all the code to resolve pathnames and obtain
text vp and dvp, into single place. Besides simplifying the code, it
avoids spurious vnode relocks and validates the explanation why
a transient text reference on the script vnode is not harmful.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32611
On kernels compiled with 'options TSLOG', record for each process ID:
* The timestamp of the fork() which creates it and the parent
process ID,
* The first path passed to execve(), if any,
* The first path resolved by namei, if any, and
* The timestamp of the exit() which terminates the process.
Expose this information via a new sysctl, debug.tslog_user.
On kernels lacking 'options TSLOG' (the default), no information is
recorded and the sysctl does not exist.
Note that recording namei is needed in order to obtain the names of
rc.d scripts being launched, as the rc system sources them in a
subshell rather than execing the scripts.
With this commit it is now possible to generate flamecharts of the
entire boot process from the start of the loader to the end of
/etc/rc. The code needed to perform this processing is currently
found in github: https://github.com/cperciva/freebsd-boot-profiling
Reviewed by: mhorne
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32493
To prevent umtx.h polluting by future changes split it on two headers:
umtx.h - ABI header for userspace;
umtxvar.h - the kernel staff.
While here fix umtx_key_match style.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31248
MFC after: 2 weeks
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI. Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.
Requested by: dchagin
Reviewed by: dchagin, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
after itimers are stopped. This makes it more usable for e.g. native FreeBSD
ABI sysentvecs.
Reviewed by: dchagin, markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
It seems that Linux does not dequeue siginfo for SIGCHLD when wait*(2)
reports status of the running process. In particular, sigwaitinfo(2)
and other signal querying syscalls can observe the siginfo after wait.
FreeBSD dequeued siginfo from the beginning, so we cannot change the
default ABI to be more compatible. Still, add a knob to enable to
change to the other behavior for debugging purposes.
Reported by: dchagin
Reviewed by: dchagin, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30675
Exiting processes that have been removed from allproc but are still
executing are not yet marked PRS_ZOMBIE, so they were not listed (for
example, if a thread panics during exit1()). To detect these
processes, clear p_list.le_prev to NULL explicitly after removing a
process from the allproc list and check for this sentinel rather than
PRS_ZOMBIE when walking the pidhash.
While here, simplify the pidhash walk to use a single outer loop.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27824
Exec and exit are same as corresponding eventhandler hooks.
Thread exit hook is called somewhat earlier, while thread is still
owned by the process and enough context is available. Note that the
process lock is owned when the hook is called.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27309
While there, do some minor cleanup for kclocks. They are only
registered from kern_time.c, make registration function static.
Remove event hooks, they are not used by both registered kclocks.
Add some consts.
Perhaps we can stop registering kclocks at all and statically
initialize them.
Reviewed by: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27305
No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
Since thread_zone is marked NOFREE the thread_fini callback is never
executed, meaning memory allocated by seltdinit is never released.
Adding the call to thread_dtor is not sufficient as exiting processes
cache the main thread.
First, funsetownlst() list looks at the first element of the list to see
whether it's processing a process or a process group list. Then it
acquires the global sigio lock and processes the list. However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.
Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty. Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.
Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object. However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object. However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.
Fix this by introducing funsetown_locked(), which avoids the racy check.
Reviewed by: kib
Reported by: pho
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27157
Attempt of adding assertions that pgrp->pg_jobc counters do not
underflow in r361967, reverted in r362910, points out bugs in the
handling of job control. Peter Holm was able to narrow down the
problem to very easy reproduction with timeout(1) which uses reaping.
The following list of problems with calculation of pg_jobs which
directs SIGHUP/SIGCONT delivery for orphaned process group was
identified:
- Re-calculation of the orphaned status for children of exiting parent
was wrong, but mostly unnoticed when all children were reparented to
init(8). When child can be reparented to a different process which
could affect the child' job control state, it was not properly
accounted for in pg_jobc.
- Lockless check for exiting process' parent process group is racy
because nothing prevents the parent from changing its group
membership.
- Exited process is left in the process group, until waited. This
affects other calculations of pg_jobc.
Split handling of job control status on process changing its process
group, and process exiting. Calculate increments and decrements for
pg_jobs by exact checking the orphanage instead of assuming process
group membership for children and parent. Move the call to killjobc()
later under the proctree_lock. Mark exiting process in killjobc()
with a new flag P_TREE_GRPEXITED and skip it for all pg_jobc
calculations after the flag is set.
Add checker that independently recalculates pg_jobc value and compares
it with the memoized process group state. This is enabled under INVARIANTS.
Reviewed by: jilles
Discussed with: kevans
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26116
This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and
has not normally been used since then.
To use it, the kernel had to be built without "options NFSLOCKD" and
the nfslockd.ko had to be deleted as well.
Since it uses Giant and is no longer used, this patch removes it.
With this device driver removed, there is now a lot of unused code
in the userland rpc.lockd. That will be removed on a future commit.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D22933
Modern debuggers and process tracers use ptrace() rather than procfs
for debugging. ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code. This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.
PR: 244939 (exp-run)
Reviewed by: kib, mjg (earlier version)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D23837
The debugger like truss(1) depends on the wait(2) syscall. This syscall
waits for ALL children. When it is waiting for ALL child's the children
created by process descriptors are not returned. This behavior was
introduced because we want to implement libraries which may pdfork(1).
The behavior of process descriptor brakes truss(1) because it will
not be able to collect the status of processes with process descriptors.
To address this problem the status is returned to parent when the
child is traced. While the process is traced the debugger is the new parent.
In case the original parent and debugger are the same process it means the
debugger explicitly used pdfork() to create the child. In that case the debugger
should be using kqueue()/pdwait() instead of wait().
Add test case to verify that. The test case was implemented by markj@.
Reviewed by: kib, markj
Discussed with: jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20362
There is no correctness change here, but the procid lock is contended in
the fork path and taking it while holding proctree avoidably extends its
hold time.
Note that there are other ids which can end up getting cleared with the
lock.
Sponsored by: The FreeBSD Foundation
It is not needed by anything in the kernel and it slightly drives up contention
on both proctree and allproc locks.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21447
In case of the process being debugged. The P_TRACED is cleared very early,
which would make procdesc_close() not calling proc_clear_orphan().
That would result in the debugged process can not be able to collect
status of the process with process descriptor.
Reviewed by: markj, kib
Tested by: pho
MFC after: 1 month
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes. Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone. Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.
Like r348026, these CUs did not show up in tinderbox as missing the include.
Reported by: peterj (arm64/mp_machdep.c)
X-MFC-With: r347984
Sponsored by: Dell EMC Isilon
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
Such processes will be reparented to the reaper when the current
parent is done with them (i.e., ptrace detached), so p_oppid must be
updated accordingly.
Add a regression test to exercise this code path. Previously it
would not be possible to reap an orphan with a stale oppid.
Reviewed by: kib, mjg
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19825
The kernel was already doing this prior to r329615. It was changed
to reduce contention on allproc. However, introduction of pidhash
locks and removal of proctree -> allproc ordering from fork thanks
to bitmaps fixed things enough to make this change pessimal.
waitpid takes proctree on each call and this change (now) causes
avoidable stalls if allproc is held.
Sponsored by: The FreeBSD Foundation
Currently unique pid allocation on fork often requires a full walk of
process, group, session lists to make sure it is not used by anything.
This has a side effect of requiring proctree to be held along with allproc,
which adds more contention in poudriere -j 128.
The patch below implements trivial bitmaps which gets rid of the problem.
Dedicated lock is introduced to manage IDs.
While here a bug was discovered: all processes would inherit reap id from
the first process spawned by init. This had a side effect of keeping the
ID used and when allocation rolls over to the beginning it keeps being
skipped.
The patch is loosely based on initial work by mjoras@.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
waitpid always takes proctree to evaluate the list, but only takes allproc
if it can reap. With this patch allproc is no longer taken, which helps during
poudriere -j 128.
Discussed with: kib
Sponsored by: The FreeBSD Foundation
This unreliably breaks libc handling of vfork where forking succeded,
but execve did not.
vfork code in libc performs waitpid with WNOHANG in case of failed exec.
With the fix exit codepath was waking up the parent before the child
fully transitioned to a zombie. Woken up parent would waitpid, which
could find a not-yet-zombie child and fail to reap it due to the WNOHANG
flag.
While removing the flag fixes the problem, it is not an option due to older
releases which would still suffer from the kernel change.
Revert the fix until a solution can be worked out.
Note that while use-after-free which gets back due to the revert is a real
bug, it's side-effects are limited due to the fact that struct proc memory
is never released by UMA.