Commit Graph

457 Commits

Author SHA1 Message Date
Konstantin Belousov
39f259b1d5 Do not call FreeBSD-ABI specific code for all ABIs
(cherry picked from commit 28a66fc3da)
2021-07-22 01:11:52 +03:00
Konstantin Belousov
cdb79e2e08 Move sv_onexit() sysentvec hook slightly later
(cherry picked from commit 55976ce11a)
2021-07-22 01:11:52 +03:00
Konstantin Belousov
2ea2987d14 Add a knob to disable dequeueing SIGCHLD on waiting for live process
(cherry picked from commit a12e901a5a)
2021-06-22 04:45:32 +03:00
Konstantin Belousov
9c74a20681 ptrace: add an option to not kill debuggees on debugger exit
(cherry picked from commit fd3ac06f45)
2021-06-01 03:38:54 +03:00
John Baldwin
47877889f2 ddb ps: Use the pidhash to enumerate processes not in allproc.
Exiting processes that have been removed from allproc but are still
executing are not yet marked PRS_ZOMBIE, so they were not listed (for
example, if a thread panics during exit1()).  To detect these
processes, clear p_list.le_prev to NULL explicitly after removing a
process from the allproc list and check for this sentinel rather than
PRS_ZOMBIE when walking the pidhash.

While here, simplify the pidhash walk to use a single outer loop.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27824
2020-12-31 16:00:05 -08:00
Konstantin Belousov
87a9b18d22 Provide ABI modules hooks for process exec/exit and thread exit.
Exec and exit are same as corresponding eventhandler hooks.

Thread exit hook is called somewhat earlier, while thread is still
owned by the process and enough context is available.  Note that the
process lock is owned when the hook is called.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27309
2020-11-23 17:29:25 +00:00
Konstantin Belousov
e68c619144 Stop using eventhandlers for itimers subsystem exec and exit hooks.
While there, do some minor cleanup for kclocks.  They are only
registered from kern_time.c, make registration function static.
Remove event hooks, they are not used by both registered kclocks.
Add some consts.

Perhaps we can stop registering kclocks at all and statically
initialize them.

Reviewed by:	mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D27305
2020-11-21 21:43:36 +00:00
Conrad Meyer
85078b8573 Split out cwd/root/jail, cmask state from filedesc table
No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by:	kib
Discussed with:	markj, mjg
Differential Revision:	https://reviews.freebsd.org/D27037
2020-11-17 21:14:13 +00:00
Mateusz Guzik
19d3e47dca select: call seltdfini on process and thread exit
Since thread_zone is marked NOFREE the thread_fini callback is never
executed, meaning memory allocated by seltdinit is never released.

Adding the call to thread_dtor is not sufficient as exiting processes
cache the main thread.
2020-11-16 03:12:21 +00:00
Mark Johnston
f52979098d Fix a pair of races in SIGIO registration
First, funsetownlst() list looks at the first element of the list to see
whether it's processing a process or a process group list.  Then it
acquires the global sigio lock and processes the list.  However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.

Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty.  Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.

Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object.  However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object.  However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.

Fix this by introducing funsetown_locked(), which avoids the racy check.

Reviewed by:	kib
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27157
2020-11-11 13:44:27 +00:00
Konstantin Belousov
844219f471 proc_realparent: if p_oppid does not match pid of the current parent
for non-orphaned process, return reaper instead of init.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26416
2020-09-16 21:38:24 +00:00
Konstantin Belousov
c5bc28b273 Fix several issues with process group orphanage.
Attempt of adding assertions that pgrp->pg_jobc counters do not
underflow in r361967, reverted in r362910, points out bugs in the
handling of job control.  Peter Holm was able to narrow down the
problem to very easy reproduction with timeout(1) which uses reaping.

The following list of problems with calculation of pg_jobs which
directs SIGHUP/SIGCONT delivery for orphaned process group was
identified:
- Re-calculation of the orphaned status for children of exiting parent
  was wrong, but mostly unnoticed when all children were reparented to
  init(8).  When child can be reparented to a different process which
  could affect the child' job control state, it was not properly
  accounted for in pg_jobc.
- Lockless check for exiting process' parent process group is racy
  because nothing prevents the parent from changing its group
  membership.
- Exited process is left in the process group, until waited. This
  affects other calculations of pg_jobc.

Split handling of job control status on process changing its process
group, and process exiting.  Calculate increments and decrements for
pg_jobs by exact checking the orphanage instead of assuming process
group membership for children and parent.  Move the call to killjobc()
later under the proctree_lock.  Mark exiting process in killjobc()
with a new flag P_TREE_GRPEXITED and skip it for all pg_jobc
calculations after the flag is set.

Add checker that independently recalculates pg_jobc value and compares
it with the memoized process group state. This is enabled under INVARIANTS.

Reviewed by:	jilles
Discussed with:	kevans
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D26116
2020-08-22 21:32:11 +00:00
Mateusz Guzik
5a90435ccf proc: refactor clearing credentials into proc_unset_cred 2020-05-25 12:41:44 +00:00
Rick Macklem
8de97f394e Remove the old NFS lock device driver that uses Giant.
This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and
has not normally been used since then.
To use it, the kernel had to be built without "options NFSLOCKD" and
the nfslockd.ko had to be deleted as well.
Since it uses Giant and is no longer used, this patch removes it.

With this device driver removed, there is now a lot of unused code
in the userland rpc.lockd. That will be removed on a future commit.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22933
2020-04-09 14:44:46 +00:00
John Baldwin
59838c1a19 Retire procfs-based process debugging.
Modern debuggers and process tracers use ptrace() rather than procfs
for debugging.  ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code.  This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.

PR:		244939 (exp-run)
Reviewed by:	kib, mjg (earlier version)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D23837
2020-04-01 19:22:09 +00:00
Warner Losh
58aa35d429 Remove sparc64 kernel support
Remove all sparc64 specific files
Remove all sparc64 ifdefs
Removee indireeect sparc64 ifdefs
2020-02-03 17:35:11 +00:00
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Mariusz Zaborski
8e49361164 procdesc: allow to collect status through wait(1) if process is traced
The debugger like truss(1) depends on the wait(2) syscall. This syscall
waits for ALL children. When it is waiting for ALL child's the children
created by process descriptors are not returned. This behavior was
introduced because we want to implement libraries which may pdfork(1).

The behavior of process descriptor brakes truss(1) because it will
not be able to collect the status of processes with process descriptors.

To address this problem the status is returned to parent when the
child is traced. While the process is traced the debugger is the new parent.
In case the original parent and debugger are the same process it means the
debugger explicitly used pdfork() to create the child. In that case the debugger
should be using kqueue()/pdwait() instead of wait().

Add test case to verify that. The test case was implemented by markj@.

Reviewed by:	kib, markj
Discussed with:	jhb
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D20362
2019-11-25 18:33:21 +00:00
Mateusz Guzik
9576ff5803 proc: clear pid bitmap entry after dropping proctree lock
There is no correctness change here, but the procid lock is contended in
the fork path and taking it while holding proctree avoidably extends its
hold time.

Note that there are other ids which can end up getting cleared with the
lock.

Sponsored by:	The FreeBSD Foundation
2019-09-02 12:46:43 +00:00
Mateusz Guzik
88cc62e5a5 proc: eliminate the zombproc list
It is not needed by anything in the kernel and it slightly drives up contention
on both proctree and allproc locks.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21447
2019-08-28 16:18:23 +00:00
Mariusz Zaborski
a05cfdf479 exit1: fix style nits
MFC after:	1 month
2019-08-05 20:20:14 +00:00
Mariusz Zaborski
799d92ab78 proc: introduce the proc_add_orphan function
This API allows adding the process to its parent orphan list.

Reviewed by:	kib, markj
MFC after:	1 month
2019-08-05 20:11:57 +00:00
Mariusz Zaborski
41fadb3fca exit1: postpone clearing P_TRACED flag until the proctree lock is acquired
In case of the process being debugged. The P_TRACED is cleared very early,
which would make procdesc_close() not calling proc_clear_orphan().
That would result in the debugged process can not be able to collect
status of the process with process descriptor.

Reviewed by:	markj, kib
Tested by:	pho
MFC after:	1 month
2019-08-05 19:59:23 +00:00
Mariusz Zaborski
9db97ca0bd proc: make clear_orphan an public API
This will be useful for other patches with process descriptors.
Change its name as well.

Reviewed by:	markj, kib
2019-07-29 21:42:57 +00:00
Conrad Meyer
daec92844e Include ktr.h in more compilation units
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes.  Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone.  Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.

Like r348026, these CUs did not show up in tinderbox as missing the include.

Reported by:	peterj (arm64/mp_machdep.c)
X-MFC-With:	r347984
Sponsored by:	Dell EMC Isilon
2019-05-21 20:38:48 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Mark Johnston
128c9bc05b Set the p_oppid field of orphans when exiting.
Such processes will be reparented to the reaper when the current
parent is done with them (i.e., ptrace detached), so p_oppid must be
updated accordingly.

Add a regression test to exercise this code path.  Previously it
would not be possible to reap an orphan with a stale oppid.

Reviewed by:	kib, mjg
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19825
2019-04-07 14:26:14 +00:00
Mateusz Guzik
eadb1dcb71 proc: handle sdt exit probe before taking the proc lock
Sponsored by:	The FreeBSD Foundation
2018-12-08 06:31:43 +00:00
Mateusz Guzik
b1fbffe73c proc: when exiting move to zombproc before taking proctree
The kernel was already doing this prior to r329615. It was changed
to reduce contention on allproc. However, introduction of pidhash
locks and removal of proctree -> allproc ordering from fork thanks
to bitmaps fixed things enough to make this change pessimal.

waitpid takes proctree on each call and this change (now) causes
avoidable stalls if allproc is held.

Sponsored by:	The FreeBSD Foundation
2018-12-07 12:32:25 +00:00
Mateusz Guzik
34ebdceac0 Manage process-related IDs with bitmaps
Currently unique pid allocation on fork often requires a full walk of
process, group, session lists to make sure it is not used by anything.
This has a side effect of requiring proctree to be held along with allproc,
which adds more contention in poudriere -j 128.

The patch below implements trivial bitmaps which gets rid of the problem.
Dedicated lock is introduced to manage IDs.

While here a bug was discovered: all processes would inherit reap id from
the first process spawned by init. This had a side effect of keeping the
ID used and when allocation rolls over to the beginning it keeps being
skipped.

The patch is loosely based on initial work by mjoras@.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
2018-12-07 12:22:32 +00:00
Mateusz Guzik
1e9a1bf589 proc: create a dedicated lock for zombproc to ligthen the load on allproc_lock
waitpid always takes proctree to evaluate the list, but only takes allproc
if it can reap. With this patch allproc is no longer taken, which helps during
poudriere -j 128.

Discussed with: kib
Sponsored by:	The FreeBSD Foundation
2018-11-29 02:52:08 +00:00
Mateusz Guzik
e3d3e8289b Revert "fork: fix use-after-free with vfork"
This unreliably breaks libc handling of vfork where forking succeded,
but execve did not.

vfork code in libc performs waitpid with WNOHANG in case of failed exec.
With the fix exit codepath was waking up the parent before the child
fully transitioned to a zombie. Woken up parent would waitpid, which
could find a not-yet-zombie child and fail to reap it due to the WNOHANG
flag.

While removing the flag fixes the problem, it is not an option due to older
releases which would still suffer from the kernel change.

Revert the fix until a solution can be worked out.

Note that while use-after-free which gets back due to the revert is a real
bug, it's side-effects are limited due to the fact that struct proc memory
is never released by UMA.
2018-11-23 04:38:50 +00:00
Mateusz Guzik
b00b27e925 fork: fix use-after-free with vfork
The pointer to the child is stored without any reference held. Then it is
blindly used to wait until P_PPWAIT is cleared. However, if the child is
autoreaped it could have exited and get freed before the parent started
waiting.

Use the existing hold mechanism to mitigate the problem. Most common case
of doing exec remains unchanged. The corner case of doing exit performs
wake up before waiting for holds to clear.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18295
2018-11-22 21:08:37 +00:00
Mateusz Guzik
a627b4629d proc: update list manipulation comment on process exit
Processes stay in the hash until they get reaped.

This code does not unlink the child from the parent, so remove
the claim that it does.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:16:10 +00:00
Mateusz Guzik
3d3e6793f6 proc: implement pid hash locks and an iterator
forks, exits and waits are frequently stalled during poudriere -j 128 runs
due to killpg and process list exports performed for each package.

Both uses take the allproc lock. The latter case can be modified to iterate
over the hash with finer grained locking instead.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17817
2018-11-21 18:56:15 +00:00
Mateusz Guzik
2c054ce924 proc: always store parent pid in p_oppid
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17825
2018-11-16 17:07:54 +00:00
Konstantin Belousov
b940886338 Add PROC_PDEATHSIG_SET to procctl interface.
Allow processes to request the delivery of a signal upon death of
their parent process.  Supposed consumer of the feature is PostgreSQL.

Submitted by:	Thomas Munro
Reviewed by:	jilles, mjg
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D15106
2018-04-18 21:31:13 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Mateusz Guzik
81d68271d7 Reduce contention on the proctree lock during heavy package build.
There is a proctree -> allproc ordering established.

Most of the time it is either xlock -> xlock or slock -> slock.

On fork however there is a slock -> xlock pair which results in
pathological wait times due to threads keeping proctree held for
reading and all waiting on allproc. Switch this to xlock -> xlock.
Longer term fix would get rid of proctree in this place to begin with.
Right now it is necessary to walk the session/process group lists to
determine which id is free. The walk can be avoided e.g. with bitmaps.

The exit path used to have one place which dealt with allproc and
then with proctree. Move the allproc acquire into the section protected
by proctree. This reduces contention against threads waiting on proctree
in the fork codepath - the fork proctree holder does not have to wait
for allproc as often.

Finally, move tidhash manipulation outside of the area protected by
either of these locks. The removal from the hash was already unprotected.
There is no legitimate reason to look up thread ids for a process still
under construction.

This results in about 50% wait time reduction during -j 128 package build.
2018-02-20 02:18:30 +00:00
Mateusz Guzik
2ca66c1ef5 Fix process exit vs reap race introduced in r329449
The race manifested itself mostly in terms of crashes with "spin lock
held too long".

Relevant parts of respective code paths:

exit:				reap:
PROC_LOCK(p);
PROC_SLOCK(p);
p->p_state == PRS_ZOMBIE
PROC_UNLOCK(p);
				PROC_LOCK(p);
/* exit work */
				if (p->p_state == PRS_ZOMBIE) /* true */
					proc_reap()
					free proc
/* more exit work */
PROC_SUNLOCK(p);

Thus a still exiting process is reaped.

Prior to the change the zombie check was followed by slock/sunlock trip
which prevented the problem.

Even code prior to this commit has a bug: the proc is still accessed for
statistic collection purposes. However, the severity is rather small and
the bug may be fixed in a future commit.

Reported by:	many
Tested by:	allanjude
2018-02-19 00:54:08 +00:00
Mateusz Guzik
7beb60820f exit: get rid of PROC_SLOCK when checking a process to report, take #2
The suspension counter needs synchronisation through slock, but we don't
need it to check if inspecting the counter is necessary to begin with.
In the common case it is not, thus avoid the lock if possible.

Reviewed by:	kib
Tested by:	pho
2018-02-18 21:07:15 +00:00
Mateusz Guzik
8bf6ff2226 Revert r329448.
Turns out is is actually racy, reproducible with stress2/misc/truss.sh

Requested by:	kib
2018-02-17 17:23:43 +00:00
Mateusz Guzik
ad58e5e86c exit: stop doing PROC_SLOCK just to call proc_reap
It immediately does PROC_SUNLOCK anyway and the lock plays no role.
2018-02-17 09:03:11 +00:00
Mateusz Guzik
9c0e785c58 exit: get rid of PROC_SLOCK when checking a process to report
All accessed fields are protected with already held process lock.
2018-02-17 08:48:45 +00:00
Mateusz Guzik
015cd8dc93 On process exit signal the parent after dropping the proctree lock. 2018-02-17 00:24:50 +00:00
Mateusz Guzik
7e588b9219 Unref the prison after proctree is dropped. 2018-02-17 00:23:56 +00:00
Mateusz Guzik
6776bfeb8f Tidy up kern_wait6
- don't relock curproc in msleep
- don't relock proctree if P_STATCHILD is spotted
- reformat the proc_to_reap call in the main loop
2018-02-17 00:21:50 +00:00
Jeff Roberson
3f289c3fcf Implement 'domainset', a cpuset based NUMA policy mechanism. This allows
userspace to control NUMA policy administratively and programmatically.

Implement domainset based iterators in the page layer.

Remove the now legacy numa_* syscalls.

Cleanup some header polution created by having seq.h in proc.h.

Reviewed by:	markj, kib
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13403
2018-01-12 22:48:23 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Matt Joras
2ca45184dc Introduce EVENTHANDLER_LIST and some users.
This introduces a facility to EVENTHANDLER(9) for explicitly defining a
reference to an event handler list. This is useful since previously all
invokers of events had to do a locked traversal of the global list of
event handler lists in order to find the appropriate event handler list.
By keeping a pointer to the appropriate list an invoker can avoid this
traversal completely. The pointer is initialized with SYSINIT(9) during
the eventhandler stage. Users registering interest in events do not need
to know if the event is backed by such a list, since the list is added
to the global list of lists. As with lists that are not pre-defined it
is safe to register for the events before the list has been created.

This converts the process_* and thread_* events to using the new
facility, as these are events whose locked traversals end up showing up
significantly in ports build workflows (and presumably other workflows
with many short lived threads/procs). It may be advantageous to convert
other events to using the new facility.

The el_flags field is now unused, but leave it be so that this revision
can be MFC'd.

Reviewed by:	bdrewery, markj, mjg
Approved by:	rstone (mentor)
In collaboration with:  ian
MFC after:      4 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12814
2017-11-09 22:51:48 +00:00