Commit Graph

462 Commits

Author SHA1 Message Date
John Baldwin
4e7d1b527c Add a proc_off_p_hash helper variable.
This is used by kernel debuggers to enumerate processes via the pid
hash table.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27825
2020-12-31 16:00:33 -08:00
Mateusz Guzik
598f2b8116 dtrace: stop using eventhandlers for the part compiled into the kernel
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D27311
2020-11-23 18:27:21 +00:00
Conrad Meyer
85078b8573 Split out cwd/root/jail, cmask state from filedesc table
No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by:	kib
Discussed with:	markj, mjg
Differential Revision:	https://reviews.freebsd.org/D27037
2020-11-17 21:14:13 +00:00
Mark Johnston
f52979098d Fix a pair of races in SIGIO registration
First, funsetownlst() list looks at the first element of the list to see
whether it's processing a process or a process group list.  Then it
acquires the global sigio lock and processes the list.  However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.

Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty.  Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.

Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object.  However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object.  However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.

Fix this by introducing funsetown_locked(), which avoids the racy check.

Reviewed by:	kib
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27157
2020-11-11 13:44:27 +00:00
Mateusz Guzik
40aad3e477 thread: tidy up r367543
"locked" variable is spurious in the committed version.
2020-11-10 21:29:10 +00:00
Mateusz Guzik
f837888a3e thread: use tdfind in sysctl_kern_proc_kstack
This treads linear scans for locked lookup, but more importantly removes
the only consumer of thread_find.
2020-11-10 01:57:19 +00:00
Konstantin Belousov
dd90d96342 Put calls to check_pgrp_jobc() in fixjobc_kill() under INVARIANTS.
Reported by:	Michael Butler <imb@protected-networks.net>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-09-17 00:07:15 +00:00
Konstantin Belousov
182cfe6ff4 Add check_pgrp_jobc() calls into process exit path.
Both before and after job control adjustments.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26416
2020-09-16 21:49:19 +00:00
Konstantin Belousov
2f5f11f533 Fix fixjobc+orhpanage.
Orphans affect job control state, we must account for them when
changing pg_jobc.

Instead of p_pptr, use proc_realparent() to get parent relevant for
job control.

Use correct calculation of the parent for exiting process.  For jobc
purposes, we must use realparent, but if it is also exiting, we should
fall to reaper, then recursively find non-exiting reaper.

Reported by:	trasz
PR:	249257
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26416
2020-09-16 21:46:57 +00:00
Konstantin Belousov
928b85384a Assert that P_TREE_GRPEXITED is set only once.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26416
2020-09-16 21:40:32 +00:00
Konstantin Belousov
82207cd246 Improve ddb 'show pgrpdump' command.
Use ddb pager.
Make lines more compact.
Eliminate unneeded casts.
Print more job-control related info when reporting process group.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26416
2020-09-16 21:34:18 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Mateusz Guzik
feabaaf995 cache: drop the always curthread argument from reverse lookup routines
Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs.

Tested by:	pho
2020-08-24 08:57:02 +00:00
Konstantin Belousov
c5bc28b273 Fix several issues with process group orphanage.
Attempt of adding assertions that pgrp->pg_jobc counters do not
underflow in r361967, reverted in r362910, points out bugs in the
handling of job control.  Peter Holm was able to narrow down the
problem to very easy reproduction with timeout(1) which uses reaping.

The following list of problems with calculation of pg_jobs which
directs SIGHUP/SIGCONT delivery for orphaned process group was
identified:
- Re-calculation of the orphaned status for children of exiting parent
  was wrong, but mostly unnoticed when all children were reparented to
  init(8).  When child can be reparented to a different process which
  could affect the child' job control state, it was not properly
  accounted for in pg_jobc.
- Lockless check for exiting process' parent process group is racy
  because nothing prevents the parent from changing its group
  membership.
- Exited process is left in the process group, until waited. This
  affects other calculations of pg_jobc.

Split handling of job control status on process changing its process
group, and process exiting.  Calculate increments and decrements for
pg_jobs by exact checking the orphanage instead of assuming process
group membership for children and parent.  Move the call to killjobc()
later under the proctree_lock.  Mark exiting process in killjobc()
with a new flag P_TREE_GRPEXITED and skip it for all pg_jobc
calculations after the flag is set.

Add checker that independently recalculates pg_jobc value and compares
it with the memoized process group state. This is enabled under INVARIANTS.

Reviewed by:	jilles
Discussed with:	kevans
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D26116
2020-08-22 21:32:11 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Mateusz Guzik
58199a7052 ifdef out pg_jobc assertions added in r361967
They trigger for some people, the bug is not obvious, there are no takers
for fixing it, the issue already had to be there for years beforehand and
is low priority.
2020-07-03 09:23:11 +00:00
Konstantin Belousov
4bc5ce2c74 Use tdfind() in pget().
Reviewed by:	jhb, hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25532
2020-07-02 10:40:47 +00:00
Mateusz Guzik
90a08d6cad Assert on pg_jobc state.
Stolen from NetBSD.
2020-06-09 15:17:23 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Konstantin Belousov
48fcb46311 Add sysctl kern.proc.sigfastblk for reporting sigfastblock word address.
Tested by:	pho
Disscussed with:	cem, emaste, jilles
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D12773
2020-02-09 12:29:51 +00:00
Mark Johnston
1c29da0279 Reimplement stack capture of running threads on i386 and amd64.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delievered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.

Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.

Simplify the KPIs.  Remove stack_save_td_running() and add a return
value to stack_save_td().  On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP.  If the
target thread is running in user mode, stack_save_td() returns EBUSY.

Reviewed by:	kib
Reported by:	mjg, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23355
2020-01-31 15:43:33 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Conrad Meyer
fea73412a0 sleep(9), sleepqueue(9): const'ify wchan pointers
_sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the
channel pointers provided in any way; they are merely used as intptrs into a
dictionary structure to match waiters with wakers.  Correctly annotate this
such that _sleep() and wakeup() may be used on const pointers without
invoking ugly patterns like __DECONST().  Plumb const through all of the
underlying sleepqueue bits.

No functional change.

Reviewed by:	rlibby
Discussed with:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22914
2019-12-24 16:19:33 +00:00
Mark Johnston
01cef4caa7 Remove page locking from pmap_mincore().
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore().  Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.

The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21823
2019-10-16 22:03:27 +00:00
Doug Moore
2288078c5e Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.

Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D21882
2019-10-08 07:14:21 +00:00
Mateusz Guzik
88cc62e5a5 proc: eliminate the zombproc list
It is not needed by anything in the kernel and it slightly drives up contention
on both proctree and allproc locks.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21447
2019-08-28 16:18:23 +00:00
Mateusz Guzik
2319489b6e proc: remove zpfind
It is not used by anything. If someone wants it back it should be reimplemented
to use the proc hash.

Sponsored by:	The FreeBSD Foundation
2019-08-28 01:22:21 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Rick Macklem
eeb1f3ed51 Fix the NFSv4 client to safely find processes.
r340744 broke the NFSv4 client, because it replaced pfind_locked() with a
call to pfind(), since pfind() acquires the sx lock for the pid hash and
the NFSv4 already holds a mutex when it does the call.
The patch fixes the problem by recreating a pfind_any_locked() and adding the
functions pidhash_slockall() and pidhash_sunlockall to acquire/release
all of the pid hash locks.
These functions are then used by the NFSv4 client instead of acquiring
the allproc_lock and calling pfind().

Reviewed by:	kib, mjg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19887
2019-04-15 01:27:15 +00:00
Mark Johnston
6a85590370 Show wiring state of map entries in procstat -v.
Note that only entries wired by userspace are shown as such.  In
particular, entries transiently wired by sysctl_wire_old_buffer() are
not flagged as wired in procstat -v output.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19461
2019-03-05 19:45:37 +00:00
Konstantin Belousov
f02bc51c09 Do not call PHOLD() while owning the allproc_lock sx.
Otherwise the lock might recurse in faultin() if the process is
swapped out.

Reported by:	zeising
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-03 21:31:40 +00:00
Jilles Tjoelker
0abc7e41ba pfind, pfind_any: Correct zombie logic
SVN r340744 erroneously changed pfind() to return any process including
zombies and pfind_any() to return only non-zombie processes.

In particular, this caused kill() on a zombie process to fail with [ESRCH].
There is no direct test case for this but /usr/tests/bin/sh/builtins/kill1.0
occasionally triggers it (as reported by lwhsu).

Conversely, returning zombies from pfind() seems likely to violate
invariants and cause panics, but I have not looked at this.

PR:		233646
Reviewed by:	mjg, kib, ngie
Differential Revision:	https://reviews.freebsd.org/D18665
2018-12-28 13:32:14 +00:00
Mateusz Guzik
34ebdceac0 Manage process-related IDs with bitmaps
Currently unique pid allocation on fork often requires a full walk of
process, group, session lists to make sure it is not used by anything.
This has a side effect of requiring proctree to be held along with allproc,
which adds more contention in poudriere -j 128.

The patch below implements trivial bitmaps which gets rid of the problem.
Dedicated lock is introduced to manage IDs.

While here a bug was discovered: all processes would inherit reap id from
the first process spawned by init. This had a side effect of keeping the
ID used and when allocation rolls over to the beginning it keeps being
skipped.

The patch is loosely based on initial work by mjoras@.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
2018-12-07 12:22:32 +00:00
Eric van Gyzen
5e38e3f5eb Include path for tmpfs objects in vm.objects sysctl
This applies the fix in r283924 to the vm.objects sysctl
added by r283624 so the output will include the vnode
information (i.e. path) for tmpfs objects.

Reviewed by:	kib, dab
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D2724
2018-11-30 04:59:43 +00:00
Mateusz Guzik
1e9a1bf589 proc: create a dedicated lock for zombproc to ligthen the load on allproc_lock
waitpid always takes proctree to evaluate the list, but only takes allproc
if it can reap. With this patch allproc is no longer taken, which helps during
poudriere -j 128.

Discussed with: kib
Sponsored by:	The FreeBSD Foundation
2018-11-29 02:52:08 +00:00
Mateusz Guzik
53011553fa proc: convert pfind & friends to use pidhash locks and other cleanup
pfind_locked is retired as it relied on allproc which unnecessarily
restricts locking of the hash.

Sponsored by:	The FreeBSD Foundation
2018-11-21 20:15:56 +00:00
Mateusz Guzik
3d3e6793f6 proc: implement pid hash locks and an iterator
forks, exits and waits are frequently stalled during poudriere -j 128 runs
due to killpg and process list exports performed for each package.

Both uses take the allproc lock. The latter case can be modified to iterate
over the hash with finer grained locking instead.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17817
2018-11-21 18:56:15 +00:00
Mateusz Guzik
2c054ce924 proc: always store parent pid in p_oppid
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17825
2018-11-16 17:07:54 +00:00
Mark Johnston
aeb7a84ee1 Remove mostly-useless proc provider probes.
For some reason the proc UMA zone's ctor, dtor and init functions are
instrumented, but these functions are always available through FBT.
Moreover, the probes are not part of the original Solaris proc
provider, aren't documented, have no uses (e.g., in dwatch(8)) and
have no clear use to begin with.  Therefore, remove them.

Reviewed by:	rpaulo
Differential Revision:	https://reviews.freebsd.org/D2169
2018-11-15 23:02:59 +00:00
Konstantin Belousov
d6eff0832c Add a way for the process to request cleanup of the kernel cache of
the process arguments.  New arguments length zero causes the drop of
the pargs instead of allocation of useless zero-length buffer.

Submitted by:	Thomas Munro
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16111
2018-07-04 13:22:48 +00:00
Konstantin Belousov
6e22bbf66e fork: avoid endless wait with PTRACE_FORK and RFSTOPPED.
An RFSTOPPED thread can't clean TDB_STOPATFORK, which is done in the
fork_return() in its context, so parent is stuck forever.  Triggered
when trying to ptrace linux process.  Instead of waiting for the new
thread to clear TDB_STOPATFORK, tag it as traced and reparent to the
debugger in do_fork(), and let it only notify the debugger when run.

Submitted by:	Yanko Yankulov <yanko.yankulov@gmail.com>
Reviewed by:	jhb
MFC after:	1 week
X-MFC-Note:	keep p_dbgwait placeholder intact
Differential revision:	https://reviews.freebsd.org/D15857
2018-06-21 21:12:49 +00:00
Bjoern A. Zeeb
7938a4425a Instead of using hand-rolled loops where not needed switch them
to FOREACH_PROC_IN_SYSTEM() to have a single pattern to look for.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D15916
2018-06-20 11:42:06 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Mateusz Guzik
9d4e369ae8 Don't generate data in sysctl_out_proc unless we intend to copy out.
The first call is used to gauge how much spaces is needed. Just computing
the size instead of generating the output allows to not take the proctree
lock.
2018-02-25 15:16:58 +00:00
John Baldwin
3160862437 Report offset relative to the backing object for kinfo_vmentry structures.
For the pathname reported in kinfo_vmentry structures (kve_path), the
sysctl handlers walk the object chain to find the bottom-most VM object.
This permits a COW mapping of a file with dirty pages to report the
pathname of the originally mapped file.  Do the same for the object
offset (kve_offset) computing a cumulative offset during the same object
walk so that the reported offset is relative to the reported pathname.

Note that ptrace(PT_VM_ENTRY) already returns a cumulative offset
rather than the raw offset of the VM map entry.

Note also that this does not affect procstat -v output (even structured
output) since that output does not include the kve_offset field.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D13767
2018-01-04 21:59:34 +00:00
Antoine Brodin
1b25176cbc sysctl_kern_proc_args: do not take the fast path if p_args is NULL
In this case it falls back to reading ps_strings
2018-01-01 21:25:01 +00:00
Konstantin Belousov
baaa79699a Make kern_proc_vmmap_resident() externally accesible, and move the
vmmap_skip_res_cnt control check inside it.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13595
2017-12-28 13:16:32 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Mateusz Guzik
997131646f Check for PRS_NEW without locking the proc in sysctl_kern_proc 2017-11-17 02:29:06 +00:00
Xin LI
712dda7fb0 Be more careful when doing calculation with request from userland.
MFC after:	2 weeks
2017-11-13 07:47:43 +00:00