Commit Graph

402 Commits

Author SHA1 Message Date
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
John Baldwin
55648840de Extend the support for exempting processes from being killed when swap is
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
  from arbitrary processes.  Similar to ktrace it can apply a change to all
  existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
  control operations on processes (as opposed to the debugger-specific
  operations provided by ptrace(2)).  procctl(2) uses a combination of
  idtype_t and an id to identify the set of processes on which to operate
  similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
  of a set of processes.  MADV_PROTECT still works for backwards
  compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
  the first bit of which is used to track if P_PROTECT should be inherited
  by new child processes.

Reviewed by:	kib, jilles (earlier version)
Approved by:	re (delphij)
MFC after:	1 month
2013-09-19 18:53:42 +00:00
Will Andrews
5e9ccc8797 Add the ability to display the default FIB number for a process to the
ps(1) utility, e.g. "ps -O fib".

bin/ps/keyword.c:
	Add the "fib" keyword and default its column name to "FIB".

bin/ps/ps.1:
	Add "fib" as a supported keyword.

sys/compat/freebsd32/freebsd32.h:
sys/kern/kern_proc.c:
sys/sys/user.h:
	Add the default fib number for a process (p->p_fibnum)
	to the user land accessible process data of struct kinfo_proc.

Submitted by:	Oliver Fromme <olli@fromme.com>, gibbs
2013-08-26 23:48:21 +00:00
Mark Johnston
7b77e1fe0f Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.

There is no functional change.

MFC after:	2 weeks
2013-08-15 04:08:55 +00:00
Mikolaj Golub
5ea21e6904 Similarly to proc_getargv() and proc_getenvv(), export proc_getauxv()
to be able to reuse the code.

MFC after:	3 weeks
2013-04-14 20:03:48 +00:00
Mikolaj Golub
fe52cf5475 Re-factor the code to provide kern_proc_filedesc_out(), kern_proc_out(),
and kern_proc_vmmap_out() functions to output process kinfo structures
to sbuf, to make the code reusable.

The functions are going to be used in the coredump routine to store
procstat info in the core program header notes.

Reviewed by:	kib
MFC after:	3 weeks
2013-04-14 20:01:36 +00:00
Attilio Rao
bc403f030d Switch some "low-hanging fruit" to acquire read lock on vmobjects
rather than write locks.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2013-04-08 19:58:32 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between FreeBSD and solaris
  versions must be avoided.
  At this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit.  Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Pawel Jakub Dawidek
4f66641749 Look for zombie process only if we were given process id.
Reviewed by:	kib
MFC after:	2 weeks
X-MFC-after-or-with:	243142
2012-11-25 19:31:42 +00:00
Konstantin Belousov
134eb42e24 In pget(9), if PGET_NOTWEXIT flag is not specified, also search the
zombie list for the pid. This allows several kern.proc sysctls to
report useful information for zombies.

Hold the allproc_lock around all searches instead of relocking it.
Remove private pfind_locked() from the new nfs client code.

Requested and reviewed by:	pjd
Tested by:	pho
MFC after:	3 weeks
2012-11-16 08:25:06 +00:00
Mateusz Guzik
4419a8a88c enterpgrp: get rid of pgrp2 variable and use KASSERT directly on pgfind result.
pgrp2 was used only for debugging, but pgrp2 = pgfind(..) was present in compiled code even for kernels without INVARIANTS

Approved by:	trasz (mentor)
MFC after:	1 week
2012-11-13 22:01:25 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
David E. O'Brien
60ee433881 Don't include opt_ddb.h & <ddb/ddb.h> twice. 2012-08-15 14:18:54 +00:00
Konstantin Belousov
1c771f9222 After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
Gabor Pali
599fc82b06 - Add support for displaying process stack memory regions.
Approved by:	rwatson
MFC after:	3 days
2012-07-16 09:38:19 +00:00
Konstantin Belousov
371778a333 Fix ki_cow for compat32 binaries.
MFC after:	3 days
2012-05-27 05:24:53 +00:00
Konstantin Belousov
4d34e019c4 Calculate the count of per-process cow faults. Export the count to
userspace using the obscure spare int field in struct kinfo_proc.

Submitted by:	Andrey Zonov <andrey zonov org>
MFC after:	1 week
2012-05-23 18:10:54 +00:00
Konstantin Belousov
b3bfb267cb Allow for the process information sysctls to accept a thread id in addition
to the process id.  It follows the ptrace(2) interface and allows debugging
libraries to use thread ids directly, without slow and verbose conversion
of thread id into pid.

The PGET_NOTID flag is provided to allow a specific sysctl to disallow
this behaviour.  All current callers of pget(9) have useful semantic to
operate on tid and do not need this flag.

Reviewed by:	jhb, trocini
MFC after:	1 week
2012-04-23 20:56:05 +00:00
Mikolaj Golub
903712c99c Add a sysctl to set and retrieve binary osreldate of another process.
Suggested by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-03-23 20:05:41 +00:00
Mikolaj Golub
e0fcf639d2 Make kern.proc.umask sysctl readonly.
Requested by:	src
MFC after:	1 week
2012-03-03 11:53:35 +00:00
Mikolaj Golub
6ce13747dc Add sysctl to retrieve or set umask of another process.
Submitted by:	Dmitry Banschikov <me ubique spb ru>
Discussed with:	kib, rwatson
Reviewed by:	kib
MFC after:	2 weeks
2012-02-26 14:25:48 +00:00
Mikolaj Golub
45efc9b4aa Fix CTL flags in the declarations of KERN_PROC_ENV, AUXV and
PS_STRINGS sysctls: they are read only.

MFC after:	1 week
2012-01-25 20:15:58 +00:00
Mikolaj Golub
8854fe3915 Change kern.proc.rlimit sysctl to:
- retrive only one, specified limit for a process, not the whole
  array, as it was previously (the sysctl has been added recently and
  has not been backported to stable yet, so this change is ok);

- allow to set a resource limit for another process.

Submitted by:	Andrey Zonov <andrey at zonov.org>
Discussed with:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-01-22 20:25:00 +00:00
Mikolaj Golub
fe7f89b71a Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted procces stack.

Suggested by:	kib
MFC after:	3 days
2012-01-15 18:47:24 +00:00
Mikolaj Golub
547b155eb1 Fix style and white spaces.
MFC after:	1 week
2011-12-17 22:18:26 +00:00
Mikolaj Golub
fa3935bcea On start most of sysctl_kern_proc functions use the same pattern:
locate a process calling pfind() and do some additional checks like
p_candebug(). To reduce this code duplication a new function pget() is
introduced and used.

As the function may be useful not only in kern_proc.c it is in the
kernel name space.

Suggested by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2011-12-17 16:59:22 +00:00
Mikolaj Golub
9e94d5b83f Really protect kern.proc.ps_strings sysctls with p_candebug(). This
was intended to be in r228288.

Spotted by:	many
MFC after:	1 week
2011-12-06 06:40:14 +00:00
Mikolaj Golub
c65932be9d Protect kern.proc.auxv and kern.proc.ps_strings sysctls with p_candebug().
Citing jilles:

If we are ever going to do ASLR, the AUXV information tells an attacker
where the stack, executable and RTLD are located, which defeats much of
the point of randomizing the addresses in the first place.

Given that the AUXV information seems to be used by debuggers only anyway,
I think it would be good to move it to p_candebug() now.

The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already
under p_candebug().

Suggested by:	jilles
Discussed with:	rwatson
MFC after:	1 week
2011-12-05 19:34:02 +00:00
Mikolaj Golub
0f60ecdaa4 In sysctl_kern_proc_ps_strings() there is no much sense in checking
for P_WEXIT and P_SYSTEM flags.

Reviewed by:	kib
2011-12-04 21:24:01 +00:00
Mikolaj Golub
9732458f35 Add sysctl to retrieve ps_strings structure location of another process.
Suggested by:	kib
Reviewed by:	kib
2011-11-27 17:05:26 +00:00
Mikolaj Golub
4fd6053b43 In sysctl_kern_proc_auxv the process was released too early: we still
need to hold it when checking process sv_flags.

MFC after:	2 weeks
2011-11-27 16:56:01 +00:00
Mikolaj Golub
9e7d058351 Add sysctl to get process resource limits.
Reviewed by:	kib
MFC after:	2 weeks
2011-11-24 20:43:37 +00:00
Mikolaj Golub
7ad9baae41 Fix build without INVARIANTS.
Discussed with:	kib
2011-11-23 08:11:04 +00:00
Mikolaj Golub
c5cfcb1c19 Add new sysctls, KERN_PROC_ENV and KERN_PROC_AUXV, to return
environment strings and ELF auxiliary vectors from a process stack.

Make sysctl_kern_proc_args to read not cached arguments from the
process stack.

Export proc_getargv() and proc_getenvv() so they can be reused by
procfs and linprocfs.

Suggested by:	kib
Reviewed by:	kib
Discussed with:	kib, rwatson, jilles
Tested by:	pho
MFC after:	2 weeks
2011-11-22 20:40:18 +00:00
Sergey Kandaurov
ca4aa8c363 Remove no more relevant XXXRW comments since accessing the vmspace is now
properly done with the acquired vmspace reference.

Pointed out by:		kib
2011-11-21 12:21:00 +00:00
Sergey Kandaurov
18be8527e9 Use the acquired reference to the vmspace instead of direct dereferencing
of p->p_vmspace like it is done in sysctl_kern_proc_vmmap().
2011-11-21 10:36:57 +00:00
Mikolaj Golub
5384d08913 Add KVME_FLAG_SUPER and use it in sysctl_kern_proc_vmmap for marking
entries with superpages.

Submitted by:	Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net>
Reviewed by:	alc, rwatson
2011-11-07 21:13:19 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
John Baldwin
f55d3fbe84 One of the general principles of the sysctl(3) API is that a user can
query the needed size for a sysctl result by passing in a NULL old
pointer and a valid oldsize.  The kern.proc.args sysctl handler broke
this assumption by not calling SYSCTL_OUT() if the old pointer was
NULL.

Approved by:	re (kib)
MFC after:	3 days
2011-08-18 22:20:45 +00:00
Bjoern A. Zeeb
925af54487 Rename ki_ocomm to ki_tdname and OCOMMLEN to TDNAMLEN.
Provide backward compatibility defines under BURN_BRIDGES.

Suggested by:	jhb
Reviewed by:	emaste
Sponsored by:	Sandvine Incorporated
Approved by:	re (kib)
2011-07-18 20:06:15 +00:00
John Baldwin
2417d97ebb - Export each thread's individual resource usage in in struct kinfo_proc's
ki_rusage member when KERN_PROC_INC_THREAD is passed to one of the
  process sysctls.
- Correctly account for the current thread's cputime in the thread when
  doing the runtime fixup in calcru().
- Use TIDs as the key to lookup the previous thread to compute IO stat
  deltas in IO mode in top when thread display is enabled.

Reviewed by:	kib
Approved by:	re (kib)
2011-07-18 17:33:08 +00:00
Stanislav Sedov
0daf62d9f5 - Commit work from libprocstat project. These patches add support for runtime
file and processes information retrieval from the running kernel via sysctl
  in the form of new library, libprocstat.  The library also supports KVM backend
  for analyzing memory crash dumps.  Both procstat(1) and fstat(1) utilities have
  been modified to take advantage of the library (as the bonus point the fstat(1)
  utility no longer need superuser privileges to operate), and the procstat(1)
  utility is now able to display information from memory dumps as well.

  The newly introduced fuser(1) utility also uses this library and able to operate
  via sysctl and kvm backends.

  The library is by no means complete (e.g. KVM backend is missing vnode name
  resolution routines, and there're no manpages for the library itself) so I
  plan to improve it further.  I'm commiting it so it will get wider exposure
  and review.

  We won't be able to MFC this work as it relies on changes in HEAD, which
  was introduced some time ago, that break kernel ABI.  OTOH we may be able
  to merge the library with KVM backend if we really need it there.

Discussed with:	rwatson
2011-05-12 10:11:39 +00:00
John Baldwin
8e6fa660f2 Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
  in fork to honor the locking requirements.  While here, expand the scope
  of the PROC_LOCK() on the new process (p2) to avoid some LORs.  Previously
  the code was locking the new child process (p2) after it had locked the
  parent process (p1).  However, when locking two processes, the safe order
  is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
  having the process locked to use PROC_LOCK().  Every place was already
  locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
  p_state against PRS_NEW.  The PROC_LOCK() alone is sufficient for reading
  the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after:	1 week
2011-03-24 18:40:11 +00:00
Edward Tomasz Napierala
7123f4cd6f Export login class information via kinfo and make it possible to view
it using "ps -o class".
2011-03-05 14:41:49 +00:00
Robert Watson
96fcc75fdf Add initial support for Capsicum's Capability Mode to the FreeBSD kernel,
compiled conditionally on options CAPABILITIES:

Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a
subject (typically a process) is in capability mode.

Add two new system calls, cap_enter(2) and cap_getmode(2), which allow
setting and querying (but never clearing) the flag.

Export the capability mode flag via process information sysctls.

Sponsored by:	Google, Inc.
Reviewed by:	anderson
Discussed with:	benl, kris, pjd
Obtained from:	Capsicum Project
MFC after:	3 months
2011-03-01 13:23:37 +00:00
Konstantin Belousov
6fa39a7327 Allow debugger to specify that children of the traced process should be
automatically traced. Extend the ptrace(PL_LWPINFO) to report that child
just forked.

Reviewed by:	davidxu, jhb
MFC after:	2 weeks
2011-01-25 10:59:21 +00:00
Rebecca Cran
8d065a3914 Fix some more style(9) issues. 2010-11-14 16:10:15 +00:00
Rebecca Cran
b389be97db Fix style(9) issues from r215281 and r215282.
MFC after:	1 week
2010-11-14 08:06:29 +00:00
Rebecca Cran
2baa5cddb6 Add some descriptions to sys/kern sysctls.
PR:	kern/148710
Tested by:	Chip Camden <sterling at camdensoftware.com>
MFC after:	1 week
2010-11-14 06:09:50 +00:00
Edward Tomasz Napierala
a37e14e1d8 Fix style.
Submitted by:	bde
2010-11-11 21:53:46 +00:00
Edward Tomasz Napierala
98e0196aef Remove unneeded conditional.
Discussed with:	kib
2010-11-11 08:15:12 +00:00
Ed Maste
6239ef1d29 Make a thread's address available via the kern proc sysctl, just like the
process address.

Add "tdaddr" keyword to ps(1) to display this thread address.

Distilled from Sandvine's patch set by Mark Johnston.
2010-10-08 00:44:53 +00:00
Rui Paulo
79856499bd Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwaston [1]
2010-08-22 11:18:57 +00:00
John Baldwin
ba50d5975d There isn't really a need to hold the ktrace mutex just to read the value
of p_traceflag that is stored in the kinfo_proc structure.  It is still
racey even with the lock and the code will read a consistent snapshot of
the flag without the lock.
2010-08-19 16:40:30 +00:00
Attilio Rao
937912ea04 Add the support for reporting the NOCOREDUMP flag from
sysctl_kern_proc_vmmap().

Sponsored by:	Sandvine Incorporated
Reviewed by:	kib, emaste
MFC after:	1 week
2010-05-27 08:10:12 +00:00
Konstantin Belousov
213c077f62 Fix a mistake in r207603. td_rux.rux_runtime still needs conversion.
Reported and tested by:	nwhitehorn
Pointy hat to:	kib
MFC after:	6 days
2010-05-05 16:05:51 +00:00
Konstantin Belousov
03d13670c2 Use td_rux.rux_runtime for ki_runtime instead of redoing calculation.
Submitted by:	bde
MFC after:	1 week
2010-05-04 06:00:39 +00:00
Konstantin Belousov
e68d26fd5d Remove caddr_t casts.
Requested by:	bde
MFC after:	10 days
2010-04-29 09:55:51 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Konstantin Belousov
b4bf2ac1ac Fix typo.
Submitted by:	emaste
Pointy hat to:	kib (who needs much bigger wardrobe)
MFC after:	1 week
2010-04-21 20:04:42 +00:00
Konstantin Belousov
05e06d1157 Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) to
mostly work on 64bit host.

The work is based on an original patch submitted by emaste, obtained
from Sandvine's source tree.

Reviewed by:	jhb
MFC after:	1 week
2010-04-21 19:32:00 +00:00
Konstantin Belousov
00305546f3 For kinfo_proc in kp->ki_siglist, return the set of the signals pending
in the process queue when gathering information for the process, and set
of signals pending for the thread, when gathering information for the
thread. Previously, the sysctl returned a union of the process and some
arbitrary thread pending set for the process, and union of the process
and the thread pending set for the thread.

MFC after:	1 week
2010-02-27 15:32:49 +00:00
Jilles Tjoelker
bbb17d169c Include terminated threads in ps's process cpu time field.
MFC after:	2 weeks
2010-02-27 12:15:59 +00:00
Bjoern A. Zeeb
095809b084 Remove an unused global.
MFC after:	3 days
2009-12-25 20:03:03 +00:00
Ed Schouten
8dc9b4cf04 Let access overriding to TTYs depend on the cdev_priv, not the vnode.
Basically this commit changes two things, which improves access to TTYs
in exceptional conditions. Basically the problem was that when you ran
jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the
node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if
you want to attach to screens quickly, use ssh(1), etc.

The fixes:

- Cache the cdev_priv of the controlling TTY in struct session. Change
  devfs_access() to compare against the cdev_priv instead of the vnode.
  This allows you to bypass UNIX permissions, even across different
  mounts of devfs.

- Extend devfs_prison_check() to unconditionally expose the device node
  of the controlling TTY, even if normal prison nesting rules normally
  don't allow this. This actually allows you to interact with this
  device node.

To be honest, I'm not really happy with this solution. We now have to
store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp).
In an ideal world, we should just get rid of the latter two and only use
s_ttyp, but this makes certian pieces of code very impractical (e.g.
devfs, kern_exit.c).

Reported by:	Many people
2009-12-19 18:42:12 +00:00
Ed Maste
e380cc73ed In fill_kinfo_thread, copy the thread's name into struct kinfo_proc even
if it is empty.  Otherwise the previous thread's name would remain in the
struct and then be reported for this thread.

Submitted by:	Ryan Stone
MFC after:	1 week
2009-10-01 21:44:30 +00:00
Konstantin Belousov
8a945d109c Reintroduce the r196640, after fixing the problem with my testing.
Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that requested stack size is equial to the
actual, and reallocate the stack if sizes differ [1].

This fixes the bug introduced by r173361 that was committed several days
after r173004 and consisted of kthread_add(9) ignoring the non-default
kernel stack size.

Also, r173361 removed the caching of the kernel stacks for a non-first
thread in the process. Introduce separate kernel stack cache that keeps
some limited amount of preallocated kernel stacks to lower the latency
of thread allocation. Add vm_lowmem handler to prune the cache on
low memory condition. This way, system with reasonable amount of the
threads get lower latency of thread creation, while still not exhausting
significant portion of KVA for unused kstacks.

Submitted by:	peter [1]
Discussed with:	jhb, julian, peter
Reviewed by:	jhb
Tested by:	pho (and retested according to new test scenarious)
MFC after:	1 week
2009-09-01 11:41:51 +00:00
Konstantin Belousov
f25fa6abb2 Reverse r196640 and r196644 for now. 2009-08-29 21:53:08 +00:00
Konstantin Belousov
c3cf0b476f Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that requested stack size is equial to the
actual, and reallocate the stack if sizes differ [1].

This fixes the bug introduced by r173361 that was committed several days
after r173004 and consisted of kthread_add(9) ignoring the non-default
kernel stack size.

Also, r173361 removed the caching of the kernel stacks for a non-first
thread in the process. Introduce separate kernel stack cache that keeps
some limited amount of preallocated kernel stacks to lower the latency
of thread allocation. Add vm_lowmem handler to prune the cache on
low memory condition. This way, system with reasonable amount of the
threads get lower latency of thread creation, while still not exhausting
significant portion of KVA for unused kstacks.

Submitted by:	peter [1]
Discussed with:	jhb, julian, peter
Reviewed by:	jhb
Tested by:	pho
MFC after:	1 week
2009-08-29 13:28:02 +00:00
Brooks Davis
254d03c508 Introduce a new sysctl process mib, kern.proc.groups which adds the
ability to retrieve the group list of each process.

Modify procstat's -s option to query this mib when the kinfo_proc
reports that the field has been truncated.  If the mib does not exist,
fall back to the truncated list.

Reviewed by:	rwatson
Approved by:	re (kib)
MFC after:	2 weeks
2009-07-24 19:12:19 +00:00
Brooks Davis
1b5768be71 Revert the changes to struct kinfo_proc in r194498. Instead, fill
in up to 16 (KI_NGROUPS) values and steal a bit from ki_cr_flags
(all bits currently unused) to indicate overflow with the new flag
KI_CRF_GRP_OVERFLOW.

This fixes procstat -s.

Approved by: re (kib)
2009-07-24 15:03:10 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Brooks Davis
838d985825 Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively.  (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer.  Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively.  Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary.  In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups.  When feasible, truncate
the group list rather than generating an error.

Minor changes:
  - Reduce the number of hand rolled versions of groupmember().
  - Do not assign to both cr_gid and cr_groups[0].
  - Modify ipfw to cache ucreds instead of part of their contents since
    they are immutable once referenced by more than one entity.

Submitted by:	Isilon Systems (initial implementation)
X-MFC after:	never
PR:		bin/113398 kern/133867
2009-06-19 17:10:35 +00:00
Robert Watson
820c518125 Add a flags field to struct ucred, and export that via kinfo_proc,
consuming one of its spare fields.  The cr_flags field is currently
unused, but will be used for features, including capability mode and
pay-as-you-go audit.

Discussed with:	jhb, sson
2009-06-01 20:26:51 +00:00
Jamie Gritton
0304c73163 Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
Attilio Rao
f8d9048018 - Add a function (fill_kinfo_aggregate()) which aggregates relevant
members for a kinfo entry on a process-wide system.
- Use the newly introduced function in order to fix cases like
  KERN_PROC_PROC where aggregating stats are broken because they just
  consider the first thread in the pool for each process.
  (Note, additively, that KERN_PROC_PROC is rather inaccurate on
  thread-wide informations like the 'state' of the process.  Such
  informations should maybe be invalidated and being forceably discarded
  by the consumers?).
- Simplify the logic of sysctl_out_proc() and adjust the
  fill_kinfo_thread() accordingly.
- Remove checks on the FIRST_THREAD_IN_PROC() being NULL but add
  assertives.

This patch should fix aggregate statistics for KERN_PROC_PROC.
This is one of the reasons why top doesn't use this option and now it
can be use it safely.
ps, when launched in order to display just processes, now should report
correct cpu utilization percentages and times (as opposed by the old
code).

Reviewed by:	jhb, emaste
Sponsored by:	Sandvine Incorporated
2009-02-18 21:52:13 +00:00
John Baldwin
24f87fdbe8 - Add conditional Giant locking around the vrele() in
sysctl_kern_proc_pathname().
- Mark all the kern.proc.* sysctls as MPSAFE.

Submitted by:	csjp (2)
2009-01-23 22:46:45 +00:00
Konstantin Belousov
22a448c4d9 vm_map_lock_read() does not increment map->timestamp, so we should
compare map->timestamp with saved timestamp after map read lock is
reacquired, not with saved timestamp + 1. The only consequence of the +1
was unconditional lookup of the next map entry, though.

Tested by:	pho
Approved by:	des
MFC after:	2 weeks
2008-12-29 12:45:11 +00:00
Konstantin Belousov
c7462f4387 Reference the vmspace of the process being inspected by procfs, linprocfs
and sysctl kern_proc_vmmap handlers.

Reported and tested by:	pho
Reviewed by:	rwatson, des
MFC after:	1 week
2008-12-12 12:12:36 +00:00
Konstantin Belousov
118d0afa28 Do drop vm map lock earlier in the sysctl_kern_proc_vmmap(), to avoid
locking a vnode while having vm map locked.

Reported and tested by:	pho
MFC after:	1 week
2008-12-08 12:29:30 +00:00
Konstantin Belousov
aeb325719a Several threads in a process may do vfork() simultaneously. Then, all
parent threads sleep on the parent' struct proc until corresponding
child releases the vmspace. Each sleep is interlocked with proc mutex of
the child, that triggers assertion in the sleepq_add(). The assertion
requires that at any time, all simultaneous sleepers for the channel use
the same interlock.

Silent the assertion by using conditional variable allocated in the
child. Broadcast the variable event on exec() and exit().

Since struct proc * sleep wait channel is overloaded for several
unrelated events, I was unable to remove wakeups from the places where
cv_broadcast() is added, except exec().

Reported and tested by:	ganbold
Suggested and reviewed by:	jhb
MFC after:	2 week
2008-12-05 20:50:24 +00:00
Peter Wemm
43151ee6cf Merge user/peter/kinfo branch as of r185547 into head.
This changes struct kinfo_filedesc and kinfo_vmentry such that they are
same on both 32 and 64 bit platforms like i386/amd64 and won't require
sysctl wrapping.

Two new OIDs are assigned.  The old ones are available under
COMPAT_FREEBSD7 - but it isn't that simple.  The superceded interface
was never actually released on 7.x.

The other main change is to pack the data passed to userland via the
sysctl.  kf_structsize and kve_structsize are reduced for the copyout.
If you have a process with 100,000+ sockets open, the unpacked records
require a 132MB+ copyout.  With packing, it is "only" ~35MB.  (Still
seriously unpleasant, but not quite as devastating).  A similar problem
exists for the vmentry structure - have lots and lots of shared libraries
and small mmaps and its copyout gets expensive too.

My immediate problem is valgrind.  It traditionally achieves this
functionality by parsing procfs output, in a packed format.  Secondly, when
tracing 32 bit binaries on amd64 under valgrind, it uses a cross compiled
32 bit binary which ran directly into the differing data structures in 32
vs 64 bit mode.  (valgrind uses this to track file descriptor operations
and this therefore affected every single 32 bit binary)

I've added two utility functions to libutil to unpack the structures into
a fixed record length and to make it a little more convenient to use.
2008-12-02 06:50:26 +00:00
Pawel Jakub Dawidek
1ba4a712dd Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This bring huge amount of changes, I'll enumerate only user-visible changes:

- Delegated Administration

	Allows regular users to perform ZFS operations, like file system
	creation, snapshot creation, etc.

- L2ARC

	Level 2 cache for ZFS - allows to use additional disks for cache.
	Huge performance improvements mostly for random read of mostly
	static content.

- slog

	Allow to use additional disks for ZFS Intent Log to speed up
	operations like fsync(2).

- vfs.zfs.super_owner

	Allows regular users to perform privileged operations on files stored
	on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

	Not all the flags are supported. This still needs work.

- ZFSBoot

	Support to boot off of ZFS pool. Not finished, AFAIK.

	Submitted by:	dfr

- Snapshot properties

- New failure modes

	Before if write requested failed, system paniced. Now one
	can select from one of three failure modes:
	- panic - panic on write error
	- wait - wait for disk to reappear
	- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

	Just quota and reservation properties, but don't count space consumed
	by children file systems, clones and snapshots.

- Sparse volumes

	ZVOLs that don't reserve space in the pool.

- External attributes

	Compatible with extattr(2).

- NFSv4-ACLs

	Not sure about the status, might not be complete yet.

	Submitted by:	trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from:	OpenSolaris
2008-11-17 20:49:29 +00:00
John Baldwin
2ff47c5f18 Remove unnecessary locking around vn_fullpath(). The vnode lock for the
vnode in question does not need to be held.  All the data structures used
during the name lookup are protected by the global name cache lock.
Instead, the caller merely needs to ensure a reference is held on the
vnode (such as vhold()) to keep it from being freed.

In the case of procfs' <pid>/file entry, grab the process lock while we
gain a new reference (via vhold()) on p_textvp to fully close races with
execve(2).

For the kern.proc.vmmap sysctl handler, use a shared vnode lock around
the call to VOP_GETATTR() rather than an exclusive lock.

MFC after:	1 month
2008-11-04 19:04:01 +00:00
Peter Wemm
7a9c4d2409 Add three extra to the kinfo_proc_vmmap data. kve_offset - the offset
within an object that a mapping refers to.  fileid and fsid are inode/dev
for vnodes.  (Linux procfs has these and valgrind is really unhappy
without them.)  I believe I didn't change the size of the struct.
2008-10-31 05:43:19 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Ed Schouten
42ff2756c7 Fix minor TTY API inconsistency.
Unlike tty_rel_gone() and tty_rel_sess(), the tty_rel_pgrp() routine
does not unlock the TTY. I once had the idea to make the code call
tty_rel_pgrp() and tty_rel_sess(), picking up the TTY lock once. This
turned out a little harder than I expected, so this is how it works now.

It's a lot easier if we just let tty_rel_pgrp() unlock the TTY, because
the other routines do this anyway.
2008-09-16 14:57:23 +00:00
Kevin Lo
f308bddd3f If the process id specified is invalid, the system call returns ESRCH 2008-09-04 10:44:33 +00:00
Ed Schouten
bc093719ca Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

  The old TTY layer has a driver model that is not abstract enough to
  make it friendly to use. A good example is the output path, where the
  device drivers directly access the output buffers. This means that an
  in-kernel PPP implementation must always convert network buffers into
  TTY buffers.

  If a PPP implementation would be built on top of the new TTY layer
  (still needs a hooks layer, though), it would allow the PPP
  implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

  With the old TTY layer, it isn't entirely safe to destroy TTY's from
  the system. This implementation has a two-step destructing design,
  where the driver first abandons the TTY. After all threads have left
  the TTY, the TTY layer calls a routine in the driver, which can be
  used to free resources (unit numbers, etc).

  The pts(4) driver also implements this feature, which means
  posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

  One of the major improvements is the per-TTY mutex, which is expected
  to improve scalability when compared to the old Giant locking.
  Another change is the unbuffered copying to userspace, which is both
  used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from:		//depot/projects/mpsafetty/...
Approved by:		philip (ex-mentor)
Discussed:		on the lists, at BSDCan, at the DevSummit
Sponsored by:		Snow B.V., the Netherlands
dcons(4) fixed by:	kan
2008-08-20 08:31:58 +00:00
Konstantin Belousov
58e8af1bf5 Call pargs_drop() unconditionally in do_execve(), the function correctly
handles the NULL argument.
Make pargs_free() static.

MFC after:	1 week
2008-07-25 11:55:32 +00:00
John Birrell
5d217f173c Add DTrace 'proc' provider probes using the Statically Defined Trace
(sdt) mechanism.
2008-05-24 06:22:16 +00:00
Jeff Roberson
374ae2a393 - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
 - Reflect these changes in the proc.h documentation and consumers throughout
   the kernel.  This is a substantial reduction in locking cost for these
   fields and was made possible by recent changes to threading support.
2008-03-19 06:19:01 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Robert Watson
d92909c1d4 Don't zero td_runtime when billing thread CPU usage to the process;
maintain a separate td_incruntime to hold unbilled CPU usage for
the thread that has the previous properties of td_runtime.

When thread information is requested using the thread monitoring
sysctls, export thread td_runtime instead of process rusage runtime
in kinfo_proc.

This restores the display of individual ithread and other kernel
thread CPU usage since inception in ps -H and top -SH, as well for
libthr user threads, valuable debugging information lost with the
move to try kthreads since they are no longer independent processes.

There is universal agreement that we should rewrite the process and
thread export sysctls, but this commit gets things going a bit
better in the mean time.  Likewise, there are resevations about the
continued validity of statclock given the speed of modern processors.

Reviewed by:		attilio, emaste, jhb, julian
2008-01-10 22:11:20 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Robert Watson
0417fe5421 Return ESRCH when a kernel stack is queried on a process in execve() --
p_candebug() will return EAGAIN which, if the other process never
leaves execve(), will result in the sysctl spinning and never returning
to userspace.  Processes should always eventually leave execve(), but
spinning in kernel while we wait is bad for countless reasons, and
particularly harmful if execve() itself is deadlocked.

Possibly we should return another error, or return a marker indicating
the thread is in execve() so it can be reported that way in userspace.

Reported by:	kris
2007-12-27 22:44:01 +00:00
Robert Watson
63d79c4fd6 Check for P_WEXIT before PHOLD() on a process in kstack and vm query
sysctls, as PHOLD() asserts !P_WEXIT.

Reported by:	Michael Plass <mfp49_freebsd at plass-family dot net>
2007-12-09 17:22:27 +00:00
Robert Watson
1cc8c45c54 Add another new sysctl in support of the forthcoming procstat(1) to
support its -k argument:

kern.proc.kstack - dump the kernel stack of a process, if debugging
  is permitted.

This sysctl is present if either "options DDB" or "options STACK" is
compiled into the kernel.  Having support for tracing the kernel
stacks of processes from user space makes it much easier to debug
(or understand) specific wmesg's while avoiding the need to enter
DDB in order to determine the path by which a process came to be
blocked on a particular wait channel or lock.
2007-12-02 21:52:18 +00:00
Robert Watson
cc43c38c87 Add two new sysctls in support of the forthcoming procstat(1) to support
its -f and -v arguments:

kern.proc.filedesc - dump file descriptor information for a process, if
  debugging is permitted, including socket addresses, open flags, file
  offsets, file paths, etc.

kern.proc.vmmap - dump virtual memory mapping information for a process,
  if debugging is permitted, including layout and information on
  underlying objects, such as the type of object and path.

These provide a superset of the information historically available
through the now-deprecated procfs(4), and are intended to be exported
in an ABI-robust form.
2007-12-02 10:10:27 +00:00
Robert Watson
965b55e2b4 Test that p_textvp is non-NULL be dereferencing, as no executable vnode is
set for kernel processes.

Reported by:	Skip Ford <skip at menantico dot com>
MFC after:	3 days
2007-11-20 18:03:09 +00:00