Commit Graph

370 Commits

Author SHA1 Message Date
Konstantin Belousov
17dfbc1c43 Add per-process osrel node to the procfs, to allow read and set p_osrel
value for the process.

Approved by:	des (procfs maintainer)
MFC after:	3 weeks
2009-09-23 12:08:08 +00:00
John Baldwin
013818111a Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses.  The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by:	alc
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-24 13:50:29 +00:00
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
Jamie Gritton
0304c73163 Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
Konstantin Belousov
2883703e00 Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process
executing on 64bit kernel. This eliminates the direct comparisions
of p_sysent with &ia32_freebsd_sysvec, that were left intact after
r185169.
2009-03-02 18:43:50 +00:00
Dag-Erling Smørgrav
655fcdaa00 Fix a logic bug that caused the pfs_attr method to be called only for
PFS_PROCDEP nodes.

Submitted by:	Andrew Brampton <brampton@gmail.com>
MFC after:	2 weeks
2009-02-16 15:17:26 +00:00
Konstantin Belousov
22a448c4d9 vm_map_lock_read() does not increment map->timestamp, so we should
compare map->timestamp with saved timestamp after map read lock is
reacquired, not with saved timestamp + 1. The only consequence of the +1
was unconditional lookup of the next map entry, though.

Tested by:	pho
Approved by:	des
MFC after:	2 weeks
2008-12-29 12:45:11 +00:00
Konstantin Belousov
c990bf0896 Use curproc->p_sysent->sv_flags bit SV_ILP32 for detection of the 32 bit
caller, instead of direct comparision with ia32_freebsd_sysvec.

Tested by:	pho
Approved by:	des
MFC after:	2 weeks
2008-12-29 12:41:32 +00:00
Konstantin Belousov
c7462f4387 Reference the vmspace of the process being inspected by procfs, linprocfs
and sysctl kern_proc_vmmap handlers.

Reported and tested by:	pho
Reviewed by:	rwatson, des
MFC after:	1 week
2008-12-12 12:12:36 +00:00
Konstantin Belousov
c96f374195 Relock user map earlier, to have the lock held when break leaves the
loop earlier due to sbuf error.

Pointy hat to:	me
Submitted by:	dchagin
2008-12-10 16:11:09 +00:00
Konstantin Belousov
9499cb83bf Make two style changes to create new commit and document proper commit
message for r185765.

Noted by:	rdivacky
Requested by:	des

Commit message for r185765 should be:
In procfs map handler, and in linprocfs maps handler, do not call
vn_fullpath() while having vm map locked. This is done in anticipation
of the vop_vptocnp commit, that would make vn_fullpath sometime
acquire vnode lock.

Also, in linprocfs, maps handler already acquires vnode lock.

No objections from:	des
MFC after:	2 week
2008-12-08 13:15:31 +00:00
Konstantin Belousov
5a66e0259b Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use
sbuf instead of doing uiomove. This allows for reads from non-zero
offsets to work.

Patch is forward-ported des@' one, and was adopted to current code
by dchagin@ and me.

Reviewed by:	des (linprocfs part)
PR:	kern/101453
MFC after:	1 week
2008-12-08 12:34:52 +00:00
John Baldwin
2ff47c5f18 Remove unnecessary locking around vn_fullpath(). The vnode lock for the
vnode in question does not need to be held.  All the data structures used
during the name lookup are protected by the global name cache lock.
Instead, the caller merely needs to ensure a reference is held on the
vnode (such as vhold()) to keep it from being freed.

In the case of procfs' <pid>/file entry, grab the process lock while we
gain a new reference (via vhold()) on p_textvp to fully close races with
execve(2).

For the kern.proc.vmmap sysctl handler, use a shared vnode lock around
the call to VOP_GETATTR() rather than an exclusive lock.

MFC after:	1 month
2008-11-04 19:04:01 +00:00
Konstantin Belousov
9a1e630dfd Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use
sbuf instead of doing uiomove. This allows for reads from non-zero
offsets to work.

Patch is forward-ported des@' one, and was adopted to current code
by dchagin@ and me.

Reviewed by:	des (linprocfs part)
PR:	kern/101453
MFC after:	1 week
2008-10-04 14:08:16 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Jeff Roberson
b61ce5b0e6 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Attilio Rao
a1fe14bc33 rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc which wrappers both calls to be performed in
atomic way.

Reviewed by: jeff
Approved by: jeff (mentor)
2007-06-09 21:48:44 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Dag-Erling Smørgrav
1d776018d4 The process lock is held when procfs_ioctl() is called. Assert that this
is so, and PHOLD the process while sleeping since msleep() will release
the lock.
2007-05-01 12:59:20 +00:00
Alan Cox
cf75c506db Add synchronization. Eliminate the acquisition and release of Giant.
Reviewed by: tegge
2007-04-23 06:12:24 +00:00
Dag-Erling Smørgrav
b1f9e8cec9 Instead of stating GIANT_REQUIRED, just acquire and release Giant where
needed.  This does not make a difference now, but will when procfs is
marked MPSAFE.
2007-04-15 17:06:09 +00:00
Dag-Erling Smørgrav
302762c344 Fix the same bug as in procfs_doproc{,db}regs(): check that uio_offset is
0 upon entry, and don't reset it before returning.

MFC after:	3 weeks
2007-04-15 13:29:36 +00:00
Dag-Erling Smørgrav
66cd74a611 Don't reset uio_offset to 0 before returning. Instead, refuse to service
requests where uio_offset is not 0 to begin with.  This fixes a long-
standing bug where e.g. 'cat /proc/$$/regs' would loop forever.

MFC after:	3 weeks
2007-04-15 13:24:03 +00:00
Dag-Erling Smørgrav
771709eb78 Add a pn_destroy field to pfs_node. This field points to a destructor
function which is called from pfs_destroy() before the node is reclaimed.

Modify pfs_create_{dir,file,link}() to accept a pointer to a destructor
function in addition to the usual attr / fill / vis pointers.

This breaks both the programming and binary interfaces between pseudofs
and its consumers.  It is believed that there are no pseudofs consumers
outside the source tree, so that the impact of this change is minimal.

Submitted by:	Aniruddha Bohra <bohra@cs.rutgers.edu>
2007-03-12 12:16:52 +00:00
Robert Watson
969e5bdcd0 Do allow PIOCSFL in jail for setguid processes; this is more consistent
with other debugging checks elsewhere.  XXX comment on the fact that
p_candebug() is not being used here remains.
2007-02-19 13:04:25 +00:00
Konstantin Belousov
a257337698 Fix the race of dereferencing /proc/<pid>/file with execve(2) by caching
the value of p_textvp. This way, we always unlock the locked vnode.
While there, vhold() the vnode around the vn_lock().

Reported and tested by:	Guy Helmer (ghelmer palisadesys com)
Approved by:		des (procfs maintainer)
MFC after:		1 week
2007-02-07 10:30:49 +00:00
Julian Elischer
ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
Konstantin Belousov
dbf989ea6a Wake up PIOCWAIT handler on the process exit in addition to the stop
events. &p->p_stype is explicitely woken up on process exit for us.

Now, truss /nonexistent exits with error instead of waiting until killed
by signal.

Reported by:	Nikos Vassiliadis nvass at teledomenet gr
Reviewed by:	jhb
MFC after:	1 week
2006-11-17 14:52:38 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
Ruslan Ermilov
9fddcc6661 Fix our ioctl(2) implementation when the argument is "int". New
ioctls passing integer arguments should use the _IOWINT() macro.
This fixes a lot of ioctl's not working on sparc64, most notable
being keyboard/syscons ioctls.

Full ABI compatibility is provided, with the bonus of fixing the
handling of old ioctls on sparc64.

Reviewed by:	bde (with contributions)
Tested by:	emax, marius
MFC after:	1 week
2006-09-27 19:57:02 +00:00
Guy Helmer
3266c22854 Upon further review, DES prefers this change over that in revision 1.13
to resolve the directory access problem for processes with P_SUGID flag
set.

Suggested by: des
2006-06-05 16:41:27 +00:00
Guy Helmer
e06dbd3229 Revision 1.4 set access for all sensitive files in /proc/<PID> to mode 0
if a process's uid or gid has changed, but the /proc/<PID> directory
itself was also set to mode 0.  Assuming this doesn't open any
security holes, open access to the /proc/<PID> directory for users
other than root to read or search the directory.

Reviewed by:	des (back in February)
MFC after:	3 weeks
2006-05-24 14:03:51 +00:00
John Baldwin
7a61c1a3cb Hold the proc lock while calling proc_sstep() since the function asserts
it and remove a PRELE() that didn't have a matching PHOLD().  The calling
code already has a PHOLD anyway.

MFC after:	1 week
2006-02-22 17:20:37 +00:00
Tom Rhodes
09c00166e4 Make tv_sec a time_t on all platforms but alpha. Brings us more in line with
POSIX.  This also makes the struct correct we ever implement an i386-time64
architecture.  Not that we need too.

Reviewed by:	imp, brooks
Approved by:	njl (acpica), des (no objects, touches procfs)
Tested with:	make universe
2005-12-24 22:22:17 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
Peter Wemm
62919d788b Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
	procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
	and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
	sscanf fails.  They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets.  Note
	that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet.  We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.

Approved by:	re
2005-06-30 07:49:22 +00:00
Peter Wemm
2de92a386e Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious
ioctl numbers in backwards compatability mode.  eg: an IOC_IN ioctl with
a size of zero.  Traditionally this was what you did before IOC_VOID
existed, and we had some established users of this in the tree, namely
procfs.  Certain 3rd party drivers with binary userland components also
have this too.

This is necessary to have 4.x and 5.x binaries use these ioctl's.  We
found this at work when trying to run 4.x binaries.

Approved by:	re
2005-06-30 00:19:08 +00:00
Poul-Henning Kamp
46d7d4a332 Don't export major,minor, instead export tty name. 2005-03-15 11:05:11 +00:00
Warner Losh
d167cf6f3a /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 18:10:42 +00:00
Colin Percival
691b3b0df9 Fix unvalidated pointer dereference. This is FreeBSD-SA-04:17.procfs. 2004-12-01 21:33:02 +00:00
John Baldwin
78c85e8dfc Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00
David Schultz
616b5f90d3 Don't PHOLD() the target process in procfs, since this is already done
in pseudofs.  Moreover, PHOLD() may block between the p_candebug()
access check and the actual operation.
2004-10-01 05:01:17 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Pawel Jakub Dawidek
c5b7c33bc8 Remove ps_argsopen from this check, because of two reasons:
1. This check if wrong, because it is true by default
   (kern.ps_argsopen is 1 by default) (p_cansee() is not even checked).
2. Sysctl kern.ps_argsopen is going away.
2004-04-01 00:04:23 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
Robert Watson
1f1ca35f69 Lock p->p_textvp before calling vn_fullpath() on it. Note the
potential lock order concern due to the vnode lock held
simultaneously by the caller into procfs.

Reported by:	kuriyama
Approved by:	des
2004-01-07 17:58:51 +00:00
Dag-Erling Smørgrav
7caaf6c9c9 Minor whitespace and style issues. 2003-12-07 17:40:00 +00:00
Maxime Henrion
6fb826df1c Remove debug printf(). 2003-10-19 14:33:00 +00:00
Jacques Vidrine
8b7358ca43 Introduce a uiomove_frombuf helper routine that handles computing and
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).

Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows.  As a
side-effect, the code is significantly simplified.

Add additional sanity checks when computing a memory allocation size
in pfs_read.

Submitted by:	rwatson  (original uiomove_frombuf -- bugs are mine :-)
Reported by:	Joost Pol <joost@pine.nl>  (integer underflows/overflows)
2003-10-02 15:00:55 +00:00
Robert Watson
309cd88432 Add a new column to the procfs map to hold the name of the mapped
file for vnode mappings.  Note that this uses vn_fullpath() and may
be somewhat unreliable, although not too unreliable for shared
libraries.  For non-vnode mappings, just print "-" for the field.

Obtained from:	TrustedBSD Projects
Sponsored by:	DARPA, AFRL, Network Associates Laboratories
2003-09-29 20:53:19 +00:00
Robert Watson
946e86b7e1 Add p_candebug() check to access a process map file in procfs; limit
access to map information for processes that you wouldn't otherwise
have debug rights on.

Tested by:	bms
2003-08-14 15:26:44 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
Robert Watson
587ffa4508 Clean up proc locking in procfs: make sure the proc lock is held before
entering sys_process.c debugging primitives, or we violate assertions.
Also, be more careful about releasing the process lock around calls
to uiomove() which may sleep waiting for paging machinations or
related notions.  We may want to defer the uiomove() in at least
one case, but jhb will look into that at a later date.

Reported by:	Philippe Charnier <charnier@xp11.frmug.org>
Reviewed by:	jhb
2003-05-05 15:12:51 +00:00
Dag-Erling Smørgrav
87ccef7b77 Instead of recording the Unix time in a process when it starts, record the
uptime.  Where necessary, convert it back to Unix time by adding boottime
to it.  This fixes a potential problem in the accounting code, which would
compute the elapsed time incorrectly if the Unix time was stepped during
the lifetime of the process.
2003-05-01 16:59:23 +00:00
John Baldwin
664f718ba1 - Always call faultin() in _PHOLD() if PS_INMEM is clear. This closes a
race where a thread could assume that a process was swapped in by
  PHOLD() when it actually wasn't fully swapped in yet.
- In faultin(), always msleep() if PS_SWAPPINGIN is set instead of doing
  this check after bumping p_lock in the PS_INMEM == 0 case.  Also,
  sched_lock is only needed for setting and clearning swapping PS_*
  flags and the swap thread inhibitor.
- Don't set and clear the thread swap inhibitor in the same loops as the
  pmap_swapin/out_thread() since we have to do it under sched_lock.
  Instead, mimic the treatment of the PS_INMEM flag and use separate loops
  to set the inhibitors when clearing PS_INMEM and clear the inhibitors
  when setting PS_INMEM.
- swapout() now returns with the proc lock held as it holds the lock
  while adjusting the swapping-related PS_* flags so that the proc lock
  can be used to test those flags.
- Only use the proc lock to check the swapping-related PS_* flags in
  several places.
- faultin() no longer requires sched_lock to be held by callers.
- Rename PS_SWAPPING to PS_SWAPPINGOUT to be less ambiguous now that we
  have PS_SWAPPINGIN.
2003-04-22 20:00:26 +00:00
John Baldwin
f36403612a - Use a local variable to close a minor race when determining if the wmesg
printed out needs a prefix such as when a thread is blocked on a lock.
- Use another local variable to close another race for the td_wmesg and
  td_wchan members of struct thread.
2003-04-17 22:16:58 +00:00
John Baldwin
ab0eee5563 Protect p_flag with the proc lock. The sched_lock is not needed to turn
off P_STOPPED_SIG in p_flag.
2003-04-17 22:14:30 +00:00
John Baldwin
c2247848dc - P_SHOULDSTOP just needs proc lock now, so don't acquire sched_lock unless
it is needed.
- Add a proc lock assertion.
2003-04-17 22:13:46 +00:00
John Baldwin
c110b8e65e Add a proc lock assertion and move another assertion up to the top of the
function.
2003-04-17 22:12:12 +00:00
Dag-Erling Smørgrav
f9be0dee1e wakeup(9) and msleep(9) take void * arguments, not caddr_t. 2003-03-02 15:13:06 +00:00
Julian Elischer
ac2e415327 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
Robert Watson
763bbd2f4f Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception.  For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system.  With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance.  This
also corrects sematics for shared vnode locks, which were not
previously present in the system.  This chances the cache
coherrency properties WRT out-of-band access to label data, but in
an acceptable form.  With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception.  We'll introduce a work around for this shortly.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-26 14:38:24 +00:00
Poul-Henning Kamp
659d5e21c7 Remove even more '&' from pointers to functions.
Spotted by:	FlexeLint
2002-10-20 21:30:02 +00:00
Juli Mallett
1d9c56964d Back our kernel support for reliable signal queues.
Requested by:	rwatson, phk, and many others
2002-10-01 17:15:53 +00:00
Juli Mallett
1226f694e6 First half of implementation of ksiginfo, signal queues, and such. This
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control.  There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.

After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland.  That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.

CODAFS is unfinished in this regard because the logic is unclear in
some places.

Sponsored by:	New Gold Technology
Reviewed by:	bde, tjr, jake [an older version, logic similar]
2002-09-30 20:20:22 +00:00
Jake Burkholder
05ba50f522 Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types.  Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
2002-09-21 22:07:17 +00:00
Julian Elischer
71fad9fdee Completely redo thread states.
Reviewed by:	davidxu@freebsd.org
2002-09-11 08:13:56 +00:00
David Xu
1279572a92 s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix abbreviation for P_STOPPED_* etc flags, in original code they were
inconsistent and difficult to distinguish between them.

Approved by: julian (mentor)
2002-09-05 07:30:18 +00:00
Jake Burkholder
e4f5294e18 Fixed 64bit big endian bugs relating to abuse of ioctl argument passing.
This makes truss work on sparc64.
2002-08-15 06:16:10 +00:00
Robert Watson
c1ff2d9baf Introduce support for Mandatory Access Control and extensible
kernel access control.

Modify procfs so that (when mounted multilabel) it exports process MAC
labels as the vnode labels of procfs vnodes associated with processes.

Approved by:	des
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 02:03:21 +00:00
Julian Elischer
1d7b9ed2e6 Create a new thread state to describe threads that would be ready to run
except for the fact tha they are presently swapped out. Also add a process
flag to indicate that the process has started the struggle to swap
back in. This will be  needed for the case where multiple threads
start the swapin action top a collision. Also add code to stop
a process fropm being swapped out if one of the threads in this
process is actually off running on another CPU.. that might hurt...

Submitted by:	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
2002-07-29 18:33:32 +00:00
Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
John Baldwin
f44d9e24fb Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked.  We now use the thread ucred reference
for the credential checks in p_can*() as a result.  p_canfoo() should now
no longer need Giant.
2002-05-19 00:14:50 +00:00
Bruce Evans
54a4c5bf21 Include <sys/systm.h> for (at least) the definition of atomic functions
which are sometimes used by the macros in <sys/mutex.h>; don't depend
on not-quite-necessary namespace pollution in <sys/mutex.h>.
2002-04-21 15:35:54 +00:00
Robert Watson
d51ed1a04a Spelling fix for comment. 2002-04-20 01:14:25 +00:00
John Baldwin
a92e7c792a - Change procfs_control()'s first argument to be a thread pointer instead
of a process pointer.
- Move the p_candebug() at the start of procfs_control() a bit to make
  locking feasible.  We still perform the access check before doing
  anything, we just now perform it after acquiring locks.
- Don't lock the sched_lock for TRACE_WAIT_P() and when checking to see if
  p_stat is SSTOP.  We lock the process while setting p_stat to SSTOP
  so locking the process is sufficient to do a read to see if p_stat is
  SSTOP or not.
2002-04-13 23:19:13 +00:00
John Baldwin
ce5aaf4554 Lock the target process for p_candebug(). 2002-04-13 23:15:28 +00:00
John Baldwin
ff7299d998 Lock the target process in procfs_doproc*regs() for p_candebug and while
reading/writing the registers.
2002-04-13 23:14:08 +00:00
John Baldwin
590ae816c2 - p_cansee() needs the target process locked.
- We need the proc lock held for more of procfs_doprocstatus().
2002-04-13 23:09:41 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Alfred Perlstein
e9b192b758 Protect proc struct (p_args and p_comm) when doing procfs IO that pulls
data from it.

Submitted by: Jonathan Mini <mini@haikugeek.com>
2002-03-29 19:12:40 +00:00
Alfred Perlstein
11caded34f Remove __P. 2002-03-19 22:20:14 +00:00
Seigo Tanimura
f591779bb5 Lock struct pgrp, session and sigio.
New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
  session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by:	jhb, alfred
Tested on:	cvsup.jp.FreeBSD.org
		(which is a quad-CPU machine running -current)
2002-02-23 11:12:57 +00:00
Dag-Erling Smørgrav
cd9e3b208c Paranoia: if the process is setugid, set all sensitive files mode 0. 2002-02-18 21:41:11 +00:00
Bruce Evans
a21759a1a9 FIxed the following style bugs:
- clobbering of jsp's $Id$ by FreeBSD's old $Id$.
- long lines in recent KSE changes (procfs_ctl.c).
- other style bugs in KSE changes (most related to an shadowed variable
  in procfs_status.c -- the td in the outer scope is obfuscated by
  PFS_FILL_ARGS).

Approved by:	des
2002-02-16 05:59:26 +00:00
Bruce Evans
a76d60f014 FIxed the following style bugs:
- clobbering of jsp's $Id$ by FreeBSD's old $Id$.
- lost Berkeley id in procfs_dbregs.c
- long lines in recent KSE changes.
- various gratuitous differences between procfs_*regs.c.
2002-02-16 05:38:07 +00:00
Bruce Evans
ff3741f519 Fixed missing PHOLD()/PRELE().
Obtained from:	procfs_dbregs.c
Approved by:	des
2002-02-16 04:05:32 +00:00
Julian Elischer
079b7badea Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
Dag-Erling Smørgrav
40e7a740c9 Remove an obsolete prototype for procfs_kmemaccess().
Submitted by:	rwatson
2001-12-11 19:07:10 +00:00
Dag-Erling Smørgrav
50cb89eed2 Fix various bugs in the debugging code and reenable it. 2001-12-09 00:35:30 +00:00
Dag-Erling Smørgrav
4aac2aa96c Fix a KSEfication brain-o in procfs_doprocfile(): return the path of the target process,
not the calling process.  While we're here, also unstaticize procfs_doprocfile() and
procfs_docurproc() so linprocfs can call them directly instead of duplicating them.

Submitted by:	Dominic Mitchell <dom@semantico.com>
2001-12-08 22:34:14 +00:00
Dag-Erling Smørgrav
3a669c52a8 Pseudofsize procfs(5). 2001-12-04 01:35:06 +00:00
Robert Watson
011376308f o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
  pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
  so as to enforce these protections, in particular, in kern_mib.c
  protection sysctl access to the hostname and securelevel, as well as
  kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
  mib entries (osname, osrelease, osversion) so that they don't return
  a pointer to the text in the struct linux_prison, rather, a copy
  to an array passed into the calls.  Likewise, update linprocfs to
  use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
  accessing struct prison.

Reviewed by:	jhb
2001-12-03 16:12:27 +00:00
Peter Wemm
4ff021c699 Fix printf format bugs introduced in rev 1.34 for printing times.
quad_t cannot be printed with %lld on 64 bit systems.

Dont waste cpu to round user and system times up to long long, it is
highly improbable that a process will have accumulated 68 years of
user or system cpu time (not wall clock time) before a reboot or
process restart.
2001-11-07 02:51:25 +00:00
Brian Feldman
4228024de2 Correctly unlock the target process if /proc/$foo/mem is open()ed by
another process which cannot p_candebug() it.  The bug was introduced
in rev. 1.100.

Approved by:	des
2001-11-06 17:00:40 +00:00
Matthew Dillon
0e9fe2127c Adjust printfs to be time_t agnostic. 2001-10-28 22:53:45 +00:00