Commit Graph

278 Commits

Author SHA1 Message Date
jhb
db29bc1e15 - Move the wakeup() for exiting kthreads out of exit1() and into
kthread_exit() as that is cleaner and less obscured.  It also does the
  wakeup sooner.
- Add some comments to kthread_exit().
2006-02-06 21:56:13 +00:00
wsalamon
88c7ad2392 Audit the pid being requested in wait4().
Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
2006-02-06 00:19:09 +00:00
rwatson
4cd971d35d On process exit, audit the return value of the process, and commit the
record immediately, as this system call never returns.

Obtained from:	TrustedBSD Project
2006-02-05 21:08:25 +00:00
jhb
2b85b6db3b Add a comment. 2006-02-03 21:09:40 +00:00
rwatson
53a606d94a Hook up audit to fork() and exit() events. These changes manage the
audit state on processes, not auditing of these events.

Much work by:	wsalamon
Obtained from:	TrustedBSD Project
2006-02-02 01:32:58 +00:00
ups
18ba9270dc Hopefully fix the "calcru: runtime went backwards from ..." problem by
keeping the resource values locked (where needed) while we use them
for calculations.

MFC after:	3 days
2006-01-23 19:15:13 +00:00
phk
213233d76c Regenerate sysent with new abort2 system call.
Implement abort2(const char *reason, int narg, void **args);

Submitted by:	"Wojciech A. Koszek" <dunstan@freebsd.czest.pl>
2005-12-23 11:58:42 +00:00
davidxu
1628e16677 Register itimers_event_hook as a kernel event handler, so I don't
have to duplicate code to call it in exec() and exit1().
2005-12-09 05:43:26 +00:00
rwatson
2a5785fb21 Moderate rewrite of kernel ktrace code to attempt to generally improve
reliability when tracing fast-moving processes or writing traces to
slow file systems by avoiding unbounded queueuing and dropped records.
Record loss was previously possible when the global pool of records
become depleted as a result of record generation outstripping record
commit, which occurred quickly in many common situations.

These changes partially restore the 4.x model of committing ktrace
records at the point of trace generation (synchronous), but maintain
the 5.x deferred record commit behavior (asynchronous) for situations
where entering VFS and sleeping is not possible (i.e., in the
scheduler).  Records are now queued per-process as opposed to
globally, with processes responsible for committing records from their
own context as required.

- Eliminate the ktrace worker thread and global record queue, as they
  are no longer used.  Keep the global free record list, as records
  are still used.

- Add a per-process record queue, which will hold any asynchronously
  generated records, such as from context switches.  This replaces the
  global queue as the place to submit asynchronous records to.

- When a record is committed asynchronously, simply queue it to the
  process.

- When a record is committed synchronously, first drain any pending
  per-process records in order to maintain ordering as best we can.
  Currently ordering between competing threads is provided via a global
  ktrace_sx, but a per-process flag or lock may be desirable in the
  future.

- When a process returns to user space following a system call, trap,
  signal delivery, etc, flush any pending records.

- When a process exits, flush any pending records.

- Assert on process tear-down that there are no pending records.

- Slightly abstract the notion of being "in ktrace", which is used to
  prevent the recursive generation of records, as well as generating
  traces for ktrace events.

Future work here might look at changing the set of events marked for
synchronous and asynchronous record generation, re-balancing queue
depth, timeliness of commit to disk, and so on.  I.e., performing a
drain every (n) records.

MFC after:	1 month
Discussed with:	jhb
Requested by:	Marc Olzheim <marcolz at stack dot nl>
2005-11-13 13:27:44 +00:00
csjp
62ab0fa062 Giant clean up for exit(2)
-Change unconditional aquisition of Giant to only pickup Giant if the vnode
 for the controlling tty resides on a non-mpsafe file system.
-Pickup Giant around executable vnode reference counting operations only if
 the executable resides on a non-mpsafe file system.
-If this process is being traced, pickup Giant for trace file reference count
 operations only if it resides on a non-mpsafe file system.

Discussed with:	jhb
Tested by:	kris
2005-11-08 17:11:03 +00:00
davidxu
37bb483679 Add support for queueing SIGCHLD same as other UNIX systems did.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)
2005-11-08 09:09:26 +00:00
jhb
2eddf38ca6 Push down Giant into fdfree() and remove it from two of the callers.
Other callers such as some rfork() cases weren't locking Giant anyway.

Reviewed by:	csjp
MFC after:	1 week
2005-11-01 17:13:05 +00:00
glebius
d9ad5313fd - Fix leak of struct nlminfo on process exit.
- Fix malloc type collision, that made the above problem
  difficult to understand.

Reported by:	Vladimir Sharun <sharun ukr.net>
2005-10-26 07:18:37 +00:00
davidxu
7086514f41 Make p_itimers as a pointer, so file sys/proc.h does not need to include
sys/timers.h.
2005-10-23 12:19:08 +00:00
davidxu
cf50eec401 Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC
clock are supported. I have plan to merge XSI timer ITIMER_REAL and other
two CPU timers into the new code, current three slots are available for
the XSI timers.
The SIGEV_THREAD notification type is not supported yet because our
sigevent struct lacks of two member fields:
sigev_notify_function
sigev_notify_attributes
I have found the sigevent is used in AIO, so I won't add the two members
unless the AIO code is adjusted.
2005-10-23 04:22:56 +00:00
davidxu
3fbdb3c215 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
jhb
736b03e795 Add witness warnings to panic if a thread tries to exit while holding any
locks.

Requested by:	jeff
MFC after:	3 days
2005-09-02 20:20:01 +00:00
jhb
77e96aefca - Slightly reorder the events around the setting of PRS_ZOMBIE to be less
hokie and much more readable and expand the comment to explain why it is
  the way that it is.
- Close a race where one CPU could free the process belonging to a thread
  on another CPU that hasn't quite finished exiting yet but is beyond the
  point of setting the process state as PRS_ZOMBIE.

Reported and tested by:	ps (2)
MFC after:	3 days
2005-07-18 20:08:14 +00:00
ups
acfce18a2a Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
davidxu
af64c19b3b Only check signal event, single threading event shouldn't be reported. 2005-05-05 06:42:02 +00:00
davidxu
50a5bbcbfd Wake up swapper process if needed.
PR: kern/78474
Submitted by: Sam Lawrance <boris at brooknet dot com dot au>
2005-04-23 05:06:44 +00:00
davidxu
9452a25d2d Clear P_STATCHILD earlier to avoid unnecessary retrying. 2005-04-19 12:31:15 +00:00
davidxu
02615ff23a Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:07:28 +00:00
rwatson
75030e30f6 Introduce p_canwait() and MAC Framework and MAC Policy entry points
mac_check_proc_wait(), which control the ability to wait4() specific
processes.  This permits MAC policies to limit information flow from
children that have changed label, although has to be handled carefully
due to common programming expectations regarding the behavior of
wait4().  The cr_seeotheruids() check in p_canwait() is #if 0'd for
this reason.

The mac_stub and mac_test policies are updated to reflect these new
entry points.

Sponsored by:	SPAWAR, SPARTA
Obtained from:	TrustedBSD Project
2005-04-18 13:36:57 +00:00
jeff
7b0b6888cc - A lock is required before calling VOP_REVOKE. Our reference protects us
from accessing another vnode so a naked VOP_LOCK is sufficient.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:47:04 +00:00
phk
cd117518d7 In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying
a process return to userspace if it had pending GEOM events.

We need to have the same check in the exit pass to catch the case
where a GEOM related filedescriptor is not explicitly closed by
the process.

Bumped into by:	people using dd(1) to build releases, nanobsd etc.
2005-01-29 14:03:41 +00:00
rwatson
79b283cecc In kern_wait(), let the compiler copy the rusage structure rather than
an explicit bcopy() -- it probably does a better job.
2005-01-08 04:17:48 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
jhb
3ec0dff7ad - Move the function prototypes for kern_setrlimit() and kern_wait() to
sys/syscallsubr.h where all the other kern_foo() prototypes live.
- Resort kern_execve() while I'm there.
2005-01-05 22:19:44 +00:00
das
130bed6547 Don't include sys/user.h merely for its side-effect of recursively
including other headers.
2004-11-27 06:51:39 +00:00
davidxu
45aef9e6ad Remove P_STOPPED_TRACE bit if debugger dies without a chance to
detach debugged process.
2004-10-23 11:20:26 +00:00
jhb
ce2d3f89af Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00
jhb
3bd2c1b67b Some more whitespace, style, and comment fixes.
Submitted by:	bde (mostly)
2004-09-24 20:27:04 +00:00
jhb
7b666b4137 A modest collection of various and sundry style, spelling, and whitespace
fixes.

Submitted by:	bde (mostly)
2004-09-24 00:38:15 +00:00
jhb
3956303607 Various small style fixes. 2004-09-22 15:24:33 +00:00
julian
5813d27029 Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is  now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by:	scottl, peter
MFC after:	1 week
2004-09-05 02:09:54 +00:00
jmg
bc1805c6e8 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
alc
6aaed2f8ea Giant is no longer required by vm_waitproc() and vmspace_exitfree().
Eliminate it acquisition and release around vm_waitproc() in kern_wait().
2004-07-30 20:31:02 +00:00
alc
8a38bc6b2c - Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit().  (Giant is acquired only if the vmspace
   contains shm segments.)
 - Eliminate the acquisition of Giant from proc_rwmem().
 - Reduce the scope of Giant in exit1(), uncovering the destruction of the
   address space.
2004-07-27 03:53:41 +00:00
julian
a488bebcd2 When calling scheduler entrypoints for creating new threads and processes,
specify "us" as the thread not the process/ksegrp/kse.
You can always find the others from the thread but the converse is not true.
Theorotically this would lead to runtime being allocated to the wrong
entity in some cases though it is not clear how often this actually happenned.
(would only affect threaded processes and would probably be pretty benign,
but it WAS a bug..)

Reviewed by: peter
2004-07-18 23:36:13 +00:00
davidxu
1920ad199e Add code to support debugging threaded process.
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
   thread current user thread is running on. Add tm_dflags into
   kse_thr_mailbox, the flags is written by debugger, it tells
   UTS and kernel what should be done when the process is being
   debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

   TMDF_SSTEP is used to tell kernel to turn on single stepping,
   or turn off if it is not set.

   TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
   whenever possible, to UTS, it means do not run the user thread
   until debugger clears it, this behaviour is necessary because
   gdb wants to resume only one thread when the thread's pc is
   at a breakpoint, and thread needs to go forward, in order to
   avoid other threads sneak pass the breakpoints, it needs to remove
   breakpoint, only wants one thread to go. Also, add km_lwp to
   kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
   switch time when process is not being debugged, so when process
   is attached, debugger can map kernel thread to user thread.

2. Add p_xthread to proc strcuture and td_xsig to thread structure.
   p_xthread is used by a thread when it wants to report event
   to debugger, every thread can set the pointer, especially, when
   it is used in ptracestop, it is the last thread reporting event
   will win the race. Every thread has a td_xsig to exchange signal
   with debugger, thread uses TDF_XSIG flag to indicate it is reporting
   signal to debugger, if the flag is not cleared, thread will keep
   retrying until it is cleared by debugger, p_xthread may be
   used by debugger to indicate CURRENT thread. The p_xstat is still
   in proc structure to keep wait() to work, in future, we may
   just use td_xsig.

3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
   a thread. When process stops, debugger can set the flag for
   thread, thread will check the flag in thread_suspend_check,
   enters a loop, unless it is cleared by debugger, process is
   detached or process is existing. The flag is also checked in
   ptracestop, so debugger can temporarily suspend a thread even
   if the thread wants to exchange signal.

4. Current, in ptrace, we always resume all threads, but if a thread
   has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen
2004-07-13 07:20:10 +00:00
alc
ea94acfaa6 Push down the acquisition and release of the page queues lock into
pmap_remove_pages().  (The implementation of pmap_remove_pages() is
optional.  If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
2004-07-13 02:49:22 +00:00
marcel
57e7de678f Implement the PT_LWPINFO request. This request can be used by the
tracing process to obtain information about the LWP that caused the
traced process to stop. Debuggers can use this information to select
the thread currently running on the LWP as the current thread.

The request has been made compatible with NetBSD for as much as
possible. This implementation differs from NetBSD in the following
ways:
1.  The data argument is allowed to be smaller than the size of the
    ptrace_lwpinfo structure known to the kernel, but not 0. This
    is opposite to what NetBSD allows. The reason for this is that
    we can extend the structure without affecting older binaries.
2.  On NetBSD the tracing process is to set the pl_lwpid field to
    the Id of the LWP it wants information of. We don't do that.
    Our ptrace interface allows passing the LWP Id instead of the
    PID. The tracing process is to set the PID to the LWP Id it
    wants information of.
3.  When the PID is actually the PID of the tracing process, this
    request returns the information about the LWP that caused the
    process to stop. This was the whole purpose of the request in
    the first place.

When the traced process has exited, this request will return the
LWP Id 0, indicating that the process state is not the result of
an event specific to a LWP.
2004-07-12 05:07:50 +00:00
bde
747f331358 (1) Removed the bogus condition "p->p_pid != 1" on calling sched_exit()
from exit1().  sched_exit() must be called unconditionally from exit1().
    It was called almost unconditionally because the only exits on system
    shutdown if at all.

(2) Removed the comment that presumed to know what sched_exit() does.
    sched_exit() does different things for the ULE case.  The call became
    essential when it started doing load average stuff, but its caller
    should not know that.

(3) Didn't fix bugs caused by bitrot in the condition.  The condition was
    last correct in rev.1.208 when it was in wait1().  There p was spelled
    curthread->td_proc and was for the waiting parent; now p is for the
    exiting child.  The condition was to avoid lowering init's priority.
    It should be in sched_exit() itself.  Lowering of priorities is broken
    in other ways in at least the 4BSD scheduler, and doing it for init
    causes less noticeable problems than doing it for for shells.

Noticed by:	julian (1)
2004-06-21 14:49:50 +00:00
bde
b65b61b58a Update p_runtime on exit. This fixes calcru() on zombies, and prepares
for not calling calcru() on exit.  calcru() on a zombie can happen if
ttyinfo() (^T) picks one.

PR:		52490
2004-06-21 14:03:38 +00:00
davidxu
d11f8ce42b Add comment to reflect that we should retry after thread singling failed. 2004-06-18 11:13:49 +00:00
davidxu
70a732669b Remove a bogus panic. It is possible more than one threads will
be suspended in thread_suspend_check, after they are resumed, all
threads will call thread_single, but only one can be success,
others should retry and will exit in thread_suspend_check.
2004-06-18 06:21:09 +00:00
tjr
58dbd6e669 Remove remnants of PGINPROF. 2004-06-08 10:37:30 +00:00
tjr
80d36400ed Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by:	davidxu
2004-06-02 07:52:36 +00:00
tmm
7b769ce88f Retire cpu_sched_exit(); it is not used any more. 2004-05-26 12:09:39 +00:00