Commit Graph

301 Commits

Author SHA1 Message Date
jhb
af94eb296c Improve the ktrace locking somewhat to reduce overhead:
- Depessimize userret() in kernels where KTRACE is enabled by doing an
  unlocked check of the per-process queue of pending events before
  acquiring any locks.  Previously ktr_userret() unconditionally acquired
  the global ktrace_sx lock on every return to userland for every thread,
  even if ktrace wasn't enabled for the thread.
- Optimize the locking in exit() to first perform an unlocked read of
  p_traceflag to see if ktrace is enabled and only acquire locks and
  teardown ktrace if the test succeeds.  Also, explicitly disable tracing
  before draining any pending events so the pending events actually get
  written out.  The unlocked read is safe because proc lock is acquired
  earlier after single-threading so p_traceflag can't change between then
  and this check (well, it can currently due to a bug in ktrace I will fix
  next, but that race existed prior to this change as well).

Reviewed by:	rwatson
2007-06-13 20:01:42 +00:00
attilio
12d804e413 rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc which wrappers both calls to be performed in
atomic way.

Reviewed by: jeff
Approved by: jeff (mentor)
2007-06-09 21:48:44 +00:00
attilio
c105658c88 The current rusage code show peculiar problems:
- Unsafeness on ruadd() in thread_exit()
- Unatomicity of thread_exiit() in the exit1() operations

This patch addresses these problems allocating p_fd as part of the
process and modifying the way it is accessed.

A small chunk of this patch, resolves a race about p_state in kern_wait(),
since we have to be sure about the zombif-ing process.

Submitted by: jeff
Approved by: jeff (mentor)
2007-06-09 18:56:11 +00:00
rwatson
9f332c91ef Move per-process audit state from a pointer in the proc structure to
embedded storage in struct ucred.  This allows audit state to be cached
with the thread, avoiding locking operations with each system call, and
makes it available in asynchronous execution contexts, such as deep in
the network stack or VFS.

Reviewed by:	csjp
Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2007-06-07 22:27:15 +00:00
jeff
91d1501790 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
jeff
a7a8bac81f - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
jhb
77d161b46b Move cpu_exit() earlier in exit1() to close a race between
SIGCHLD/kevent(2) notification of process termination and wait().  Now
we no longer drop locks between sending the notification and marking
the process as a zombie.  Previously, if another process attempted to do
a wait() with W_NOHANG after receiving a SIGCHLD or kevent and locked
the process while the exiting thread was in cpu_exit(), then wait() would
fail to find the process, which is quite astonishing to the process
calling wait().

MFC after:	3 days
2007-05-14 22:21:58 +00:00
jhb
a84f74bb36 Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.
2007-03-21 21:20:51 +00:00
rwatson
69938bd196 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
rwatson
300d4098cf Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
davidxu
4889cc1ac4 Move sigqueue_take() call into proc_reparent(), this fixed bugs where
proc_reparent() is called but sigqueue_take() is forgotten.
2006-10-25 06:18:04 +00:00
davidxu
c117e9447c Protect sigqueue_take() call by child process's lock, it fixed a
potential race with ptrace 'attach' which changes parent of the
child process.
2006-10-24 12:04:21 +00:00
rwatson
7beaaf5cd2 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
davidxu
df9c81e665 Since revision 1.333 of kern_sig.c no longer uses P_WEXIT, the change
opened a race window which can cause memory leak in signal queue.
Here we free memory for signal queue when process state is set to
PRS_ZOMBIE.
2006-10-21 23:59:15 +00:00
csjp
3927aa4474 Back out one of the Giant removals from revision 1.272. Giant was not here to
protect the vnode, it was present to synchronize access to TTY session
information between exit(2) and the TTY code. While we are here, note that
Giant is required for TTY protection.

Clue from:	bde
Discussed with:	jhb
MFC after:	1 week
2006-09-13 15:47:53 +00:00
tegge
0d5f191162 Close race between vmspace_exitfree() and exit1() and races between
vmspace_exitfree() and vmspace_free() which could result in the same
vmspace being freed twice.

Factor out part of exit1() into new function vmspace_exit().  Attach
to vmspace0 to allow old vmspace to be freed earlier.

Add new function, vmspace_acquire_ref(), for obtaining a vmspace
reference for a vmspace belonging to another process.  Avoid changing
vmspace refcount from 0 to 1 since that could also lead to the same
vmspace being freed twice.

Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().

Reviewed by:	alc
2006-05-29 21:28:56 +00:00
csjp
8ddf669db3 Kill the last Giant acquisition in the exit(2) code. This Giant acquisition
doesn't appear to be protecting anything. Most of consumers funsetownlst(9)
do not appear to be picking up Giant anywhere. This was originally a part
of my Giant exit(2) clean up revision 1.272 but I thought it was a good idea
to leave it out until we were able to analyze it better.

Tested by:	kris
MFC after:	3 weeks
2006-04-10 14:07:28 +00:00
peter
0f363b7d24 Remove the unused sva and eva arguments from pmap_remove_pages(). 2006-04-03 21:16:10 +00:00
davidxu
baf4d3f4f1 1. Count last time slice, this intends to fix
"calcru: runtime went backwards" bug for threaded process.
2. Add comment about possible logical problem with scheduler.

MFC after: 3 days
2006-03-14 04:00:21 +00:00
jhb
ff9c76bccd Close some races between procfs/ptrace and exit(2):
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
  stop event earlier.  After we have signalled that, we set P_WEXIT and
  then wait for any processes with a hold on the vmspace via PHOLD to
  release it.  PHOLD now KASSERT()'s that P_WEXIT is clear when it is
  invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
  to zero.
- Change proc_rwmem() to require that the processing read from has its
  vmspace held via PHOLD by the caller and get rid of all the junk to
  screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
  doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
  FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
  to clear an earlier single-step simualted via a breakpoint).  We only
  do one to avoid races.  Also, by making the EINVAL error for unknown
  requests be part of the default: case in the switch, the various
  switch cases can now just break out to return which removes a _lot_ of
  duplicated PRELE and proc unlocks, etc.  Also, it fixes at least one bug
  where a LWP ptrace command could return EINVAL with the proc lock still
  held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
  ptrace_clear_single_step() to always be called with the proc lock
  held (it was a mixed bag previously).  Alpha and arm have to drop
  the lock while the mess around with breakpoints, but other archs
  avoid extra lock release/acquires in ptrace().  I did have to fix a
  couple of other consumers in kern_kse and a few other places to
  hold the proc lock and PHOLD.

Tested by:	ps (1 mostly, but some bits of 2-4 as well)
MFC after:	1 week
2006-02-22 18:57:50 +00:00
jhb
abc45ce010 Move the ruadd() in kern_exit() to save our final stats in our child
stats even further down in exit1() so that it includes the runtime and
tick counts from the final time slice for the dying thread.

Reviewed by:	phk
2006-02-21 21:48:42 +00:00
phk
79081baaf0 CPU time accounting speedup (step 2)
Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru.  While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64
2006-02-11 09:33:07 +00:00
phk
bb2f62f536 Modify the way we account for CPU time spent (step 1)
Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines.  (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark.  On more modern hardware no change
in performance is seen.
2006-02-07 21:22:02 +00:00
jhb
db29bc1e15 - Move the wakeup() for exiting kthreads out of exit1() and into
kthread_exit() as that is cleaner and less obscured.  It also does the
  wakeup sooner.
- Add some comments to kthread_exit().
2006-02-06 21:56:13 +00:00
wsalamon
88c7ad2392 Audit the pid being requested in wait4().
Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
2006-02-06 00:19:09 +00:00
rwatson
4cd971d35d On process exit, audit the return value of the process, and commit the
record immediately, as this system call never returns.

Obtained from:	TrustedBSD Project
2006-02-05 21:08:25 +00:00
jhb
2b85b6db3b Add a comment. 2006-02-03 21:09:40 +00:00
rwatson
53a606d94a Hook up audit to fork() and exit() events. These changes manage the
audit state on processes, not auditing of these events.

Much work by:	wsalamon
Obtained from:	TrustedBSD Project
2006-02-02 01:32:58 +00:00
ups
18ba9270dc Hopefully fix the "calcru: runtime went backwards from ..." problem by
keeping the resource values locked (where needed) while we use them
for calculations.

MFC after:	3 days
2006-01-23 19:15:13 +00:00
phk
213233d76c Regenerate sysent with new abort2 system call.
Implement abort2(const char *reason, int narg, void **args);

Submitted by:	"Wojciech A. Koszek" <dunstan@freebsd.czest.pl>
2005-12-23 11:58:42 +00:00
davidxu
1628e16677 Register itimers_event_hook as a kernel event handler, so I don't
have to duplicate code to call it in exec() and exit1().
2005-12-09 05:43:26 +00:00
rwatson
2a5785fb21 Moderate rewrite of kernel ktrace code to attempt to generally improve
reliability when tracing fast-moving processes or writing traces to
slow file systems by avoiding unbounded queueuing and dropped records.
Record loss was previously possible when the global pool of records
become depleted as a result of record generation outstripping record
commit, which occurred quickly in many common situations.

These changes partially restore the 4.x model of committing ktrace
records at the point of trace generation (synchronous), but maintain
the 5.x deferred record commit behavior (asynchronous) for situations
where entering VFS and sleeping is not possible (i.e., in the
scheduler).  Records are now queued per-process as opposed to
globally, with processes responsible for committing records from their
own context as required.

- Eliminate the ktrace worker thread and global record queue, as they
  are no longer used.  Keep the global free record list, as records
  are still used.

- Add a per-process record queue, which will hold any asynchronously
  generated records, such as from context switches.  This replaces the
  global queue as the place to submit asynchronous records to.

- When a record is committed asynchronously, simply queue it to the
  process.

- When a record is committed synchronously, first drain any pending
  per-process records in order to maintain ordering as best we can.
  Currently ordering between competing threads is provided via a global
  ktrace_sx, but a per-process flag or lock may be desirable in the
  future.

- When a process returns to user space following a system call, trap,
  signal delivery, etc, flush any pending records.

- When a process exits, flush any pending records.

- Assert on process tear-down that there are no pending records.

- Slightly abstract the notion of being "in ktrace", which is used to
  prevent the recursive generation of records, as well as generating
  traces for ktrace events.

Future work here might look at changing the set of events marked for
synchronous and asynchronous record generation, re-balancing queue
depth, timeliness of commit to disk, and so on.  I.e., performing a
drain every (n) records.

MFC after:	1 month
Discussed with:	jhb
Requested by:	Marc Olzheim <marcolz at stack dot nl>
2005-11-13 13:27:44 +00:00
csjp
62ab0fa062 Giant clean up for exit(2)
-Change unconditional aquisition of Giant to only pickup Giant if the vnode
 for the controlling tty resides on a non-mpsafe file system.
-Pickup Giant around executable vnode reference counting operations only if
 the executable resides on a non-mpsafe file system.
-If this process is being traced, pickup Giant for trace file reference count
 operations only if it resides on a non-mpsafe file system.

Discussed with:	jhb
Tested by:	kris
2005-11-08 17:11:03 +00:00
davidxu
37bb483679 Add support for queueing SIGCHLD same as other UNIX systems did.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)
2005-11-08 09:09:26 +00:00
jhb
2eddf38ca6 Push down Giant into fdfree() and remove it from two of the callers.
Other callers such as some rfork() cases weren't locking Giant anyway.

Reviewed by:	csjp
MFC after:	1 week
2005-11-01 17:13:05 +00:00
glebius
d9ad5313fd - Fix leak of struct nlminfo on process exit.
- Fix malloc type collision, that made the above problem
  difficult to understand.

Reported by:	Vladimir Sharun <sharun ukr.net>
2005-10-26 07:18:37 +00:00
davidxu
7086514f41 Make p_itimers as a pointer, so file sys/proc.h does not need to include
sys/timers.h.
2005-10-23 12:19:08 +00:00
davidxu
cf50eec401 Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC
clock are supported. I have plan to merge XSI timer ITIMER_REAL and other
two CPU timers into the new code, current three slots are available for
the XSI timers.
The SIGEV_THREAD notification type is not supported yet because our
sigevent struct lacks of two member fields:
sigev_notify_function
sigev_notify_attributes
I have found the sigevent is used in AIO, so I won't add the two members
unless the AIO code is adjusted.
2005-10-23 04:22:56 +00:00
davidxu
3fbdb3c215 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
jhb
736b03e795 Add witness warnings to panic if a thread tries to exit while holding any
locks.

Requested by:	jeff
MFC after:	3 days
2005-09-02 20:20:01 +00:00
jhb
77e96aefca - Slightly reorder the events around the setting of PRS_ZOMBIE to be less
hokie and much more readable and expand the comment to explain why it is
  the way that it is.
- Close a race where one CPU could free the process belonging to a thread
  on another CPU that hasn't quite finished exiting yet but is beyond the
  point of setting the process state as PRS_ZOMBIE.

Reported and tested by:	ps (2)
MFC after:	3 days
2005-07-18 20:08:14 +00:00
ups
acfce18a2a Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
davidxu
af64c19b3b Only check signal event, single threading event shouldn't be reported. 2005-05-05 06:42:02 +00:00
davidxu
50a5bbcbfd Wake up swapper process if needed.
PR: kern/78474
Submitted by: Sam Lawrance <boris at brooknet dot com dot au>
2005-04-23 05:06:44 +00:00
davidxu
9452a25d2d Clear P_STATCHILD earlier to avoid unnecessary retrying. 2005-04-19 12:31:15 +00:00
davidxu
02615ff23a Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:07:28 +00:00
rwatson
75030e30f6 Introduce p_canwait() and MAC Framework and MAC Policy entry points
mac_check_proc_wait(), which control the ability to wait4() specific
processes.  This permits MAC policies to limit information flow from
children that have changed label, although has to be handled carefully
due to common programming expectations regarding the behavior of
wait4().  The cr_seeotheruids() check in p_canwait() is #if 0'd for
this reason.

The mac_stub and mac_test policies are updated to reflect these new
entry points.

Sponsored by:	SPAWAR, SPARTA
Obtained from:	TrustedBSD Project
2005-04-18 13:36:57 +00:00
jeff
7b0b6888cc - A lock is required before calling VOP_REVOKE. Our reference protects us
from accessing another vnode so a naked VOP_LOCK is sufficient.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:47:04 +00:00
phk
cd117518d7 In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying
a process return to userspace if it had pending GEOM events.

We need to have the same check in the exit pass to catch the case
where a GEOM related filedescriptor is not explicitly closed by
the process.

Bumped into by:	people using dd(1) to build releases, nanobsd etc.
2005-01-29 14:03:41 +00:00
rwatson
79b283cecc In kern_wait(), let the compiler copy the rusage structure rather than
an explicit bcopy() -- it probably does a better job.
2005-01-08 04:17:48 +00:00