The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.
(P_CONTINUED) is set when a stopped process receives a SIGCONT and
cleared after it has notified a parent process that has requested
notification via waitpid(2) with WCONTINUED specified in its options
operand. The status value can be checked with the new WIFCONTINUED()
macro.
Reviewed by: jake
be done internally.
Ensure that no one can fsetown() to a dying process/pgrp. We need
to check the process for P_WEXIT to see if it's exiting. Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.
Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.
Seigo Tanimura helped with this.
exit1() we don't have to release it until we acquire schd_lock to
call cpu_throw().
- Since we can switch at any time due to preemption or a lock release
prior to acquiring sched_lock, don't update switchtime and switchticks
until the very end of exit1() after we have acquired sched_lock.
- Interlock the proctree_lock and proc lock in wait1() and exit1() to
avoid lost wakeups when a parent blocks waiting for a child to exit at
the bottom of wait1(). In exit1() the proc lock interlocked with
proctree_lock (and released after acquiring sched_lock) is that of
the parent process.
- In wait1() use an exclusive lock of proctree lock while we are
looking for a process to harvest. This allows us to completely
remove all references to the process once we've found one (i.e.,
disconnect it from pgrp's, session's, zombproc list, and it's parent's
children list) "atomically" without needing to worry about a lock
upgrade.
- We don't need sched_lock to test if p_stat is SZOMB or SSTOP when holding
the proc lock since the proc lock is always held with p_stat is set to
SZOMB or SSTOP.
- Protect nprocs with an xlock of the allproc_lock.
SIGCHLD handler is SIG_IGN. This is a reimplementation of the
problematic revision 1.131 of kern_exit.c. To avoid accessing process
UPAGES, we set a new procsig flag when the SIGCHLD handler is SIG_IGN
and use that instead.
p_pgrp since the pgrp locking went in. We also don't need it to check for
invalid values in the options argument to wait1(), so push Giant down
slightly.
while holding the proc lock, and by holding the pargs structure when
accessing it from outside of the owner.
Submitted by: Jonathan Mini <mini@haikugeek.com>
There is still some locations where the PROC lock should be held
in order to prevent inconsistent views from outside (like the
proc->p_fd fix for kern/vfs_syscalls.c:checkdirs()) that can be
fixed later.
Submitted by: Jonathan Mini <mini@haikugeek.com>
New locks are:
- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.
Please refer to sys/proc.h for the coverage of these locks.
Changes on the pgrp/session interface:
- pgfind() needs the pgrpsess_lock held.
- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.
- Call enterthispgrp() in order to enter an existing pgrp.
- pgsignal() requires a pgrp lock held.
Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)
shared.
Also introduce vm_endcopy instead of using pointer tricks when
initializing new vmspaces.
The race occured because of how the reference was utilized:
test vmspace reference,
possibly block,
decrement reference
When sharing a vmspace between multiple processes it was possible
for two processes exiting at the same time to test the reference
count, possibly block and neither one free because they wouldn't
see the other's update.
Submitted by: green
fifesystem problems could prevent the release from completing and
this could result in init being blocked indefinitely.
This was looked over by Matt ages ago.
Approved by: dillon
mutex releases to not require flags for the cases when preemption is
not allowed:
The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)
I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.
Reviewed by: peter
Tested on: i386, alpha
Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO
will use at_exit(9).
Add functions at_exec(9), rm_at_exec(9) which function nearly the
same as at_exec(9) and rm_at_exec(9), these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.
Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.
Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral,
the problem was that one had to pass it a paramater indicating the
number of arguments which were actually the number of "int". Fix
it by using an inline version of the AS macro against the syscall
arguments. (AS should be available globally but we'll get to that
later.)
Add a primative system for dynamically adding kqueue ops, it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.
userland. The per thread ucred reference is immutable and thus needs no
locks to be read. However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.
Tested on: x86 (SMP), alpha, sparc64
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.
XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.
Reviewed by: jake, tmm, dillon
This paniced my one of my machines one time too many :-( and there is
no sign of a solution in the pipeline. The deltas are still easily
available in cvs. The problem is that if the parent has been swapped
out, the child process cannot grope around in the parent's UPAGES to
see the sigact[] array or it will fault. This probably is a showstopper
for this implementation anyway.
an unexpected user-visible side effect with the sigaction flags. Also cleanup
a minor union issue.
Submitted by: Rudolf Cejka <cejkar@dcse.fee.vutbr.cz>
MFC addendum: MFC will be combined w/ original commit
MFC after: 3 days
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
processes.
- Don't construct fake call args and then call kill(). psignal is not
anymore complicated and is quicker and not prone to locking problems.
Calling psignal() avoids having to do a pfind() since we already have a
proc pointer and also allows us to keep the task leader locked while we
kill all the peer processes so the list is kept coherent.
- When a kthread exits, do a wakeup() on its proc pointers. This can be
used by kernel modules that have kthreads and want to ensure they have
safely exited before completely the MOD_UNLOAD event.
Connectivity provided by: Usenix wireless
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.
Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
vm_mtx does not recurse and is required for most low level
vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the
vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
uses lockmgr locks and this leads to a lock order reversal. At this point
in wait1() the process is not on any process lists or in the process tree,
so no other process should be able to find it or have a reference to it
anyways, so the locking is not needed.
than dinking around in the process lists explicitly.
- Hold both the proctree lock and proc lock of the child process when
reparenting a process via proc_reparent.
- Lock processes while sending them signals.
- Miscellaenous proc locking.
- proc_reparent() now asserts that the child is locked in addition to an
exclusive proctree lock.