Commit Graph

139 Commits

Author SHA1 Message Date
John Baldwin
8e2e767b1f Add a per-thread ucred reference for syscalls and synchronous traps from
userland.  The per thread ucred reference is immutable and thus needs no
locks to be read.  However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on:	x86 (SMP), alpha, sparc64
2001-10-26 08:12:54 +00:00
Matthew Dillon
79deba82cd Fix ktrace enablement/disablement races that can result in a vnode
ref count panic.

Bug noticed by:	ps
Reviewed by:	ps
MFC after:	1 day
2001-10-24 01:05:39 +00:00
John Baldwin
4e5e677bc0 Change the sx(9) assertion API to use a sx_assert() function similar to
mtx_assert(9) rather than several SX_ASSERT_* macros.
2001-10-23 22:39:11 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Peter Wemm
eb30c1c0b9 Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area.  KSE touched cpu_wait() which had the same change
replicated five ways for each platform.  Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86.  The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by:	jake, tmm, dillon
2001-09-10 04:28:58 +00:00
Matthew Dillon
234216ef98 Giant pushdown sys_exit(), [o]wait(), wait4() 2001-09-01 04:37:34 +00:00
Peter Wemm
99ab2d5dca *** empty log message *** 2001-08-09 01:21:58 +00:00
Peter Wemm
aa7a4dae6d Temporarily back out kern_sig.c rev 1.125 and kern_exit.c rev 1.131.
This paniced my one of my machines one time too many :-( and there is
no sign of a solution in the pipeline.  The deltas are still easily
available in cvs.  The problem is that if the parent has been swapped
out, the child process cannot grope around in the parent's UPAGES to
see the sigact[] array or it will fault.  This probably is a showstopper
for this implementation anyway.
2001-08-01 20:35:24 +00:00
Matthew Dillon
4fec48c6fe As per further discussions on hackers redo the SIGCHLD patch to not generate
an unexpected user-visible side effect with the sigaction flags.  Also cleanup
a minor union issue.

Submitted by: Rudolf Cejka <cejkar@dcse.fee.vutbr.cz>
MFC addendum: MFC will be combined w/ original commit
MFC after: 3 days
2001-07-22 18:47:31 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
776e0b3693 - Always use the proc lock of the task leader to protect the peers list of
processes.
- Don't construct fake call args and then call kill().  psignal is not
  anymore complicated and is quicker and not prone to locking problems.
  Calling psignal() avoids having to do a pfind() since we already have a
  proc pointer and also allows us to keep the task leader locked while we
  kill all the peer processes so the list is kept coherent.
- When a kthread exits, do a wakeup() on its proc pointers.  This can be
  used by kernel modules that have kthreads and want to ensure they have
  safely exited before completely the MOD_UNLOAD event.

Connectivity provided by:	Usenix wireless
2001-06-27 06:15:44 +00:00
Robert Watson
b1fc0ec1a7 o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
John Baldwin
ac07d659c3 Don't hold the process mutex across calls to FREE() since the vm system
uses lockmgr locks and this leads to a lock order reversal.  At this point
in wait1() the process is not on any process lists or in the process tree,
so no other process should be able to find it or have a reference to it
anyways, so the locking is not needed.
2001-05-04 16:13:28 +00:00
Seigo Tanimura
ebdc3f1d2d Do not leave a process with no credential in zombproc.
Reviewed by:	jhb
2001-04-25 10:22:35 +00:00
John Baldwin
33a9ed9d0e Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by:	-smp, dfr, jake
2001-04-24 00:51:53 +00:00
John Baldwin
1005a129e5 Convert the allproc and proctree locks from lockmgr locks to sx locks. 2001-03-28 11:52:56 +00:00
John Baldwin
f34fa851e0 Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
2001-03-28 09:17:56 +00:00
John Baldwin
c65437a326 - Call proc_reparent() when handing a process off to init in exit rather
than dinking around in the process lists explicitly.
- Hold both the proctree lock and proc lock of the child process when
  reparenting a process via proc_reparent.
- Lock processes while sending them signals.
- Miscellaenous proc locking.
- proc_reparent() now asserts that the child is locked in addition to an
  exclusive proctree lock.
2001-03-07 02:22:31 +00:00
Tor Egge
9d0ddf1861 Streamline updating of switchtime (don't copy code from kern_sync.c).
Submitted by:	jhb
2001-02-22 20:16:51 +00:00
Tor Egge
0d139b3741 Protect update of the per processor switchtime variable against
interrupts.

Protect usage of the per processor switchtime variable against
interrupts in calcru().

This seem to eliminate the "microuptime() went backwards" warnings.
2001-02-22 19:50:37 +00:00
Robert Watson
91421ba234 o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project
2001-02-21 06:39:57 +00:00
John Baldwin
c3a6f33758 Revert the previous revision for two reasons:
- I can't seem to reproduce the warning I got from WITNESS anymore.
- The fix was wrong.  Since a uidinfo struct is a member of proc, it
  makes sense for the locking order to be such that you are allowed to
  hold proc and then grab the uidinfo lock.
2001-02-09 20:51:11 +00:00
John Baldwin
8ad802d82c Release the proc lock around crfree() and uifree() in wait1(). It leads to
a lock order violation, and since p is already a zombie at this point,
I'm not sure that we even need all the locking currently in wait1().
2001-02-09 16:43:18 +00:00
Bosko Milekic
9ed346bab0 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
John Baldwin
a914fb6b27 - Proc locking.
- Protect calcru() with sched_lock.
2001-01-24 00:33:44 +00:00
Jake Burkholder
ef73ae4b0c Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.
2001-01-10 04:43:51 +00:00
Jake Burkholder
98f03f9030 Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by:	jhb, -smp@
2000-12-23 19:43:10 +00:00
Jake Burkholder
1156bc4de2 Whitespace. Fix a comment block and an if statement that were wider
than 80 characters.
2000-12-18 07:10:04 +00:00
Jake Burkholder
c0c2557090 - Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr.  Also provides macros for the flags
  pased to specify shared, exclusive or release which map to the
  lockmgr flags.  This is so that the use of lockmgr can be easily
  replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
2000-12-13 00:17:05 +00:00
Jake Burkholder
85b039fe64 Remove if defined(tahoe) cobwebs. 2000-12-04 09:49:34 +00:00
John Baldwin
4971f62a86 - Add a mutex to the proc structure p_mtx that will be used to lock accesses
to each individual proc.
- Initialize the lock during fork1(), and destroy it in wait1().
2000-12-03 01:22:34 +00:00
John Baldwin
2925cbe569 Protect p_stat with sched_lock. 2000-12-01 16:59:02 +00:00
John Baldwin
472fd56ea5 Don't update p_stat in exit1() to SZOMB until after releasing the allproc
lock.  Otherwise, if we block on the backing mutex while releasing the
allproc lock, then when we resume, we will be at SRUN, and we will stay
that way all the way through cpu_exit.  As a result, our parent will never
harvest us.
2000-12-01 03:42:17 +00:00
Jake Burkholder
4f55983606 Use callout_reset instead of timeout(9). Most callouts are statically
allocated, 2 have been added to struct proc for setitimer and sleep.

Reviewed by:	jhb, jlemon
2000-11-27 22:52:31 +00:00
Jake Burkholder
553629ebc9 Protect the following with a lockmgr lock:
allproc
	zombproc
	pidhashtbl
	proc.p_list
	proc.p_hash
	nextpid

Reviewed by:	jhb
Obtained from:	BSD/OS and netbsd
2000-11-22 07:42:04 +00:00
John Baldwin
35e0e5b311 Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
Bruce Evans
621dbe43df Added used include of <sys/mutex.h> (don't depend on pollution in
<sys/signalvar.h>).
2000-09-17 12:20:49 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Don Lewis
f535380cb6 Remove uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize().  Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent.  Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access.  Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid.  Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c.  Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().
2000-09-05 22:11:13 +00:00
Peter Wemm
ac2b067b9a Change the 'exit()' system call to 'sys_exit()'. This avoids overlapping
gcc's internal exit() prototypes and the (futile) hackery that we did to
try and avoid warnings.  main() was renamed for similar reasons.
Remove an exit related hack from makesyscalls.sh.
2000-07-29 00:16:28 +00:00
Alfred Perlstein
c636255150 fix races in the uidinfo subsystem, several problems existed:
1) while allocating a uidinfo struct malloc is called with M_WAITOK,
   it's possible that while asleep another process by the same user
   could have woken up earlier and inserted an entry into the uid
   hash table.  Having redundant entries causes inconsistancies that
   we can't handle.

   fix: do a non-waiting malloc, and if that fails then do a blocking
   malloc, after waking up check that no one else has inserted an entry
   for us already.

2) Because many checks for sbsize were done as "test then set" in a non
   atomic manner it was possible to exceed the limits put up via races.

   fix: instead of querying the count then setting, we just attempt to
   set the count and leave it up to the function to return success or
   failure.

3) The uidinfo code was inlining and repeating, lookups and insertions
   and deletions needed to be in their own functions for clarity.

Reviewed by: green
2000-06-22 22:27:16 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Brian Feldman
a274d19ba2 Back out NOTE_EXIT status reporting pending discussion. 2000-05-21 16:27:41 +00:00
Brian Feldman
a24b514d72 Put the wait(2) exit status in "data" for NOTE_EXIT kevents. 2000-05-17 01:16:11 +00:00
Jonathan Lemon
cb679c385e Introduce kqueue() and kevent(), a kernel event notification facility. 2000-04-16 18:53:38 +00:00
Sean Eric Fagan
893618352c Handle the case where we truss an SUGID program -- in particular, we need
to wake up any processes waiting via PIOCWAIT on process exit, and truss
needs to be more aware that a process may actually disappear while it's
waiting.

Reviewed by:	Paul Saab <ps@yahoo-inc.com>
2000-01-10 04:09:05 +00:00
Bruce Evans
bdf423572e Scheduler fixes equivalent to the ones logged in the following NetBSD
commit to kern_synch.c:

----------------------------
revision 1.55
date: 1999/02/23 02:56:03;  author: ross;  state: Exp;  lines: +39 -10
Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
  steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
  platform hz values don't cause nice +0 processes to look like they are
  niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX.  Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value.  It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechansism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurance an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock.  Define these for alpha, where hz==1024, and nice was
totally broke.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
be incorporated directly (with greater wieght) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~~nothing at for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
----------------------------

The details are a little different in FreeBSD:

=== nice bug ===   Fixing this is the main point of this commit.  We use
essentially the same clipping rule as NetBSD (our limit on p_estcpu
differs by a scale factor).  However, clipping at all is fundamentally
bad.  It gives free CPU the hoggiest hogs once they reach the limit, and
reaching the limit is normal for long-running hogs.  This will be fixed
later.

=== New schedclk() mechanism ===  We don't use the NetBSD schedclk()
(now schedclock()) mechanism.  We require (real)stathz to be about 128
and scale by an extra factor of 2 compared with NetBSD's statclock().
We scale p_estcpu instead of scaling the clock.  This is more accurate
and flexible.

=== Algorithm change ===  Same change.

=== Other bugs ===  The p_pctcpu bug was fixed long ago.  We don't try as
hard to abstract functionality yet.

Related changes: the new limit on p_estcpu must be exported to kern_exit.c
for clipping in wait1().

Agreed with by:		dufault
1999-11-28 12:12:14 +00:00
Poul-Henning Kamp
da654d9070 s/p_cred->pc_ucred/p_ucred/g 1999-11-21 12:38:21 +00:00