Commit Graph

8664 Commits

Author SHA1 Message Date
Ken Smith
c0cac8dc20 Remove a variable that became unused as a result of changes made
in v1.139.  This was only exposed if MALLOC_PROFILE was defined.

Submitted by:	Gary Jennejohn
Pointy hat:	rwatson
Approved by:	re (scottl)
2005-06-16 16:01:46 +00:00
Jeff Roberson
114a1006a8 - Change holdcnt use around vnode recycling. We now always keep a holdcnt
ref while we're calling vgone().  This prevents transient refs from
   re-adding us to the free list.  Previously, a vfree() triggered via
   vinvalbuf() getting rid of all of a vnode's pages could place a partially
   destructed vnode on the free list where vtryrecycle() could find it.  The
   first call to vtryrecycle would hang up on the vnode lock, but when it
   failed it would place a now dead vnode onto the free list, and another
   call to vtryrecycle() would free an already free vnode.  There were many
   complications of having a zero ref count while freeing which can now go
   away.
 - Change vdropl() to release the interlock before returning.  All callers
   now respect this, so vdropl() directly frees VI_DOOMED vnodes once the
   last ref is dropped.  This means that we'll never have VI_DOOMED vnodes
   on the free list.
 - Seperate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and
   v_decr_useonly().  The incr/decr split is so that incr usecount can
   return with the interlock still held while decr drops the interlock so
   it can call vdropl() which will potentially free the vnode.  The calling
   function can't drop the lock of an already free'd node.  v_decr_useonly()
   drops a usecount without droping the hold count.  This is done so the
   usecount reaches zero in vput() before we recycle, however the holdcount
   is still 1 which prevents any new references from placing the vnode
   back on the free list.
 - Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget().  We
   wouldn't want vnlrureclaim() to bump the usecount since this has
   different semantics.  Also change vnlrureclaim() to do a NOWAIT on the
   vn_lock.  When this function runs we're usually in a desperate situation
   and we wouldn't want to wait for any specific vnode to be released.
 - Fix a bunch of misc comments to reflect the new behavior.
 - Add vhold() and vdrop() to vflush() for the same reasons that we do in
   vlrureclaim().  Previously we held no reference and a vnode could have
   been freed while we were waiting on the lock.
 - Get rid of vlruvp() and vfreehead().  Neither are used.  vlruvp() should
   really be rethought before it's reintroduced.
 - vgonel() always returns with the vnode locked now and never puts the
   vnode back on a free list.  The vnode will be freed as soon as the last
   reference is released.

Sponsored by:	Isilon Systems, Inc.
Debugging help from:	Kris Kennaway, Peter Holm
Approved by:	re (blanket vfs)
2005-06-16 04:41:42 +00:00
Jeff Roberson
bdcd9f26b0 - Fix insertions of bios which represent data earlier than anything else
in the queue.  The insertion sort assumed this had already been taken
   care of.

Spotted by:	Antoine Brodin
Approved by:	re (scottl)
2005-06-15 23:32:07 +00:00
Jeff Roberson
7a06fe49dc - Add and enhance asserts related to the wrong bufobj panic.
Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-14 20:32:27 +00:00
Jeff Roberson
12c2dcde40 - In reassignbuf() add many asserts to validate the head and tail pointers
of the clean and dirty lists.  This is in an attempt to catch the wrong
   bufobj problem sooner.
 - In vgonel() don't acquire an extra reference in the active case, the
   vnode lock and VI_DOOMED protect us from recursively cleaning.
 - Also in vgonel() clean up some stale comments.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-14 20:31:53 +00:00
Jeff Roberson
dbb3ec5ce3 - Remove vnode lock asserts at the end of vfs syscalls. These asserts were
used to ensure that we weren't exiting the syscall with a lock still
   held.  This wasn't safe, however, because we'd already executed a vput()
   and on a loaded system the vnode may have been free'd by the time we
   assert.  This functionality is also handled by the td_locks assert in
   userret, which doesn't tell you what the syscall was, but will at least
   panic before you deadlock.

Sponsored by:   Isilon Systems, Inc.
Discovred by:   Peter Holm
Approved by:	re (blanket vfs)
2005-06-14 01:14:40 +00:00
Jeff Roberson
b930d85380 - Don't make vgonel() globally visible, we want to change its prototype
anyway and it's not used outside of vfs_subr.c.
 - Change vgonel() to accept a parameter which determines whether or not
   we'll put the vnode on the free list when we're done.
 - Use the new vgonel() parameter rather than VI_DOOMED to signal our
   intentions in vtryrecycle().
 - In vgonel() return if VI_DOOMED is already set, this vnode has already
   been reclaimed.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 06:26:55 +00:00
Jeff Roberson
6bd8103d33 - Clear v_dd in cache_zap() instead of cache_purge() as cache_purge() may
not be called in all cases where we free the cnp.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 05:59:59 +00:00
Jeff Roberson
d598b04d44 - It has long been my suspicion that we don't actually need a loop in
vn_lock().  Add an assert that will help me gain more confidence that this
   is correct.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:47:29 +00:00
Jeff Roberson
d2ad9baac0 - Add KTR_VFS events to vdestroy, vtruncbuf, vinvalbuf, vfreehead.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:37 +00:00
Jeff Roberson
eff2d12635 - Add KTR_VFS messages for various name cache related events.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:03 +00:00
Jeff Roberson
748c92fbad - Split one KASSERT in bremfree() into two to aid in debugging.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:45:05 +00:00
Jeff Roberson
f19f6869cf - Dramatically simplify bioqdisksort(). We no longer do ordered bios so
most of the code to deal with them has been dead for sometime.  Simplify
   the code by doing an insert sort hinted by the current head position.

Met with apathy by:	arch@
2005-06-12 22:32:29 +00:00
Pawel Jakub Dawidek
65ac438c8f Do not allocate memory while holding a mutex.
I introduce a very small race here (some file system can be mounted or
unmounted between 'count' calculation and file systems list creation),
but it is harmless.

Found by:	FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by:	Peter Holm <peter@holm.cc>
2005-06-12 07:03:23 +00:00
Pawel Jakub Dawidek
3a996d6e91 Do not allocate memory based on not-checked argument from userland.
It can be used to panic the kernel by giving too big value.
Fix it by moving allocation and size verification into kern_getfsstat().
This even simplifies kern_getfsstat() consumers, but destroys symmetry -
memory is allocated inside kern_getfsstat(), but has to be freed by the
caller.

Found by:	FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by:	Peter Holm <peter@holm.cc>
2005-06-11 14:58:20 +00:00
Maxim Konovalov
922a5d9c2b o setsockopt(2) cannot remove accept filter. [1]
o getsockopt(SO_ACCEPTFILTER) always returns success on listen socket
  even we didn't install accept filter on the socket.
o Fix these bugs and add regression tests for them.

Submitted by:	Igor Sysoev [1]
Reviewed by:	alfred
MFC after:	2 weeks
2005-06-11 11:59:48 +00:00
Jeff Roberson
d6dbf760a6 - Assert that we're not in the name cache anymore in vdestroy().
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:48:09 +00:00
Jeff Roberson
1b2da2d0fa - Assert that we're not adding a doomed vnode to the name cache.
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:47:30 +00:00
Jeff Roberson
9aa0eba464 - Add KTR_VFS tracing to track the life of vnodes. Eventually KTR_VFS
events could be added to cover other interesting details.
 - Add some VNASSERTs to discover places where we access vnodes after
   they have been uma_zfree'd before we try to free them again.
 - Add a few more VNASSERTs to vdestroy() to be certain that the vnode is
   really unused.

Sponsored by:	Isilon Systems, Inc.
2005-06-11 01:16:46 +00:00
Brian Feldman
cc3149b1ea Fix a serious deadlock with the NFS client. Given a large enough
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries.  However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.

To fix this, hibufspace is exported and scaled to a reasonable
fraction.  This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously.  If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system.  It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.

The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.

General nod by:	gad
MFC after:	2 weeks
More testing:	wes
PR:		kern/79208
2005-06-10 23:50:41 +00:00
Jeff Roberson
37ee2d8dd4 - Add curthread to the state that ktr is saving. The extra information is
well worth the bloat.
 - Change the formatting of 'show ktr' slightly to accommodate the
   additional field.  Remove a tab from the verbose output and place the
   actual trace data after a : so it is more easy to understand which
   part is the event and which is part of the record.
2005-06-10 23:21:29 +00:00
Joseph Koshy
8c61b21927 Fix typo.
Reviewed by:	rwatson, sam
2005-06-10 18:06:59 +00:00
Brooks Davis
fc74a9f93a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
Stephan Uphoff
3ea6bbc59a Restore preemption of idle threads.
Submitted by:	jhb
2005-06-10 03:00:29 +00:00
Suleiman Souhlal
679985d03a Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
Scott Long
8bde93598a Drat! Committed from the wrong branch. Restore HEAD to its previous goodness. 2005-06-09 19:59:09 +00:00
Scott Long
76b472dbda Back out 1.68.2.26. It was a mis-guided change that was already backed out
of HEAD and should not have been MFC'd.  This will restore UDP socket
functionality, which will correct the recent NFS problems.

Submitted by: rwatson
2005-06-09 19:56:38 +00:00
Joseph Koshy
f263522a45 MFP4:
- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
  PMC implementations across different architectures.
  Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
  every context switch), -R (print log file).

- pmc(3) API changes, improve our ability to keep ABI compatibility
  in the future.  Add more 'alias' names for commonly used events.

- bug fixes & documentation.
2005-06-09 19:45:09 +00:00
Stephan Uphoff
a3f2d84279 Lots of whitespace cleanup.
Fix for broken if condition.

Submitted by:	nate@
2005-06-09 19:43:08 +00:00
Pawel Jakub Dawidek
820a0de9a9 Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value	policy
0	show all mount-points without any restrictions
1	show only mount-points below jail's chroot and show only part of the
	mount-point's path (if jail's chroot directory is /jails/foo and
	mount-point is /jails/foo/usr/home only /usr/home will be shown)
2	show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with:	rwatson
2005-06-09 18:49:19 +00:00
Pawel Jakub Dawidek
4eb7c9f6c9 Remove process information leak from inside a jail, when
security.bsd.see_other_uids is set to 0, etc.
One can check if invisible process is active, by doing:

	# ktrace -p <pid>

If ktrace returns 'Operation not permitted' the process is alive and
if returns 'No such process' there is no such process.

MFC after:	1 week
2005-06-09 18:33:21 +00:00
Stephan Uphoff
f3a0f87396 Fix some race conditions for pinned threads that may cause them to run
on the wrong CPU.

Add IPI support for preempting a thread on another CPU.

MFC after:3 weeks
2005-06-09 18:26:31 +00:00
Pawel Jakub Dawidek
13a82b9623 Avoid code duplication in serval places by introducing universal
kern_getfsstat() function.

Obtained from:	jhb
2005-06-09 17:44:46 +00:00
Warner Losh
139f16505d Simplify the code a bit after the bzero(). 2005-06-09 05:50:01 +00:00
Jeff Roberson
a3d239bc29 - My sub-par public school education has been exposed. s/sentinal/sentinel/
Noticed by:	Emil Mikulic
2005-06-09 04:40:20 +00:00
Garance A Drosehn
386ea9321d Remove the previous parsing-logic for arguments on the '#!'-line of shell
scripts.  As far as I know, no one has needed the '#!#<' kludge to get at
the behavior implemented by the historical parsing.
2005-06-09 00:27:02 +00:00
Jeff Roberson
9e879a5ee0 - Under heavy IO load the buf daemon can run for many hundereds of
milliseconds due to what is essentially n^2 algorithmic complexity.  This
   change makes the algorithm N*2 instead.  This heavy processing manifested
   itself as skipping in audio and video playback due to the long scheduling
   latencies and contention on giant by pcm.
 - flushbufqueues() is now responsible for flushing multiple buffers
   rather than one at a time.  This allows us to save our progress in the
   list by using a sentinal.  We must do the numdirtywakeup() and
   waitrunningbufspace() here now rather than in buf_daemon().
 - Also add a uio_yield() after we have processed the list once for bufs
   without deps and again for bufs with deps.  This is to release Giant
   and allow any other giant locked code to proceed.

Tested by:	Many users on current@
Revealed by:	schedgraph traces sent by Emil Mikulic & Anthony Ginepro
2005-06-08 20:26:05 +00:00
Craig Rodrigues
1209e08faf Initialize uio_iovcnt to 1 in extattr_list_vp() and extattr_get_vp()
PR:		kern/79357
Approved by:	rwatson
2005-06-08 13:22:10 +00:00
Robert Watson
e2f7a83d6b In sem_forkhook(), don't attempt to generate a copy of the process semaphore
list on fork() if the process doesn't actually have references to any
semaphores.  This avoids extra work, as well as potentially asking to
allocate storage for 0 references.

Found by:	avatar
MFC after:	1 week
2005-06-08 07:29:22 +00:00
Jeff Roberson
fae89dce3e - Clear OWEINACT prior to calling VOP_INACTIVE to remove the possibility
of a vget causing another call to INACTIVE before we're finished.
2005-06-07 22:05:32 +00:00
Alan Cox
b490cc72b2 In lio_listio(2) change jobref from an int to a long so that
lio_listio(LIO_WAIT, ...) works correctly on 64-bit architectures.

Reviewed by: tegge
2005-06-07 05:28:21 +00:00
Robert Watson
3831e7d7f5 Gratuitous renaming of four System V Semaphore MAC Framework entry
points to convert _sema() to _sem() for consistency purposes with
respect to the other semaphore-related entry points:

mac_init_sysv_sema() -> mac_init_sysv_sem()
mac_destroy_sysv_sem() -> mac_destroy_sysv_sem()
mac_create_sysv_sema() -> mac_create_sysv_sem()
mac_cleanup_sysv_sema() -> mac_cleanup_sysv_sem()

Congruent changes are made to the policy interface to support this.

Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
2005-06-07 05:03:28 +00:00
Jeff Roberson
6680bbd529 - Fix the case where we're not preempting but there is already a newtd
as this happens via thread_switchout().  I don't particularly like the
   structure of the code here.  We twice call out to thread code when
   a thread is voluntarily switching.  Once to thread_switchout() and once
   to slot_fill(), while sched_4BSD does even more work which is redundant
   to select another thread to use our remaining slice.  This should be
   simplified in the future, but for now I'm only going to fix the bug not
   the bad design.
2005-06-07 02:59:16 +00:00
Doug White
4a30c508d1 Make "show msgbuf" use the pager instead of blasting the whole thing out.
MFC after:	3 days
2005-06-06 22:18:32 +00:00
David Xu
ec8297bda1 Fix a bug relavant to debugging, a masked signal unexpectedly interrupts
a sleeping thread when process is being debugged.

PR: GNU/77818
Tested by: Sean C. Farley <sean-freebsd at farley org>
2005-06-06 05:13:10 +00:00
Andrew Gallatin
92dd256bd4 Allow sends sent from non page-aligned userspace addresses to be
considered for zero-copy sends.

Reviewed by: alc
Submitted by: Romer Gil at Rice University
2005-06-05 17:13:23 +00:00
Alan Cox
67b95a95eb Eliminate an unused field from struct aio_liojob. 2005-06-05 05:41:48 +00:00
Marius Strobl
fce21e7e25 After some input from bde@ and rereading the datasheet use a MTX_SPIN
mutex instead of a MTX_DEF one in order to defer preemption while
reading the date and time registers. If we don't manage to read them
within the time slot where we are guaranteed that no updates occur we
might actually read them during an update in which case the output is
undefined.
2005-06-04 23:24:50 +00:00
Alan Cox
bbe7bbdfee Eliminate the original method of requesting notification of aio_read(2) and
aio_write(2) completion through kevent(2).  This method does not work on
64-bit architectures.  It was deprecated in FreeBSD 4.4.  See revisions
1.87 and 1.70.2.7.

Change aio_physwakeup() to call psignal(9) directly rather than indirectly
through a timeout(9).  Discussed with: bde

Correct a bug introduced in revision 1.65 that could result in premature
delivery of a signal if an lio_listio(2) consisted of a mixture of
direct/raw and queued I/O operations.  Observed by: tegge

Eliminate a field from struct kaioinfo that is now unused.

Reviewed by: tegge
2005-06-04 19:16:33 +00:00
Jeff Roberson
9fe02f7e16 - It's 2005 already, I've been working on this for three years. 2005-06-04 09:24:15 +00:00
Jeff Roberson
21381d1b9e - Don't SLOT_USE() in the preempt case, sched_add() has already taken the
slot for us.  Previously, we would take two slots on every preempt, and
   setrunqueue() would fix it up for us in the non threaded case.  The
   threaded case was simply broken.
 - Clean up flags, prototypes, comments.
2005-06-04 09:23:28 +00:00
Paul Saab
efe5becafa Wrap copyin/copyout for kevent so the 32bit wrapper does not have
to malloc nchanges * sizeof(struct kevent) AND/OR nevents *
sizeof(struct kevent) on every syscall.

Glanced at by:	peter, jmg
Obtained from:	Yahoo!
MFC after:	2 weeks
2005-06-03 23:15:01 +00:00
Alan Cox
3769f562e2 Synchronize access to the per process aiocb lists in many of the functions. 2005-06-03 05:27:20 +00:00
Alan Cox
e293dc860c In aio_waitcomplete() correct two cases of using an aiocb after freeing it. 2005-06-02 23:14:38 +00:00
Alan Cox
f0e5132053 Giant is no longer required in kern_setrlimit(); remove its acquisition and
release.

Reviewed by: jhb
2005-06-01 17:52:51 +00:00
Ken Smith
6341095e0d This patch addresses a standards violation issue. The standards say a
file's access time should be updated when it gets executed.  A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.

A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called.  The underlying
filesystem is expected to handle it based on its own semantics, some
filesystems don't support access time at all.  Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc.  In particular vn_start_write() has not been called.  The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point.  The actual time update will happen later
during a sync (which handles all the necessary locking).

Got me into this:	cperciva
Discussed with:		a lot with bde, a little with kan
Showed patches to:	phk, jeffr, standards@, arch@
Minor discussion on:	arch@
2005-05-31 19:39:52 +00:00
Alan Cox
3148c2c96a Synchronize access to aio_freeproc with a mutex. Eliminate related spl
calls.

Reduce the scope of Giant in aio_daemon().
2005-05-30 22:26:34 +00:00
Alan Cox
3999ebe3b6 Use the proc mtx to prevent simultaneous changes to p_aioinfo. 2005-05-30 19:33:33 +00:00
Alan Cox
8285135020 Eliminate unnecessary calls to wakeup(); no one sleeps on &aio_freeproc.
Eliminate an unused flag, AIOP_SCHED; it's cleared but never set.
2005-05-30 18:02:00 +00:00
Robert Watson
3984b2328c Rebuild generated system call definition files following the addition of
the audit event field to the syscalls.master file format.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2005-05-30 15:20:21 +00:00
Robert Watson
f3596e3370 Introduce a new field in the syscalls.master file format to hold the
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used.  The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2005-05-30 15:09:18 +00:00
Jeff Roberson
1f22a07afd - Add bufobj_wrefl() to add a write ref to a bufobj that is already locked. 2005-05-30 07:01:18 +00:00
Joseph Koshy
36c0fd9d0f Kernel hooks to support PMC sampling modes.
Reviewed by:	alc
2005-05-30 06:29:29 +00:00
Alan Cox
95eca142ec Eliminate aio_activeproc; it's unused. 2005-05-30 05:25:10 +00:00
Alan Cox
8484b5e66c Eliminate aio_bufjobs; it's unused. 2005-05-29 21:29:15 +00:00
Robert Watson
45cb0a0074 Normalize white space in syscalls.master: try to use tabs before system
call types.
2005-05-29 20:20:16 +00:00
Robert Watson
63a7e0a3f9 Kernel malloc layers malloc_type allocation over one of two underlying
allocators: a set of power-of-two UMA zones for small allocations, and the
VM page allocator for large allocations.  In order to maintain unified
statistics for specific malloc types, kernel malloc maintains a separate
per-type statistics pool, which can be monitored using vmstat -m.  Prior
to this commit, each pool of per-type statistics was protected using a
per-type mutex associated with the malloc type.

This change modifies kernel malloc to maintain per-CPU statistics pools
for each malloc type, and protects writing those statistics using critical
sections.  It also moves to unsynchronized reads of per-CPU statistics
when generating coalesced statistics.  To do this, several changes are
implemented:

- In the previous world order, the statistics memory was allocated by
  the owner of the malloc type structure, allocated statically using
  MALLOC_DEFINE().  This embedded the definition of the malloc_type
  structure into all kernel modules.  Move to a model in which a pointer
  within struct malloc_type points at a UMA-allocated
  malloc_type_internal data structure owned and maintained by
  kern_malloc.c, and not part of the exported ABI/API to the rest of
  the kernel.  For the purposes of easing a possible MFC, re-use an
  existing pointer in 'struct malloc_type', and maintain the current
  malloc_type structure size, as well as layout with respect to the
  fields reused outside of the malloc subsystem (such as ks_shortdesc).
  There are several unused fields as a result of no longer requiring
  the mutex in malloc_type.

- Struct malloc_type_internal contains an array of malloc_type_stats,
  of size MAXCPU.  The structure defined above avoids hard-coding a
  kernel compile-time value of MAXCPU into kernel modules that interact
  with malloc.

- When accessing per-cpu statistics for a malloc type, surround read -
  modify - update requests with critical_enter()/critical_exit() in
  order to avoid races during write.  The per-CPU fields are written
  only from the CPU that owns them.

- Per-CPU stats now maintained "allocated" and "freed" counters for
  number of allocations/frees and bytes allocated/freed, since there is
  no longer a coherent global notion of the totals.  When coalescing
  malloc stats, accept a slight race between reading stats across CPUs,
  and avoid showing the user a negative allocation count for the type
  in the event of a race.  The global high watermark is no longer
  maintained for a malloc type, as there is no global notion of the
  number of allocations.

- While tearing up the sysctl() path, also switch to using sbufs.  The
  current "export as text" sysctl format is retained with the same
  syntax.  We may want to change this in the future to export more
  per-CPU information, such as how allocations and frees are balanced
  across CPUs.

This change results in a substantial speedup of kernel malloc and free
paths on SMP, as critical sections (where usable) out-perform mutexes
due to avoiding atomic/bus-locked operations.  There is also a minor
improvement on UP due to the slightly lower cost of critical sections
there.  The cost of the change to this approach is the loss of a
continuous notion of total allocations that can be exploited to track
per-type high watermarks, as well as increased complexity when
monitoring statistics.

Due to carefully avoiding changing the ABI, as well as hardening the ABI
against future changes, it is not necessary to recompile kernel modules
for this change.  However, MFC'ing this change to RELENG_5 will require
also MFC'ing optimizations for soft critical sections, which may modify
exposed kernel ABIs.  The internal malloc API is changed, and
modifications to vmstat in order to restore "vmstat -m" on core dumps will
follow shortly.

Several improvements from:		bde
Statistics approach discussed with:	ups
Tested by:				scottl, others
2005-05-29 13:38:07 +00:00
Pawel Jakub Dawidek
885fec3e08 Fix panic when module is compiled in and it is loaded from loader.conf.
Only panic is fixed, module will be still listed in kldstat(8) output.
Not sure what is correct fix, because adding unloading code in case of
failure to linker_init_kernel_modules() doesn't work.
2005-05-28 23:20:05 +00:00
Garance A Drosehn
5f49915eb2 Change the way options are parsed on the `#!'-line of a shell-script. Instead
of having the kernel parse that line and add an entry to the argument list for
each 'separate word' it finds, have it add only one entry which holds all
the words found on that line.  The old behavior is useful in some situations,
but it does not match the way any other operating system will parse that line.

This has been discussed in the thread "Bug in #! processing - One More Time"
on the freebsd-arch mailing list (starting back on Feb 24, 2005).  The first
few messages in that thread provide the background in much detail.

PR:		16393
Reviewed by:	freebsd-arch
2005-05-28 22:42:41 +00:00
Pawel Jakub Dawidek
870fba2648 Prevent loading modules with are compiled into the kernel.
PR:		kern/48759
Submitted by:	Pawe³ Ma³achowski <pawmal@unia.3lo.lublin.pl>
Patch from:	demon
MFC after:	2 weeks
2005-05-28 22:29:44 +00:00
Robert Watson
0cc0090517 Regenerate from syscalls.master. 2005-05-28 14:35:43 +00:00
Robert Watson
d85bfefd79 Mark ntp_gettime() as MSTD, since its system call path will acquire
Giant if required.
2005-05-28 14:35:05 +00:00
Robert Watson
75b8223886 Explicitly acquire Giant around the ntp_gettime() and assert it in the
sysctl path.  While this code is close to MPSAFE, it may require some
additional locking.  Mark ntp_gettime1() as GIANT_REQUIRED for now.

Suggested by:	phk
2005-05-28 14:34:41 +00:00
Robert Watson
7329f580c8 Regenerate for updated syscalls.master. 2005-05-28 13:24:05 +00:00
Robert Watson
d7b9187bff Mark the following compatability system calls as MCOMPAT or MCOMPAT4 based
on the their simply wrapping MPSAFE implementations of existing MPSAFE
system calls:

  getfsstat()
  lseek()
  stat()
  lstat()
  truncate()
  ftruncate()
  statfs()
  fstatfs()

Note that ogetdirentries() is not marked MPSAFE because it does not share
the MPSAFE implementation used for getdirentries(), and requires separate
locking to be implemented.
2005-05-28 13:23:42 +00:00
Robert Watson
958a52b82b Regenerate from syscalls.master. 2005-05-28 13:13:01 +00:00
Robert Watson
160349adb1 Mark quotactl() as MSTD. 2005-05-28 13:12:04 +00:00
Robert Watson
f8e5f64207 Acquire Giant explicitly in quotactl() so that the syscalls.master
entry can become MSTD.
2005-05-28 13:11:35 +00:00
Robert Watson
a72baeca1d Regenerate from updated syscalls.master. 2005-05-28 13:09:56 +00:00
Robert Watson
ec792a6740 Mark kenv(2) as MPSAFE, since it appears to be properly locked down. 2005-05-28 13:09:41 +00:00
Robert Watson
848c3ec33f Regenerate system call tables from syscalls.master. 2005-05-28 13:08:26 +00:00
Robert Watson
5267dc0b3a Also mark the COMPAT4 version of fhstatfs() as MPSAFE. 2005-05-28 13:07:43 +00:00
Robert Watson
2191a5d154 Mark fhopen(), fhstat(), and fhstatfs() as MSTD, since they now
acquire Giant themselves.
2005-05-28 12:59:33 +00:00
Robert Watson
f73e1f57cf Acquire Giant explicitly in fhopen(), fhstat(), and kern_fhstatfs(),
so that we can start to eliminate the presence of non-MPSAFE system
call entries in syscalls.master.
2005-05-28 12:58:54 +00:00
Pawel Jakub Dawidek
9e9cfdca7a Remove (now) unused argument 'td' from cvtstatfs(). 2005-05-27 19:23:48 +00:00
Pawel Jakub Dawidek
2b19a055ab Sync locking in freebsd4_getfsstat() with getfsstat().
Giant is probably also needed in kern_fhstatfs().
2005-05-27 19:21:08 +00:00
Pawel Jakub Dawidek
fd91686850 Use consistent style in functions I want to modify in the near future. 2005-05-27 19:15:46 +00:00
Robert Watson
83b3d58d05 In the current world order, each socket has two mutexes: a mutex
that protects socket and receive socket buffer state, and a second
mutex to protect send socket buffer state.  In some places, the
mutex shared between the socket and receive socket buffer will be
acquired twice, once by each layer, resulting in some
inconsistency, but providing the abstraction benefit of being able
to more easily separate the two mutexes in the future if desired.

When transitioning a socket to the SS_ISDISCONNECTING or
SS_ISDISCONNECTED states, grab the socket/receive socket buffer lock
once rather than grabbing it as the socket lock, modifying socket
state, then grabbing a second time as the receive lock in order to
modify the socket buffer state to indicate no further data can be
read.  This change is believed to close a race between the change in
socket state and the change in socket buffer state, which for a
remotely initiated close on a UNIX domain socket, resulted in
soreceive() returning ENOTCONN rather than an EOF condition.

A similar race still exists in the case of send, however, and is
harder to fix as the socket and send socket buffer mutexes are not
the same, and we would like to avoid holding combinations of socket
mutexes over sb_upcall until we've finished clarifying the locking
protocol for upcalls.

This change has the side affect of reducing the number of mutex
operations to initiate disconnect or perform disconnect on a
socket by two.

PR:		78824
Rerported by:	Marc Olzheim <marcolz@stack.nl>
MFC after:	2 weeks
2005-05-27 17:16:43 +00:00
David Xu
0cb7166f78 Remove thread_upcall_check, it was used to avoid race bug in earlier
day's sleep queue code, today the bug no longer exists.
please see 04/25/2004 freebsd-threads@ mailing list archive.
2005-05-27 15:57:27 +00:00
David Xu
39c4e4c202 Remove sleep queue hack, it is no longer needed with current sleep queue.
Actually, it causes process to hang when it is being debugged.

PR: gnu/77818
2005-05-27 04:27:22 +00:00
John-Mark Gurney
b633f50dd8 make stat return an zero'd struct, and be a FIFO again... This is only
to fix libc_r since it requires stat to close fd's, and so commented in
the code...

PR:		threads/75795
Reviewed by:	ps
MFC after:	1 week
2005-05-24 23:42:50 +00:00
Olivier Houchard
77f30ffff8 Don't set the default of kern.fallback_elf_brand to FreeBSD for arm, as
binutils now do the job for us
2005-05-24 22:21:44 +00:00
Stephan Uphoff
d13ec71369 Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
Pawel Jakub Dawidek
c95cbf7ec7 Protect fsid in freebsd4_getfsstat() in simlar way as it is done in
getfsstat().
2005-05-22 23:05:27 +00:00
Pawel Jakub Dawidek
a0e96a49df If we need to hide fsid, kern_statfs()/kern_fstatfs() will do it for us,
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().

This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.
2005-05-22 21:52:30 +00:00
Nate Lawson
96ab794b26 Document that the returned pointer should be freed even if the number
of items returned is 0.
2005-05-20 05:04:22 +00:00
Stephan Uphoff
503c2ea34d Fix a bug that caused preemption to happen for a thread in the same
ksegrp with the same priority as the currently running thread.
This can cause propagate_priority() to panic.

Pointy hat to: ups
2005-05-19 01:08:30 +00:00
Pawel Jakub Dawidek
b578b0bdd3 devfs_first() return value isn't used, remove it. 2005-05-18 22:05:12 +00:00
Alan Cox
7daa3570e4 Revert revision 1.164: pmap_qremove() does not require protection by
VM_LOCK_GIANT.

Discussed with: jeff
2005-05-14 05:09:11 +00:00
John Baldwin
a4c24c66bc Actually use the iterating variable in the for loop when trying to avoid
overflow.

Reported by:	Vladislav Shabanov vs at rambler-co dot ru
MFC after:	1 week
Glanced at:	alfred
2005-05-12 20:04:48 +00:00
Pawel Jakub Dawidek
07ebf8c8c3 We don't use 'mp' variable, but we do want to mount devfs, ehh. 2005-05-12 01:49:51 +00:00
Pawel Jakub Dawidek
b8bc5373e1 Remove unised variable introduced by accident in rev 1.168.
Found by:	Coverity Prevent analysis tool
2005-05-11 19:50:34 +00:00
Pawel Jakub Dawidek
f850b2781f Plug memory leaks.
Found by:		Coverity Prevent analysis tool
2005-05-11 19:27:38 +00:00
Alexander Kabaev
0ca9ed8674 Handle theoretical case of vfs_export being called with both MNT_DELEXPORT and
MNT_EXPORT flags set. Do not reuse the memory that has just been freed.
2005-05-11 18:25:42 +00:00
Colin Percival
fe2eee8231 Fix two issues which were missed in FreeBSD-SA-05:08.kmem.
Reported by:	Uwe Doering
2005-05-07 00:41:36 +00:00
Colin Percival
fd94099ec2 If we are going to
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.

Security: FreeBSD-SA-05:08.kmem
2005-05-06 02:50:00 +00:00
David Xu
5a2f73e624 Only check signal event, single threading event shouldn't be reported. 2005-05-05 06:42:02 +00:00
Maksim Yevmenkin
75ae257016 Change m_uiotombuf so it will accept offset at which data should be copied
to the mbuf. Offset cannot exceed MHLEN bytes. This is currently used to
fix Ethernet header alignment problem on alpha and sparc64. Also change all
users of m_uiotombuf to pass proper offset.

Reviewed by:	jmg, sam
Tested by:	Sten Spans "sten AT blinkenlights DOT nl"
MFC after:	1 week
2005-05-04 18:55:03 +00:00
Robert Watson
5264841183 Introduce MAC Framework and MAC Policy entry points to label and control
access to POSIX Semaphores:

mac_init_posix_sem()            Initialize label for POSIX semaphore
mac_create_posix_sem()          Create POSIX semaphore
mac_destroy_posix_sem()         Destroy POSIX semaphore
mac_check_posix_sem_destroy()   Check whether semaphore may be destroyed
mac_check_posix_sem_getvalue()  Check whether semaphore may be queried
mac_check_possix_sem_open()     Check whether semaphore may be opened
mac_check_posix_sem_post()      Check whether semaphore may be posted to
mac_check_posix_sem_unlink()    Check whether semaphore may be unlinked
mac_check_posix_sem_wait()      Check whether may wait on semaphore

Update Biba, MLS, Stub, and Test policies to implement these entry points.
For information flow policies, most semaphore operations are effectively
read/write.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Sponsored by:	DARPA, McAfee, SPARTA
Obtained from:	TrustedBSD Project
2005-05-04 10:39:15 +00:00
Robert Watson
97cce3269c Move definitions of 'struct kuser' and 'struct ksem' from uipc_sem.c
to ksem.h so that they are accessible from the MAC Framework for the
purposes of labeling and enforcing additional protections.  #error
if these are included without _KERNEL, since they are not intended
(nor installed) for user application use.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Sponsored by:	DARPA, SPARTA
Obtained from:	TrustedBSD Project
2005-05-03 20:21:24 +00:00
Jeff Roberson
532c491b2b - Initialize vfslocked correctly early enough for MAC to compile.
- Fix one place where we explicitly drop Giant!

Pointy hat to:	me
Submitted by:	Max Laier
Warned by:	Tinderbox
2005-05-03 16:24:59 +00:00
Jeff Roberson
279d4d3761 - Remove two mtx_asserts that can incorrectly trigger if
devstat_end_transaction is called from a fast interrupt.  Presently
   there is no way for mtx_assert to determine that we're not executing
   in a real thread context.

Submitted by:	jhusted@isilon.com
2005-05-03 10:58:05 +00:00
Jeff Roberson
059f090fa1 - A vnode may have made its way onto the free list while it was being
vgone'd.  We must remove it from the freelist before returning in
   vtryrecycle() or we may get a duplicate free.

Reported by:	kkenn
2005-05-03 10:56:00 +00:00
Jeff Roberson
269576c82a - Use namei to acquire Giant for VFS if it is necessary. Drop the explicit
Giant acquisition.
 - Remove GIANT_REQUIRED in the few remaining cases; the vm and vfs have
   both been locked.
2005-05-03 10:55:05 +00:00
Jeff Roberson
6de925e58b - Use NAMEI to pickup Giant if we need it in fpcheckstd(). 2005-05-03 10:52:22 +00:00
Jeff Roberson
e11a45c9e5 - Neither of our image formats require Giant now that the vm and vfs have
been locked.
2005-05-03 10:51:38 +00:00
Christian S.J. Peron
02fe1744f1 Since it is not possible for curthread to be NULL in this context,
drop the check+initialization for a straight initialization. Also
assert that curthread will never be NULL just to be sure.

Discussed with:	rwatson, peter
MFC after:	1 week
2005-05-02 02:07:55 +00:00
Jeff Roberson
b2e2166483 - All buffers should either be clean or dirty. If neither of these flags
are set when we attempt to remove a buffer from a queue we should panic.
   Hopefully this will catch the source of the wrong bufobj panics.

Sponsored by:	Isilon Systems, Inc.
2005-05-01 12:00:36 +00:00
Jeff Roberson
8d46d9c46f - Remove spls and comments relating to them. 2005-05-01 01:01:17 +00:00
Jeff Roberson
194dfed917 - Remove an old splcam hack. 2005-05-01 00:59:55 +00:00
Jeff Roberson
5270a2dbe3 - Remove unnecessary spls. 2005-05-01 00:59:34 +00:00
Jeff Roberson
4d7d299383 - Return EACCES if we're trying to exec on a vp with no object.
Errno supplied by:	cperciva
2005-05-01 00:58:19 +00:00
Sam Leffler
52bc746a28 o enable shutdown of taskqueue threads; the thread servicing the queue checks
a new entry in the taskqueue struct each time it wakes up to see if it
  should terminate
o adjust TASKQUEUE_DEFINE_THREAD & co. to record the thread/proc identity for
  the shutdown rendezvous
o replace wakeup after adding a task to a queue with wakeup_one; this helps
  queues where multiple threads are used to service tasks (e.g. acpi)
o remove NULL check of tq_enqueue method; it should never be NULL

Reviewed by:	dfr, njl
2005-05-01 00:38:11 +00:00
Doug White
fdc9713bf7 Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by:	ups
2005-04-30 20:01:00 +00:00
Jeff Roberson
4a723b3604 - Remove long dead splbio() calls and comments relating to the old
synchronization mechanism.
2005-04-30 12:18:50 +00:00
Jeff Roberson
ba4f7c7023 - Don't acquire Giant before calling b_biodone, individual consumers are
now required to do so themselves.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:44:22 +00:00
Jeff Roberson
a230c79b9e - Acquire Giant in AIO's iodone routine. VFS will no longer do it for us
soon.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:27:31 +00:00
Jeff Roberson
4e0ed69694 - Call VM_LOCK_GIANT in cluster_callback() to protect some pmap calls. VFS
will not be acquiring Giant before calling this function anymore.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:26:58 +00:00
Jeff Roberson
b2183bfe05 - In vnlru_free() remove the vnode from the free list before we call
vtryrecycle().  We could sometimes get into situations where two threads
   could try to recycle the same vnode before this.
 - vtryrecycle() is now responsible for returning the vnode to the free list
   if it fails and someone else hasn't done it.
 - Make a new function vfreehead() which moves a vnode to the head of the
   free list and use it in vgone() to clean up that code a bit.

Sponsored by:	Isilon Systems, Inc.
Reported by:	pho, kkenn
2005-04-30 11:22:40 +00:00
Jeff Roberson
0dd02d67eb - Don't vgonel() via vgone() or vrecycle() if the vnode is already doomed.
This fixes forced unmounts via nullfs.

Reported by:	kkenn
Sponsored by:	Isilon Systems, Inc.
2005-04-27 10:03:21 +00:00
Jeff Roberson
6c317bc4cf - Stop setting vxthread, we've asserted that it was useless for several
weeks now.
2005-04-27 09:17:33 +00:00
Jeff Roberson
549817334a - Stop checking vxthread, we've asserted that it was useless for several
weeks.
2005-04-27 09:17:11 +00:00
Jeff Roberson
7625cbf3cc - Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.
2005-04-27 09:05:19 +00:00
Matthew N. Dodd
abb886facb Add missing break.
Found by:	marcus
2005-04-25 00:48:04 +00:00
Sam Leffler
f4581151a8 o eliminate modification of task structures after their run to avoid
modify-after-free races when the task structure is malloc'd
o shrink task structure by removing ta_flags (no longer needed with
  avoid fix) and combining ta_pending and ta_priority

Reviewed by:	dwhite, dfr
MFC after:	4 days
2005-04-24 16:52:45 +00:00
David Xu
bc247e78c0 Wake up swapper process if needed.
PR: kern/78474
Submitted by: Sam Lawrance <boris at brooknet dot com dot au>
2005-04-23 05:06:44 +00:00
David Xu
40f2d4dafd Regen. 2005-04-23 02:38:17 +00:00
David Xu
c4bd610f58 Add new syscall thr_new to create thread in atomic, it will
inherit signal mask from parent thread, setup TLS and stack, and
user entry address.
Also support POSIX thread's PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM,
sysctl is also provided to control the scheduler scope.
2005-04-23 02:36:07 +00:00
David Xu
21fc316430 Change cpu_set_kse_upcall to more generic style, so we can reuse it
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.
2005-04-23 02:32:32 +00:00
Jeff Roberson
17314e6286 - Define the real lock order with cdev and a few vm/vfs related locks. This
can be removed once cdev no longer calls free() with the cdev lock held.
2005-04-22 22:43:31 +00:00
Jeff Roberson
57f66be038 - Check LO_DUPOK as well as LOP_DUPOK when determining whether we should
warn about duplicate acquires.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 22:39:46 +00:00
Tom Rhodes
498693053c Get the directory structure correct in a comment.
Submitted by:	Samy Al Bahra
2005-04-22 19:09:12 +00:00
Jeff Roberson
7d60dc524b - Disable code which allows getnewvnode() to fail. Many ffs_vget() callers
do not correctly deal with failures.  This presently risks deadlock
   problems if dependency processing is held up by failures to allocate
   a vnode, however, this is better than the situation with the failures.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:57:05 +00:00
Jeff Roberson
0d12524bbf - Add two KASSERTs to prevent us from recycling a buf that is still on a
bufobj list.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:53:20 +00:00
Marcel Moolenaar
1020bb9756 Do not conditionally compile the contents of this file upon whether
HWPMC_HOOKS is defined. The pmc_cpu_is_*() functions in this file
are referenced unconditionally by hwpmc(4).

This is mostly a stop-gap. The pmc_cpu_is*() function should
probably be declared inline in <sys/pmc.h> or <sys/pmckern.h> and
the function pointers with corresponding SX lock should probably
be moved to another file and compiled conditionally upon HWPMC_HOOKS.

Ok'd by: jkoshy@
2005-04-20 20:30:59 +00:00
David Xu
3d5c30f7c2 Inherit signal mask for child process in fork1(), RELENG_4 and other
*BSD have this behaviour, also it is required by POSIX.

PR: kern/80130
Submitted by: Kostik Belousov konstantin.belousov at zoral dot com dot ua
2005-04-20 13:14:52 +00:00
Matthew N. Dodd
96a041b533 Check sopt_level in uipc_ctloutput() and return early if it is non-zero.
This prevents unintended consequnces when an application calls things like
setsockopt(x, SOL_SOCKET, SO_REUSEADDR, ...) on a Unix domain socket.
2005-04-20 02:57:56 +00:00
Pawel Jakub Dawidek
f163441e7e Call g_waitidle() before every check the list of holds is empty.
Suggested by:	phk
2005-04-19 21:44:44 +00:00
David Xu
902c0d8297 Clear P_STATCHILD earlier to avoid unnecessary retrying. 2005-04-19 12:31:15 +00:00
David Xu
407948a530 Oops, forgot to update this file.
Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:11:28 +00:00