Commit Graph

7070 Commits

Author SHA1 Message Date
Peter Wemm
25e247af44 The KERN_PROC_PROC sysctl took 4 args in 5.0-REL and 5.1-REL. We need to
accept this for a bit longer.  Requiring the new order of 3 args only
was not very helpful.
2003-10-15 03:11:46 +00:00
Sam Leffler
bd19669855 Change default for kern.polling.idle_poll back to 1. This was set to 0
because Luigi observed livelock but in recent testing it did not occur
so I'm re-enabling it by default.

Reviewed by:	luigi
2003-10-14 18:39:36 +00:00
Poul-Henning Kamp
b84044731d Made use of 'error' argument, which was unused (by mistake) before.
Submitted by:    Pawel Jakub Dawidek <nick@garage.freebsd.pl>
2003-10-14 08:09:43 +00:00
Warner Losh
d29516dd82 With DIAGNOSTICS, sometimes we get weird crashes when some driver
accesses softc after it is freed.  Use a different malloc type for
softc than the rest of the bus code to make it more clear when these
things happen that it is the driver that's at fault, not the bus code.

Suggested by: sam and/or phk (I think)
2003-10-14 06:22:07 +00:00
Jeff Roberson
85b9831dfa - Add a mising vn_finished_write()
Pointy hat:     jeff
Found by:       robert
Obtained from:  kirk
2003-10-14 00:38:34 +00:00
David Xu
3a2e2a0ec8 Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX
asks to inherit signal mask for execv.
2003-10-13 14:03:08 +00:00
Jeff Roberson
736c97c7b3 - In SCHED_CURR() add holding Giant to the list of criteria that will keep
you on the current queue.  In the future, it would be nice if priority
   propagation could deterministicly pluck a thread off of the next queue
   and put it on the current queue.  Until then this hack stops us from
   holding up our entire current queue, including interrupt handlers, while
   a thread on the next queue is blocked while holding Giant.
 - Inherit our pctcpu information from our parent.
2003-10-12 21:07:31 +00:00
Alan Cox
d58e70a08d In vfs_bio_clrbuf(), ignore the state of the object lock if the page is the
"bogus" page.

Found by:	tegge
2003-10-12 18:26:48 +00:00
Poul-Henning Kamp
5108cd3652 Simplify vn_isdisk() a bit. 2003-10-12 14:04:39 +00:00
John-Mark Gurney
9e5de980c6 fix a problem referencing free'd memory. This is only a problem for
kqueue write events on a socket and you regularly create tons of pipes
which overwrites the structure causing a panic when removing the knote
from the list.  If the peer has gone away (and it's a write knote), then
don't bother trying to remove the knote from the list.

Submitted by:	Brian Buchanan and myself
Obtained from:	nCircle
2003-10-12 07:06:02 +00:00
Jeff Roberson
7dd1328c13 - Fix a typo, I meant & and not |. This was causing lockups from the syncer
looping forever due to list corruption.

Solved by:	tegge
2003-10-11 21:50:45 +00:00
Alan Cox
08814d66d5 - Synchronize access to a page's valid field in vfs_bio_clrbuf()
by using the lock from its containing object.
 - Remove GIANT_REQUIRED from vm_hold_load_pages().
2003-10-10 07:26:21 +00:00
Robert Drehmel
ea924c4cd3 Implement preliminary support for the PT_SYSCALL command to ptrace(2). 2003-10-09 10:17:16 +00:00
Tim J. Robbins
a50f62fd9f Remove support for the unused 4th component of the KERN_PROC_PROC sysctl. 2003-10-06 01:26:11 +00:00
Jeff Roberson
d1cf0fc7fc - Add a missing vn_start_write() to flushbufqueues(). This could have
caused snapshot related problems.
 - The vp can not be NULL here or we would panic in vfs_bio_awrite().  Stop
   confusing the logic by checking for it in several places.

Submitted by:	kirk and then rototilled by me to remove vp == NULL checks.
2003-10-05 22:16:08 +00:00
Bruce M Simpson
f05970242b Bring back sysctl_wire_old_buffer(). Fix a bug in sysctl_handle_opaque()
whereby the pointers would not get reset on a retried SYSCTL_OUT() call.

Noticed by:	bde
2003-10-05 13:31:33 +00:00
Bruce M Simpson
dcf59a59fc Fix a security problem in sysctl() the long way round.
Use pre-emption detection to avoid the need for wiring a userland buffer
when copying opaque data structures.

sysctl_wire_old_buffer() is now a no-op. Other consumers of this
API should use pre-emption detection to notice update collisions.

vslock() and vsunlock() should no longer be called by any code
and should be retired in subsequent commits.

Discussed with:	pete, phk
MFC after:	1 week
2003-10-05 09:37:47 +00:00
Bruce M Simpson
0c9601bc6b Add a pre-emption counter, td_generation, so that threads can notice
when they have been pre-empted by other threads. This is bumped from
within mi_switch() every time a context switch takes place.

Discussed with:	pete
2003-10-05 09:35:08 +00:00
Bruce M Simpson
51830edcc5 Fold the vslock() and vsunlock() calls in this file with #if 0's; they will
go away in due course. Involuntary pre-emption means that we can't count
on wiring of pages alone for consistency when performing a SYSCTL_OUT()
bigger than PAGE_SIZE.

Discussed with:	pete, phk
2003-10-05 08:38:22 +00:00
Jeff Roberson
98d7d155c1 - Apply a big giant lock around the namecache. This has been sitting in
my tree since BSDcon.
2003-10-05 07:13:50 +00:00
Jeff Roberson
bdcfcdecea - Fix an XXX. Check the error of vn_lock() in vflush(). Don't specify
LK_RETRY either, we don't want this vnode if it turns into another.
 - Remove the code that checks the mount point after acquiring the lock
   we are guaranteed to either fail or get the vnode that we wanted.
2003-10-05 07:12:38 +00:00
Bruce M Simpson
5be99846fc Remove magic numbers surrounding locking state in the sysctl module, and
replace them with more meaningful defines.
2003-10-05 05:38:30 +00:00
Jeff Roberson
45503a37dd - Rename vcanrecycle() to vtryrecycle() to reflect its new role.
- In vtryrecycle() try to vgonel the vnode if all of the previous checks
   passed.  We won't vgonel if someone has either acquired a hold or usecount
   or started the vgone process elsewhere.  This is because we may have been
   removed from the free list while we were inspecting the vnode for
   recycling.
 - The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling
   the same vnode.  To further reduce the likelyhood of this event, requeue
   the vnode on the tail of the list prior to calling vtryrecycle().  We can
   not actually remove the vnode from the list until we know that it's
   going to be recycled because other interlock holders may see the VI_FREE
   flag and try to remove it from the free list.
 - Kill a bogus XXX comment.  If XLOCK is set we shouldn't wait for it
   regardless of MNT_WAIT because the vnode does not actually belong to
   this filesystem.
2003-10-05 05:35:41 +00:00
Jeff Roberson
85311d4b59 - Don't cache_purge() in getnewvnode. It's done in vclean(). With this
purge, the purge in vclean, and the filesystems purge, we had 3 purges
   per vnode.
 - Move the insmntque(vp, 0) to vclean() so that we may remove it from the
   two vgone() functions and reduce the number of lock operations required.
2003-10-05 02:48:04 +00:00
Jeff Roberson
ce13b187e7 - Solve a LOR with the sync_mtx by using the VI_ONWORKLST flag to determine
whether or not the sync failed.  This could potentially get set between
   the time that we VOP_UNLOCK and VI_LOCK() but the race would harmelssly
   lead to the sync being delayed by an extra 30 seconds.  If we do not move
   the vnode it could cause an endless loop if it continues to fail to sync.
 - Use vhold and vdrop to stop the vnode from changing identities while we
   have it unlocked.  Other internal vfs lists are likely to follow this
   scheme.
2003-10-05 00:35:41 +00:00
Jeff Roberson
894fbf9769 - Move the xlock 'locking' code into vx_lock() and vx_unlock().
- Create a new function, vgonechrl(), which performs vgone for an in-use
   character device.  Move the code from vflush() that did this into
   vgonechrl().
 - Hold the xlock across the entirety of vgonel() and vgonechrl() so that
   at no point will an invalid vnode exist on any list without XLOCK set.
 - Move the xlock code out of vclean() now that it is in the vgone*()
   functions.
2003-10-05 00:02:41 +00:00
Alan Cox
6ec2fca505 Eliminate some unnecessary uses of the vm page queues lock around the
vm page's valid field.  This field is being synchronized using the
containing vm object's lock.
2003-10-04 22:47:20 +00:00
Alan Cox
bf0da100d6 - Extend the scope the vm object lock to cover calls to
vm_page_is_valid().
 - Assert that the lock on the containing vm object is held in
   vm_page_is_valid().
2003-10-04 19:23:29 +00:00
Jeff Roberson
6f4b0863e0 - In sched_sync() test our preconditions prior to dropping the sync_mtx.
This is so that we may grab the interlock while still holding the
   sync_mtx.  We have to VI_TRYLOCK() because in all other cases the lock
   order runs the other way.
 - If we don't meet any of the preconditions, reinsert the vp into the
   list for the next second.
 - We don't need to panic if we fail to sync here because each FSYNC
   function handles this case.  Removing this redundant code also
   simplifies locking.
2003-10-04 18:03:53 +00:00
Jeff Roberson
8ec82641d8 - Change a lame iterative algorithm to a constant time algorithm. Remove
the XXX that complains about it as well.

Submitted by:	ThomasWuerfl@gmx.de
2003-10-04 17:41:13 +00:00
Jeff Roberson
e4c49d2b50 - In a Giantless world, the vn_lock() in vcanrecycle() could legitimately
fail.  Remove the panic from that case and document why it might fail.
 - Document the reason for calling cache_purge() on a newly created vnode.
 - In insmntque() order the operations so that we can call mtx_unlock()
   one fewer times.  This makes the code somewhat clearer as well.
 - Add XXX comments in sched_sync() and vflush().
 - In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is
   set.
 - In vclean() we don't need to acquire a lock around a single TAILQ_FIRST
   call.  It's ok if we race here, the vinvalbuf will just do nothing.
 - Increase the scope of the lock in vgonel() to reduce the number of lock
   operations that are performed.
2003-10-04 15:10:40 +00:00
Jeff Roberson
1de1f935f2 - If we are called with LK_NOWAIT in vn_lock() we may be holding a mutex
and should not sleep while waiting for XLOCK to clear.  Care needs to be
   taken in functions that use this capability to avoid spinning.
2003-10-04 14:35:22 +00:00
Jacques Vidrine
8b7358ca43 Introduce a uiomove_frombuf helper routine that handles computing and
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).

Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows.  As a
side-effect, the code is significantly simplified.

Add additional sanity checks when computing a memory allocation size
in pfs_read.

Submitted by:	rwatson  (original uiomove_frombuf -- bugs are mine :-)
Reported by:	Joost Pol <joost@pine.nl>  (integer underflows/overflows)
2003-10-02 15:00:55 +00:00
Robert Watson
c142b0fcfe Remove the global variable 'cmask', which was used to initialize the
fd_cmask field in the file descriptor structure for the first process
indirectly from CMASK, and when an fd structure is initialized before
being filled in, and instead just use CMASK.  This appears to be an
artifact left over from the initial integration of quotas into BSD.

Suggested by:	peter
2003-10-02 03:57:59 +00:00
Jeff Roberson
fa3f9daae5 - On my Pentium4-M laptop, invalpg takes ~1100 cycles if the page is found in
the TLB and ~1600 if it is not.  Therefore, it is more effecient to
   invalidate the TLB after operations that use CMAP rather than before.
 - So that the tlb is invalidated prior to switching off of a processor, we
   must change the switchin functions to switchout functions.
 - Remove td_switchout from the thread and move it to the x86 pcb.
 - Move the code that calls switchout into swtch.s.  These changes make this
   optimization truely x86 specific.
2003-09-30 08:11:36 +00:00
Robert Watson
cc7b13bfe0 If the struct mac copied into the kernel has a negative length, return
EINVAL rather than failing the following malloc due to the value being
too large.
2003-09-29 18:35:17 +00:00
Poul-Henning Kamp
431021789f Retire revoke_and_destroy_dev() with extreme prejudice. 2003-09-28 20:50:36 +00:00
Marcel Moolenaar
c31f2280ed Remove the regstkpages sysctl variable. We have a growable register
stack now.
2003-09-27 23:07:47 +00:00
Marcel Moolenaar
fd75d71049 Part 2 of implementing rstacks: add the ability to create rstacks and
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.

Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. we need to include or cover the faulting
address.

Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.

Tested on: alpha, i386, ia64, sparc64
2003-09-27 22:28:14 +00:00
Poul-Henning Kamp
98c469d484 Make life a little bit easier for cloning device drivers. 2003-09-27 21:50:00 +00:00
Poul-Henning Kamp
b294143142 Introduce no_poll() default method for device drivers. Have it
do exactly the same as vop_nopoll() for consistency and put a
comment in the two pointing at each other.

Retire seltrue() in favour of no_poll().

Create private default functions in kern_conf.c instead of public
ones.

Change default strategy to return the bio with ENODEV instead of
doing nothing which would lead the bio stranded.

Retire public nullopen() and nullclose() as well as the entire band
of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump}
funtions, they are the default actions now.

Move the final two trivial functions from subr_xxx.c to kern_conf.c
and retire the now empty subr_xxx.c
2003-09-27 12:53:33 +00:00
Poul-Henning Kamp
41cbb0b237 Don't use seltrue when that is not really what we mean. 2003-09-27 12:44:06 +00:00
Poul-Henning Kamp
70cd771337 The present defaults for the open and close for device drivers which
provide no methods does not make any sense, and is not used by any
driver.

It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.

Change the defaults to be nullopen() and nullclose() which simply
does nothing.

Remove explicit initializations to these from the drivers which
already used them.
2003-09-27 12:01:01 +00:00
Poul-Henning Kamp
3f99f14bf1 OK, I messed up /dev/console with what I had hoped would be compat
code.  Convert remaining console drivers and hope for the best.
2003-09-26 19:35:50 +00:00
Robert Drehmel
4cc9f52f78 Move some tracing related code into its own function as it will
be needed for system call related ptrace functionality I plan
to commit soon.
2003-09-26 15:09:46 +00:00
Poul-Henning Kamp
3d4274a52b Update the list of CDROM device names to try for booting with RB_CDROM
flag set.
2003-09-26 09:07:27 +00:00
Poul-Henning Kamp
0d44087987 Remove wrongly sized cnd_name field, we now store the name in the
consdev structure.

If the consdev name is not set and we have a cn_dev, set the name
from there.  Try to issue a printf about this, even though it may
not have a place to go.

Modify the sysctl related code to pick up the name from the consdev
instead.
2003-09-26 07:26:54 +00:00
Peter Wemm
c460ac3a00 Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function.  Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable.  This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'.  And that causes exec etc to fail since it can no
longer find space to mmap things.
2003-09-25 01:10:26 +00:00
Max Khon
b15572e3fc Avoid NULL pointer dereferencing in modlist_lookup2().
PR:		56570
Submitted by:	Thomas Wintergerst <Thomas.Wintergerst@nord-com.net>
2003-09-23 14:42:38 +00:00
Alan Cox
c76789caa6 - vm_hold_free_pages() should lock the kernel object. (The pages being
freed belong to the kernel object.)
 - Increase the granularity of the vm object locking in vm_hold_load_pages()
   in order to reduce the number of times that we acquire and release the
   same lock.
2003-09-22 04:58:09 +00:00
Doug Rabson
ab7a2646e0 The method link_preload_finish is not static. 2003-09-20 17:39:32 +00:00
Jeff Roberson
81de51bf1d - Somewhere along the line I stupidly removed critical logic from
sched_ptcpu_update().  This caused erroneous cpu times in TOP for
   processes that were asleep.  Replace the code that was removed.
2003-09-20 02:05:58 +00:00
Jeff Roberson
51b575490c - In reassignbuf() don't unlock vp and lock newvp if they are the same.
Doing so creates a race where the buf is on neither list.
 - Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we
   should.
 - Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this
   really should not happen and is fast to check.
2003-09-20 00:21:48 +00:00
Jeff Roberson
6b6c163a37 - Remove spls(). The locking that has replaced them is in place and they
no longer serve as guidelines for future work.
2003-09-19 23:52:06 +00:00
Alexander Kabaev
aebbeee812 Eliminate one case of VI_UNLOCK followed by an immediate
VI_LOCK.
2003-09-19 19:13:54 +00:00
Tim J. Robbins
3ddaef4034 Allow the KERN_PROC_PROC sysctl to be used without the useless 4th
name component, for consistency with KERN_PROC_ALL. Support for the
4-argument form will be removed some time before 5.2-R.
2003-09-19 14:16:50 +00:00
Jeff Roberson
9fb535dec5 - Only use UMA to cache malloc requests up to PAGE_SIZE. Values larger than
this are requested very infrequently and waste memory when we cache
   spares.
2003-09-19 04:39:08 +00:00
Alan Cox
35b86dc8de Correct a typo in the previous revision. 2003-09-15 02:56:48 +00:00
Robert Watson
62c45ef40a Add a new sysctl, security.bsd.conservative_signals, to disable
special signal-delivery protections for setugid processes.  In the
event that a system is relying on "unusual" signal delivery to
processes that change their credentials, this can be used to work
around application problems.

Also, add SIGALRM to the set of signals permitted to be delivered to
setugid processes by unprivileged subjects.

Reported by:	Joe Greco <jgreco@ns.sol.net>
2003-09-14 07:22:38 +00:00
Jacques Vidrine
5949ba2136 sched_setscheduler: Return EINVAL when a invalid policy is specified,
thus complying with POLA and the man page.  (Previously, no error was
returned for this case.)
2003-09-13 18:46:24 +00:00
Jacques Vidrine
b5e80ae344 Correct mostly harmless off-by-one error in getdomainname().
Reviewed by:	imp
2003-09-13 17:12:22 +00:00
Alan Cox
58abfe0051 Convert vmapbuf() from using pmap_extract() to using
pmap_extract_and_hold().  Note, however, that GIANT_REQUIRED should not be
removed until all platforms fully implement the "prot" parameter to
pmap_extract_and_hold().

Reviewed by:	tegge
2003-09-13 04:29:55 +00:00
Alan Cox
27d203eab3 pipe_build_write_buffer() only requires read access of the page that it
obtains from pmap_extract_and_hold().
2003-09-12 07:13:15 +00:00
Marcel Moolenaar
da13b8f9fe Introduce BUS_CONFIG_INTR(). The method allows devices to tell parents
about interrupt trigger mode and interrupt polarity. This allows ACPI
for example to pass interrupt resource information up the hierarchy.
The default implementation of the method therefore is to pass the
request to the parent.

Reviewed by: jhb, njl
2003-09-10 21:37:10 +00:00
Hidetoshi Shimokawa
8edbaf859d Fix asynchronous physio breakage introduced in rev 1.163.
We cannnot use bp->b_caller2 because DEV_STRATEGY will overwrite it.
2003-09-10 15:48:51 +00:00
John Baldwin
2b3c42a9e9 Update the license on this file to be a bit more sane. 2003-09-10 01:09:32 +00:00
Ian Dowse
ffe40c80ea In the !MNT_BYFSID case, return EINVAL from unmount(2) when the
specified directory is not found in the mount list. Before the
MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent
path and EINVAL for a non-mountpoint, but we can no longer distinguish
between these cases. Of the two error codes, EINVAL was more likely
to occur in practice, and it was the only one of the two that was
documented.

Update the manual page to match the current behaviour.

Suggested by:	tjr
Reviewed by:	tjr
2003-09-08 16:23:21 +00:00
Alan Cox
03be99d20c Use pmap_extract_and_hold() in pipe_build_write_buffer(). Consequently,
pipe_build_write_buffer() no longer requires Giant on entry.

Reviewed by:	tegge
2003-09-08 04:58:32 +00:00
Tim J. Robbins
f05a427aa6 Return EINVAL if the contested bit is not set on the umtx passed to
_umtx_unlock() instead of firing a KASSERT.
2003-09-07 11:14:52 +00:00
Alan Cox
ffe5125eac msync(2) should be declared MP-safe. 2003-09-07 05:42:07 +00:00
Sam Leffler
6c024e8ef6 add fast swi taskqueue spinlock to the order_list so witness doesn't complain
Submitted by:	Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
2003-09-06 21:06:08 +00:00
Sam Leffler
7e2282a5a6 correct fast swi taskqueue spinlock name to be different from the sleep lock
Submitted by:	Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
2003-09-06 21:05:18 +00:00
Alan Cox
603d3d4a44 Giant is no longer required by pipe_destroy_write_buffer(). Reduce
unnecessary white space from pipe_destroy_write_buffer().
2003-09-06 21:02:10 +00:00
Sam Leffler
f82c9e70f9 "fast swi" taskqueue support. This is a taskqueue that uses spinlocks
making it useful for dispatching swi tasks from fast interrupt handlers.

Sponsered by:	FreeBSD Foundation
2003-09-05 23:09:22 +00:00
Sam Leffler
7c00e355a2 Print a message at boot for interrupt handlers created with INTR_MPSAFE
and/or INTR_FAST.  This belongs elsehwere and perhaps under bootverbose;
I'm committing it for now as it's uesful to know which drivers have
been converted and which have not.
2003-09-05 22:51:18 +00:00
Peter Wemm
917cf8d2a3 Log involuntary context switches correctly. 2003-09-05 22:15:26 +00:00
Poul-Henning Kamp
ce914a08b0 Put the message about msgbuf cksum mismatch under bootverbose and tell
people what the consequence is.
2003-09-05 11:12:00 +00:00
Poul-Henning Kamp
c679c73452 Use the quality to disable timecounters for which we deem Hz too low. 2003-09-03 08:14:16 +00:00
Kenneth D. Merry
cb32189e23 Move dynamic sysctl(8) variable creation for the cd(4) and da(4) drivers
out of cdregister() and daregister(), which are run from interrupt context.

The sysctl code does blocking mallocs (M_WAITOK), which causes problems
if malloc(9) actually needs to sleep.

The eventual fix for this issue will involve moving the CAM probe process
inside a kernel thread.  For now, though, I have fixed the issue by moving
dynamic sysctl variable creation for these two drivers to a task queue
running in a kernel thread.

The existing task queues (taskqueue_swi and taskqueue_swi_giant) run in
software interrupt handlers, which wouldn't fix the problem at hand.  So I
have created a new task queue, taskqueue_thread, that runs inside a kernel
thread.  (It also runs outside of Giant -- clients must explicitly acquire
and release Giant in their taskqueue functions.)

scsi_cd.c:	Remove sysctl variable creation code from cdregister(), and
		move it to a new function, cdsysctlinit().  Queue
		cdsysctlinit() to the taskqueue_thread taskqueue once we
		have fully registered the cd(4) driver instance.

scsi_da.c:	Remove sysctl variable creation code from daregister(), and
		move it to move it to a new function, dasysctlinit().
		Queue dasysctlinit() to the taskqueue_thread taskqueue once
		we have fully registered the da(4) instance.

taskqueue.h:	Declare the new taskqueue_thread taskqueue, update some
		comments.

subr_taskqueue.c:
		Create the new kernel thread taskqueue.  This taskqueue
		runs outside of Giant, so any functions queued to it would
		need to explicitly acquire/release Giant if they need it.

cd.4:		Update the cd(4) man page to talk about the minimum command
		size sysctl/loader tunable.  Also note that the changer
		variables are available as loader tunables as well.

da.4:		Update the da(4) man page to cover the retry_count,
		default_timeout and minimum_cmd_size sysctl variables/loader
		tunables.  Remove references to /dev/r???, they aren't used
		any longer.

cd.9:		Update the cd(9) man page to describe the CD_Q_10_BYTE_ONLY
		quirk.

taskqueue.9:	Update the taskqueue(9) man page to describe the new thread
		task queue, and the taskqueue_swi_giant queue.

MFC after:	3 days
2003-09-03 04:46:28 +00:00
Sam Leffler
28ace1bf60 move domain list mutex initialization to earlier in the boot sequence so
statically configured modules like netgraph can call net_init_domain

Noticed by:	D.Rock@t-online.de (D. Rock)
2003-09-02 20:59:23 +00:00
Mike Silbersack
3390d47670 Implement MBUF_STRESS_TEST mark II.
Changes from the original implementation:

- Fragmentation is handled by the function m_fragment, which can
be called from whereever fragmentation is needed.  Note that this
function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing
use.

- m_fragment works slightly differently from the old fragmentation
code in that it allocates a seperate mbuf cluster for each fragment.
This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent
fragments.  While that is a nice feature in practice, it nerfed the
usefulness of mbuf_stress_test.

- Add two modes of random fragmentation.  Chains with fragments all of
the same random length and chains with fragments that are each uniquely
random in length may now be requested.
2003-09-01 05:55:37 +00:00
Sam Leffler
b9651df42c o interlock domain list when adding domains
o remove irrlevant spl

Notes:

1. We don't lock domain list traversals as this is safe until we start
   removing domains.
2. The calculation of max_datalen in net_init_domain appears safe as
   noone depends on max_hdr and max_datalen having consistent values.
3. Giant is still held for fast and slow timeouts; this must stay until
   each timeout routine is properly locked (coming soon).

Sponsored by:	FreeBSD Fondation
2003-09-01 05:01:55 +00:00
Jeff Roberson
d919a11d06 - Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to
bail out if the buffer is not already present.
 - The buffer returned by incore() is not locked and should not be sent to
   brelse().  Use getblk() with the new GB_NOCREAT flag to preserve the
   desired semantics.
2003-08-31 08:50:11 +00:00
Jeff Roberson
a7db559087 - If there is no vp assume that BKGRDINPROG is not set and set RELPBUF in
brelse().
2003-08-31 01:07:45 +00:00
Jeff Roberson
b5c61abd82 - In some cases bp->b_vp can be NULL in brelse, don't try to lock the
interlock in that case.

Found by:	alc
2003-08-31 00:06:07 +00:00
Alan Cox
411d10a600 Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files.  The rationale for this
migration is illustrated by the modified amd64 allocator.  It uses the
amd64's direct map to avoid emphemeral mappings in the kernel's
address space.  On an SMP, the emphemeral mappings result in an IPI
for TLB shootdown for each transmitted page.  Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.
2003-08-29 20:04:10 +00:00
Marcel Moolenaar
9e8147f3af In bufdone(), change the format specifier for m->valid and m->dirty to
a long type and explicitly cast m->valid and m->dirty to unsigned long.
When PAGE_SIZE is 32K, these fields are in fact unsigned long.
2003-08-28 19:58:11 +00:00
Alexander Kabaev
772a9659d9 Do not return with vnode interlock held.
Reviewed by:	rwatson
2003-08-28 15:48:15 +00:00
Jeff Roberson
9dbfeb0ae6 - Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field.
- Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode
   interlock.
 - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write
   buffers.  Check for the BKGRDINPROG flag before recycling or throwing
   away a buffer.  We do this instead because it is not safe for us to move
   the original buffer to a new queue from the callback on the background
   write buffer.
 - Remove the B_LOCKED flag and the locked buffer queue.  They are no longer
   used.
 - The vnode interlock is used around checks for BKGRDINPROG where it may
   not be strictly necessary.  If we hold the buf lock the a back-ground
   write will not be started without our knowledge, one may only be
   completed while we're not looking.  Rather than remove the code, Document
   two of the places where this extra locking is done.  A pass should be
   done to verify and minimize the locking later.
2003-08-28 06:55:18 +00:00
Robert Watson
a6a65b05d5 Fix a mac_policy_list reference to be a mac_static_policy_list
reference: this fixes mac_syscall() for static policies when using
optimized locking.

Obtained from:	TrustedBSD Project
Sponosred by:	DARPA, Network Associates Laboratories
2003-08-26 17:29:02 +00:00
David Xu
ab2baa7254 Let SA process work under ULE scheduler, originally it would panic kernel.
Reviewed by: jeff
2003-08-26 11:33:15 +00:00
Alan Cox
b7ad744dc5 Hold the page queues lock when performing vm_page_clear_dirty() and
vm_page_set_invalid().
2003-08-23 18:11:53 +00:00
Tim J. Robbins
c89d555c6c Fix a logic error in osethostid() that was introduced in rev. 1.34:
allow hostid to be set when suser() returns 0, not when it returns
an error. This would have allowed non-root users to set the host ID.
2003-08-23 15:45:57 +00:00
Marcel Moolenaar
38bf4e9667 On ia64 time_t is 64 bit. Explicitly cast tv_sec to long and change
the corresponding format specifier to %ld in a call to printf() in
function softclock(). The printf() is conditional upon DIAGNOSTIC.

Found by: LINT
2003-08-23 08:31:32 +00:00
Robert Watson
eb8c7f9992 Introduce two new MAC Framework and MAC policy entry points:
mac_reflect_mbuf_icmp()
  mac_reflect_mbuf_tcp()

These entry points permit MAC policies to do "update in place"
changes to the labels on ICMP and TCP mbuf headers when an ICMP or
TCP response is generated to a packet outside of the context of
an existing socket.  For example, in respond to a ping or a RST
packet to a SYN on a closed port.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-21 18:21:22 +00:00
Eivind Eklund
effb9ebd01 Change description of kern.osreldate from "Operating system release date" to
"Kernel release date" - userland version is in /usr/include/osreldate.h
2003-08-21 14:47:08 +00:00
Robert Watson
c096756c00 Add mac_check_vnode_deleteextattr() and mac_check_vnode_listextattr():
explicit access control checks to delete and list extended attributes
on a vnode, rather than implicitly combining with the setextattr and
getextattr checks.  This reflects EA API changes in the kernel made
recently, including the move to explicit VOP's for both of these
operations.

Obtained from:	TrustedBSD PRoject
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-21 13:53:01 +00:00
Robert Watson
8d8d5ea8f2 Remove about 40 lines of #ifdef/#endif by using new macros
MAC_DEBUG_COUNTER_INC() and MAC_DEBUG_COUNTER_DEC() to maintain
debugging counter values rather than #ifdef'ing the atomic
operations to MAC_DEBUG.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-20 19:16:49 +00:00
Warner Losh
c1cccd1ea6 bde made a number of suggested improvements to the code. This commit
represents the pruely stylistic changes and should have no net impact
on the rest of the code.

bde's more substantive changes will follow in a separate commit once
we've come to closure on them.

Submitted by: bde
2003-08-20 19:12:46 +00:00
Warner Losh
45cc9f5f4f Fix an extreme edge case in leap second handling. We need to call
ntp_update_second twice when we have a large step in case that step
goes across a scheduled leap second.  The only way this could happen
would be if we didn't call tc_windup over the end of day on the day of
a leap second, which would only happen if timeouts were delayed for
seconds.  While it is an edge case, it is an important one to get
right for my employer.

Sponsored by: Timing Solutions Corporation
2003-08-20 05:34:27 +00:00
Sam Leffler
c06eb4e293 Change instances of callout_init that specify MPSAFE behaviour to
use CALLOUT_MPSAFE instead of "1" for the second parameter.  This
does not change the behaviour; it just makes the intent more clear.
2003-08-19 17:51:11 +00:00
Poul-Henning Kamp
037c3d0fb0 It is not an error to have no devices in the kernel: Return the
generation number and start it from one instead of zero.
2003-08-17 12:06:19 +00:00
Bosko Milekic
b618bba486 Use constants less throughout the code and instead use the objsize
variable.  This makes changing the size of an mbuf or cluster for
testing/debugging/whatever purposes easier.

Submitted by: sam
2003-08-16 19:48:52 +00:00
Marcel Moolenaar
26502503e5 Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)
2003-08-16 16:57:57 +00:00
Poul-Henning Kamp
78a49a45bc Give timecounters a numeric quality field.
A timecounter will be selected when registered if its quality is
not negative and no less than the current timecounters.

Add a sysctl to report all available timecounters and their qualities.

Give the dummy timecounter a solid negative quality of minus a million.

Give the i8254 zero and the ACPI 1000.

The TSC gets 800, unless APM or SMP forces it negative.

Other timecounters default to zero quality and thereby retain current
selection behaviour.
2003-08-16 08:23:53 +00:00
John Baldwin
70fca4277e - Various style fixes in both code and comments.
- Update some stale comments.
- Sort a couple of includes.
- Only set 'newcpu' in updatepri() if we use it.
- No functional changes.

Obtained from:	bde (via an old diff I got a long time ago)
2003-08-15 21:29:06 +00:00
Marcel Moolenaar
1c843354aa Add or finish support for machine dependent ptrace requests. When we
check for permissions, do it for all requests, not the known requests.
Later when we actually service the request we deal with the invalid
requests we previously caught earlier.

This commit changes the behaviour of the ptrace(2) interface for
boundary cases such as an unknown request without proper permissions.
Previously we would return EINVAL. Now we return EBUSY or EPERM.

Platforms need to define __HAVE_PTRACE_MACHDEP when they have MD
requests. This makes the prototype of cpu_ptrace() visible and
introduces a call to this function for all requests greater or
equal to PT_FIRSTMACH.

Silence on: audit
2003-08-15 05:25:06 +00:00
John-Mark Gurney
fc8684cd46 if we got this far, we definately don't have an EBADF. Return a more
sane result of EPIPE.

Reported by:	nCircle dev team
MFC after:	3 day
2003-08-15 04:31:01 +00:00
Cameron Grant
828447e0ca add a read-only sysctl to display the number of entries in the fixed size
kobj global method table; also kassert that the table has not overflowed
when defining a new method.

there are indications that the table is being overflowed in certain
situations as we gain more kobj consumers- this will allow us to check
whether kobj is at fault.  symptoms would be incorrect methods being called.
2003-08-14 21:16:46 +00:00
Peter Grehan
eac100658a Update powerpc to use the (old thread,new thread) calling convention
for cpu_throw() and cpu_switch().
2003-08-14 03:56:24 +00:00
Alan Cox
77685ea594 - The vm_object pointer in pipe_buffer is unused. Remove it.
- Check for successful initialization of pipe_zone in pipeinit()
   rather than every call to pipe(2).
2003-08-13 20:01:38 +00:00
Warner Losh
06b4bf3e55 Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon
2003-08-12 23:24:05 +00:00
Maxime Henrion
affd4332fd Remove extra space. 2003-08-12 20:34:31 +00:00
John Baldwin
e9911cf591 - Convert Alpha over to the new calling conventions for cpu_throw() and
cpu_switch() where both the old and new threads are passed in as
  arguments.  Only powerpc uses the old conventions now.
- Update comments in the Alpha swtch.s to reflect KSE changes.

Tested by:	obrien, marcel
2003-08-12 19:33:36 +00:00
Alan Cox
ad8204e3f5 Pipespace() no longer requires Giant. 2003-08-11 22:23:25 +00:00
Alexander Kabaev
660ebf0ef2 Drop Giant in recvit before returning an error to the caller to avoid
leaking the Giant on the syscall exit.
2003-08-11 19:37:11 +00:00
Bruce M Simpson
abd498aa71 Add the mlockall() and munlockall() system calls.
- All those diffs to syscalls.master for each architecture *are*
   necessary. This needed clarification; the stub code generation for
   mlockall() was disabled, which would prevent applications from
   linking to this API (suggested by mux)
 - Giant has been quoshed. It is no longer held by the code, as
   the required locking has been pushed down within vm_map.c.
 - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
   to express their intention explicitly.
 - Inspected at the vmstat, top and vm pager sysctl stats level.
   Paging-in activity is occurring correctly, using a test harness.
 - The RES size for a process may appear to be greater than its SIZE.
   This is believed to be due to mappings of the same shared library
   page being wired twice. Further exploration is needed.
 - Believed to back out of allocations and locks correctly
   (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).

PR:             kern/43426, standards/54223
Reviewed by:    jake, alc
Approved by:    jake (mentor)
MFC after:	2 weeks
2003-08-11 07:14:08 +00:00
Mike Silbersack
cebde06978 More pipe changes:
From alc:
Move pageable pipe memory to a seperate kernel submap to avoid awkward
vm map interlocking issues.  (Bad explanation provided by me.)

From me:
Rework pipespace accounting code to handle this new layout, and adjust
our default values to account for the fact that we now have a solid
limit on allocations.

Also, remove the "maxpipes" limit, as it no longer has a purpose.
(The limit on kva usage solves the problem of having two many pipes.)
2003-08-11 05:51:51 +00:00
Alan Cox
f9999c67be Use vm_page_hold() instead of vm_page_wire(). Otherwise, a multithreaded
application could cause a wired page to be freed.  In general,
vm_page_hold() should be preferred for ephemeral kernel mappings of pages
borrowed from a user-level address space.  (vm_page_wire() should really be
reserved for indefinite duration pinning by the "owner" of the page.)

Discussed with:	silby
Submitted by:	tegge
2003-08-11 00:17:44 +00:00
Jacques Vidrine
41b3077a6c panic() if we try to handle an out-of-range signal number in
psignal()/tdsignal().  The test was historically in psignal().  It was
changed into a KASSERT, and then later moved to tdsignal() when the
latter was introduced.

Reviewed by:	iedowse, jhb
2003-08-10 23:05:37 +00:00
Jacques Vidrine
007e25d95a Add or correct range checking of signal numbers in system calls and
ioctls.

In the particular case of ptrace(), this commit more-or-less reverts
revision 1.53 of sys_process.c, which appears to have been erroneous.

Reviewed by:	iedowse, jhb
2003-08-10 23:04:55 +00:00
Alan Cox
c6eb850aac Background: When proc_rwmem() wired and mapped a page, it also added
a reference to the containing object.  The purpose of the reference
being to prevent the destruction of the object and an attempt to free
the wired page.  (Wired pages can't be freed.)  Unfortunately, this
approach does not work.  Some operations, like fork(2) that call
vm_object_split(), can move the wired page to a difference object,
thereby making the reference pointless and opening the possibility
of the wired page being freed.

A solution is to use vm_page_hold() in place of vm_page_wire().  Held
pages can be freed.  They are moved to a special hold queue until the
hold is released.

Submitted by:	tegge
2003-08-09 18:01:19 +00:00
Alan Cox
9c62fce085 - Remove GIANT_REQUIRED from pipespace().
- Remove a duplicate initialization from pipe_create().
2003-08-08 22:38:15 +00:00
Daniel Eischen
ab908f5935 Copyin the thread mailbox flags from the correct location
in the mailbox.
2003-08-08 20:23:10 +00:00
John Baldwin
b2db7dc658 td_dupfd just needs to be less than 0, it does not have to hold the
negative value of the index of the new file, so just use -1.
2003-08-07 17:08:26 +00:00
Jacques Vidrine
01b9dc96e3 Update some argument-documenting comments to match reality.
Add an explicit range check to those same arguments to reduce risk of
cardiac arrest in future code readers.
2003-08-07 16:42:27 +00:00
John Baldwin
8b149b5131 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
John Baldwin
277576de43 The ktrace mutex does not need to be locked around the post of the ktrace
semaphore and doing so can lead to a possible reversal.  WITNESS would have
caught this if semaphores were used more often in the kernel.

Submitted by:	Ted Unangst <tedu@stanford.edu>, Dawson Engler
2003-08-07 13:58:13 +00:00
Alan Cox
f9b1de367e - Remove GIANT_REQUIRED from pipe_free_kmem().
- Remove the acquisition and release of Giant around pipe_kmem_free() and
   uma_zfree() in pipeclose().
2003-08-07 04:32:40 +00:00
Yaroslav Tykhiy
b81694ed13 If connect(2) has been interrupted by a signal and therefore the
connection is to be established asynchronously, behave as in the
case of non-blocking mode:

- keep the SS_ISCONNECTING bit set thus indicating that
  the connection establishment is in progress, which is the case
  (clearing the bit in this case was just a bug);

- return EALREADY, instead of the confusing and unreasonable
  EADDRINUSE, upon further connect(2) attempts on this socket
  until the connection is established (this also brings our
  connect(2) into accord with IEEE Std 1003.1.)
2003-08-06 14:04:47 +00:00
David Xu
75ea65e3a2 kse.h is not needed for these files. 2003-08-05 12:08:49 +00:00
David Xu
d3b5e418bc Introduce a thread mailbox flag TMF_NOUPCALL. On some architectures other
than i386 or AMD64, TP register points to thread mailbox, and they can not
atomically clear km_curthread in kse mailbox, in this case, thread retrieves
its thread pointer from TP register and sets flag TMF_NOUPCALL in its thread
mailbox to indicate a critical region.
2003-08-05 12:00:55 +00:00
Jeffrey Hsu
cc3426866c Make the second argument to sooptcopyout() constant in order to
simplify the upcoming PIM patches.

Submitted by:   Pavlin Radoslavov <pavlin@icir.org>
2003-08-05 00:27:54 +00:00
Ian Dowse
76bd23557b In the mknod(), mkfifo(), link(), symlink() and undelete() syscalls,
use vrele() instead of vput() on the parent directory vnode returned
by namei() in the case where it is equal to the target vnode. This
handles namei()'s somewhat strange (but documented) behaviour of
not locking either vnode when the two vnodes are equal and LOCKPARENT
but not LOCKLEAF is specified.

Note that since a vnode double-unlock is not currently fatal, these
coding errors were effectively harmless.

Spotted by:	Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
Reviewed by:	mckusick
2003-08-05 00:26:51 +00:00
David Malone
d2cce3d6e8 Do some minor Giant pushdown made possible by copyin, fget, fdrop,
malloc and mbuf allocation all not requiring Giant.

1) ostat, fstat and nfstat don't need Giant until they call fo_stat.
2) accept can copyin the address length without grabbing Giant.
3) sendit doesn't need Giant, so don't bother grabbing it until kern_sendit.
4) move Giant grabbing from each indivitual recv* syscall to recvit.
2003-08-04 21:28:57 +00:00
John Baldwin
bfe6598264 Adjust a comment to remove staleness and take slightly less implementation
specific perspective.
2003-08-04 20:35:13 +00:00
John Baldwin
139b7550d9 Set td_critnest to 1 when setting up a thread since it is a MI field with
MI values.  This ensures that td_critnest for a newly fork'd thread is
always valid.

Requested by:	bde (a long time ago)
2003-08-04 20:28:20 +00:00
John Baldwin
b35b737201 Insert cosmetic spaces.
Reported by:	kris
2003-08-04 19:24:25 +00:00
Robert Watson
60bdc14e90 Move more ACL logic from the UFS code (ufs_acl.c) to the central POSIX.1e
support routines in kern_acl.c:

- Define ACL_OVERRIDE_MASK and ACL_PRESERVE_MASK centrally in acl.h: the
  mode bits that are (and aren't) stored in the ACL.

- Add acl_posix1e_acl_to_mode(): given a POSIX.1e extended ACL, generate
  a compatibility mode (only the bits supported by the POSIX.1e ACL).

- acl_posix1e_newfilemode(): Given a requested creation mode and default
  ACL, calculate the mode for the new file system object (only the bits
  supported by the POSIX.1e ACL).

PR:		50148
Reported by:	Ritz, Bruno <bruno_ritz@gmx.ch>
Obtained from:	TrustedBSD Project
2003-08-04 02:13:05 +00:00
John Baldwin
80c09f69b0 Both 'c' an 'lines' are unused, the bogus init of lines was accidentally
left behind.
2003-08-02 17:35:00 +00:00
Alan Cox
884962ae4e Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in proc_rwmem().
See revision 1.140 of kern/sys_pipe.c for a detailed rationale.

Submitted by:	tegge
2003-08-02 17:08:21 +00:00
Poul-Henning Kamp
4bfd22f25e Grab Giant in bufdonebio() since drivers may not hold it.
This only protects the "struct buf" consumers (ie: DEV_STRATEGY()),
but does not protect BIO_STRATEGY() users.
2003-08-02 09:45:10 +00:00
Poul-Henning Kamp
f7e56e489d Grab Giant in physio() since non-giant drivers are starting to appear. 2003-08-02 09:40:53 +00:00
Alan Cox
105660e8ba Eliminate an abuse of kmem_alloc_pageable() in bufinit()
by using VM_ALLOC_NOOBJ to allocate the bogus page.

Reviewed by:	tegge
2003-08-02 05:05:34 +00:00
Alan Cox
efd02757c2 Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in sf_buf_init().
(See revision 1.140 of kern/sys_pipe.c for a detailed rationale.)

Submitted by:	tegge
2003-08-02 04:18:56 +00:00
David E. O'Brien
05a1bfa142 Fix kernel build -- 'c' was the unused var, not 'lines'. 2003-08-01 17:00:49 +00:00
Robert Watson
19c3e120f0 Attempt to simplify #ifdef logic for MAC_ALWAYS_LABEL_MBUF.
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-01 15:45:14 +00:00
Alan Cox
882d8469af Remove Giant from writev(2). Eliminate trivial style differences between
writev(2) and readv(2).
2003-08-01 02:21:54 +00:00
John Baldwin
4110951861 If a spin lock is held for too long and WITNESS is enabled, then call
witness_display_spinlock() to see if we can find out where the current
owner of the spin lock last acquired the lock.
2003-07-31 18:52:18 +00:00
John Baldwin
1beccae67c Add a new function to look for a spinlock's instance when it is held by
another thread.  We use the td_oncpu member of the other field to locate
it's associated CPU and then search the that CPU's list of spin locks
contained in its per-CPU data.  This is not always safe and may in fact
panic or just not work, but it is useful in at least one case.
2003-07-31 18:50:58 +00:00
John Baldwin
3f2a1b0656 Update the 'ps', 'show pci', and 'show ktr' ddb commands to use the new
pager callout instead of homerolling their own paging facility.
2003-07-31 17:29:42 +00:00
Peter Wemm
aeaead20b8 When ktracing context switches, make sure we record involuntary switches.
Otherwise, when we get a evicted from the cpu, there is no record of it.
This is not a default ktrace flag.
2003-07-31 01:36:24 +00:00
David Xu
1fc434dc9a Use correct signal when calling sigexit. 2003-07-30 23:11:37 +00:00
Pierre Beyssac
ae9fcf4c66 Remove test in pipe_write() which causes write(2) to return EAGAIN
on a non-blocking pipe in cases where select(2) returns the file
descriptor as ready for write. This in turns causes libc_r, for
one, to busy wait in such cases.

Note: it is a quick performance fix, a more complex fix might be
required in case this turns out to have unexpected side effects.

Reviewed by:	silby
MFC after:	3 days
2003-07-30 22:50:37 +00:00
John Baldwin
47b722c1af When complaining about a sleeping thread owning a mutex, display the
thread's pid to make debugging easier for people who don't want to have to
use the intended tool for these panics (witness).

Indirectly prodded by:	kris
2003-07-30 20:42:15 +00:00
Alan Cox
93b4c5b707 The introduction of vm object locking has caused witness to reveal
a long-standing mistake in the way a portion of a pipe's KVA is
allocated.  Specifically, kmem_alloc_pageable() is inappropriate
for use in the "direct" case because it allows a preceding vm map entry
and vm object to be extended to support the new KVA allocation.
However, the direct case KVA allocation should not have a backing
vm object.  This is corrected by using kmem_alloc_nofault().

Submitted by:	tegge (with the above explanation by me)
2003-07-30 18:55:04 +00:00
Alan Cox
fbe1bdddcc Revision 1.51 of vm/uma_core.c modified uma_large_free() to acquire Giant
when needed.  So, don't do it here.
2003-07-29 05:23:19 +00:00
Robert Watson
9080ff25cf Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the
kernel ACL interfaces and system call names.

Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-07-28 18:53:29 +00:00
Robert Watson
2e4a71cdb1 When exporting file descriptor data for threads invoking the
kern.file sysctl, don't return information about processes that
fail p_cansee(td, p).  This prevents sockstat and related
programs from seeing file descriptors owned by processes not
in the same jail as the thread, as well as having implications
for MAC, etc.

This is a partial solution: it permits an information leak about
the number of descriptors in the sizing calculation (but this is
not new information, you can also get it from kern.openfiles),
and doesn't attempt to mask file descriptors based on the
properties of the descriptor, only the process referencing it.
However, it provides most of what you want under most
circumstances, without complicating the locking.

PR:	54211
Based on a patch submitted by:	Pawel Jakub Dawidek <nick@garage.freebsd.pl>
2003-07-28 16:03:53 +00:00
Poul-Henning Kamp
cf7742997a Pass the file descriptor index down to vn_open.
If the method vector was replaced and we got the "special return code"
smile and trust that whatever happened below DTRT.
2003-07-27 20:09:13 +00:00
Poul-Henning Kamp
3ab6b09c53 Pass the fdidx argument from vn_open{_cred}() onto VOP_OPEN() 2003-07-27 20:05:36 +00:00
Poul-Henning Kamp
7c89f162bc Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
Poul-Henning Kamp
1b6c609507 Call the new argument "fdidx" that is more precise than "fd". 2003-07-27 17:03:20 +00:00
David Malone
e41cbeba6d Now that we can call kmem_malloc without Giant it should be safe
to do mbuf allocation without Giant, so remove the GIANT_REQUIRED
from mb_alloc in the M_TRYWAIT case.
2003-07-27 14:19:23 +00:00
Poul-Henning Kamp
a8d43c90af Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.
2003-07-26 07:32:23 +00:00
Scott Long
c43cad1ac1 Guard against MLEN growing larger than a uint8_t due to MSIZE grwoing to a
value of 512 in LINT.  This keeps gcc from complaining.
2003-07-26 07:23:24 +00:00
Alan Cox
18e8d4e79c revision 1.51 of vm/uma_core.c modified uma_large_malloc() to acquire
Giant when needed.
2003-07-25 22:26:43 +00:00
Mike Makonnen
a6ca48085c The POSIX spec also requires that kern_sigtimedwait return
EINVAL if tv_nsec of the timeout is less than zero.
2003-07-24 17:07:17 +00:00
Peter Wemm
80611144e4 Initialize 'blocked' to NULL. I think this was a real problem, but I
am not sure about that.  The lack of -Werror and the inline noise hid
this for a while.
2003-07-23 20:29:13 +00:00
Poul-Henning Kamp
68f2d20b70 Revert stuff which accidentally ended up in the previous commit. 2003-07-22 10:36:36 +00:00
Poul-Henning Kamp
55d1d7034f Don't attempt to inline large functions mb_alloc() and mb_free(),
it more than doubles the text size of this file.

GCC has wisely ignored us on this previously
2003-07-22 10:24:41 +00:00
David Xu
432b45de08 Always deliver synchronous signal to UTS for SA threads. 2003-07-21 00:26:52 +00:00
Mike Makonnen
6022ec6737 Turn a KASSERT back into an EINVAL return value. So, next time someone
comes across it, it will turn into a core dump in userland instead of
a kernel panic. I had also inverted the sense of the test, so

Double pointy hat to:	mtm
2003-07-19 11:32:48 +00:00
Mike Silbersack
f8bf8e397b Three fixes:
- Make m_prepend use m_gethdr instead of m_get where
  appropriate

- Make m_copym use m_gethdr instead of m_get where
  appropriate

- Add a call to m_fixhdr in m_defrag; m_defrag can't
  deal with corrupted pkthdr.len counts.

MFC after:	3 days
2003-07-19 06:03:48 +00:00
Mike Makonnen
5c6edbec80 Remove a lock held across casuptr() that snuck in last commit. 2003-07-18 21:26:45 +00:00
Mike Makonnen
7df7f5c5ab Move the decision on whether to unset the contested
bit or not from lock to unlock time.

Suggested by:	jhb
2003-07-18 17:58:37 +00:00
Robert Drehmel
4e19fe1081 To avoid a kernel panic provoked by a NULL pointer dereference,
do not clear the `sb_sel' member of the sockbuf structure
while invalidating the receive sockbuf in sorflush(), called
from soshutdown().

The panic was reproduceable from user land by attaching a knote
with EVFILT_READ filters to a socket, disabling further reads
from it using shutdown(2), and then closing it.  knote_remove()
was called to remove all knotes from the socket file descriptor
by detaching each using its associated filterops' detach call-
back function, sordetach() in this case, which tried to remove
itself from the invalidated sockbuf's klist (sb_sel.si_note).

PR:	kern/54331
2003-07-17 23:49:10 +00:00
David Xu
3074d1b454 Fix sigwait to conform to POSIX.
When a signal is being delivered to process, first find a sigwait
thread to deliver, POSIX's argument is speed of delivering signal
to sigwait thread is faster than other ways. A signal in its wait
set will cause sigwait to return the signal number, a signal not
in its wait set but in not blocked by the thread also causes sigwait
to return, but sigwait returns EINTR, sigwait is oneshot operation,
only one signal can be delivered to its wait set, when a signal is
delivered to the sigwait thread, the thread's sigwait state is canceled.
2003-07-17 22:52:55 +00:00
David Xu
dd7da9aa28 o Refine kse_thr_interrupt to allow it to handle different commands.
o Remove TDF_NOSIGPOST.
o Add a member td_waitset to proc structure, it will be used for sigwait.

Tested by: deischen
2003-07-17 22:45:33 +00:00
Robert Drehmel
e76bad968c Correct six return statements which returned zero instead of
an appropriate error number after a failure condition.

In particular, three of the changed statements return ESRCH for a
failed pfind(), and in also three places a non-zero return
from p_cansee() will be passed back,

Also noticed by:	rwatson
2003-07-17 22:44:41 +00:00
Mike Makonnen
994599d782 Fix umtx locking, for libthr, in the kernel.
1. There was a race condition between a thread unlocking
   a umtx and the thread contesting it. If the unlocking
   thread won the race it may try to wakeup a thread that
   was not yet in msleep(). The contesting thread would then
   go to sleep to await a wakeup that would never come. It's
   not possible to close the race by using a lock because
   calls to casuptr() may have to fault a page in from swap.
   Instead, the race was closed by introducing a flag that
   the unlocking thread will set when waking up a thread.
   The contesting thread will check for this flag before
   going to sleep. For now the flag is kept in td_flags,
   but it may be better to use some other member or create
   a new one because of the possible performance/contention
   issues of having to own sched_lock. Thanks to jhb for
   pointing me in the right direction on this one.

2. Once a umtx was contested all future locks and unlocks
   were happening in the kernel, regardless of whether it
   was contested or not. To prevent this from happening,
   when a thread locks a umtx it checks the queue for that
   umtx and unsets the contested bit if there are no other
   threads waiting on it. Again, this is slightly more
   complicated than it needs to be because we can't hold
   a lock across casuptr(). So, the thread has to check
   the queue again after unseting the bit, and reset the
   contested bit if it finds that another thread has put
   itself on the queue in the mean time.

3. Remove the if... block for unlocking an uncontested
   umtx, and replace it with a KASSERT. The _only_ time
   a thread should be unlocking a umtx in the kernel is
   if it is contested.
2003-07-17 11:06:40 +00:00
Bosko Milekic
48719ca7c8 Change the style of the english used to print accounting enabled
and disabled.  This means no period at the end and changing
"Process accounting <foo>" to "Accounting <foo>".

Pointed out by: bde
2003-07-16 13:20:10 +00:00
Bosko Milekic
d2dbf5bc0b Log process accounting activation/deactivation.
Useful for some auditing purposes.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/54529
2003-07-16 03:59:50 +00:00
Don Lewis
6ff1481d5c Rearrange the SYSINIT order to call lockmgr_init() earlier so that
the runtime lockmgr initialization code in lockinit() can be eliminated.

Reviewed by:	jhb
2003-07-16 01:00:39 +00:00
David Xu
af161f2232 If initial thread is still a bound thread, don't change its signal mask. 2003-07-15 14:04:38 +00:00
Hartmut Brandt
7e9024cdd9 Add a facility for devices, specifically network interfaces, that require
large to huge amounts of small or medium sized receive buffers. The problem
with these situations is that they eat up the available DMA address space
very quickly when using mbufs or even mbuf clusters. Additionally this
facility provides a direct mapping between 32-bit integers and these buffers.
This is needed for devices originally designed for 32-bit systems. Ususally
the virtual address of the buffer is used as a handle to find the buffer as
soon as it is returned by the card. This does not work for 64-bit machines
and hence this mapping is needed.
2003-07-15 08:59:38 +00:00
David Xu
4b7d5d84ee Rename thread_siginfo to cpu_thread_siginfo 2003-07-15 04:26:26 +00:00
Jeffrey Hsu
330841c763 Rev 1.121 meant to pass the value 1 to soalloc() to indicate waitok.
Reported by:	arr
2003-07-14 20:39:22 +00:00
Don Lewis
857d9c60d0 Extend the mutex pool implementation to permit the creation and use of
multiple mutex pools with different options and sizes.  Mutex pools can
be created with either the default sleep mutexes or with spin mutexes.
A dynamically created mutex pool can now be destroyed if it is no longer
needed.

Create two pools by default, one that matches the existing pool that
uses the MTX_NOWITNESS option that should be used for building higher
level locks, and a new pool with witness checking enabled.

Modify the users of the existing mutex pool to use the appropriate pool
in the new implementation.

Reviewed by:	jhb
2003-07-13 01:22:21 +00:00
Robert Drehmel
baf731e6ed Make the system call vector name of a process accessible to user
land applications by introducing the KERN_PROC_SV_NAME sysctl node,
which is searchable by PID.
2003-07-12 02:00:16 +00:00
David Xu
ffb2e92a98 If a thread is sending signal to its process, if the thread can handle
the signal itself, it should get it without looking for other threads.
2003-07-11 13:42:23 +00:00
Mike Silbersack
347194c172 Add init_param3() to subr_param. This function is called
immediately after the kernel map has been sized, and is
the optimal place for the autosizing of memory allocations
which occur within the kernel map to occur.

Suggested by:	bde
2003-07-11 00:01:03 +00:00
Peter Wemm
e95babf3a8 unifdef -DLAZY_SWITCH and start to tidy up the associated glue. 2003-07-10 01:02:59 +00:00
Mike Silbersack
ff56f15e26 A few minor changes:
- Use atomic ops to update the bigpipe count
- Make the bigpipe count sysctl readable
- Remove a duplicate comparison in an if statement
- Comment two SYSCTLs.
2003-07-09 21:59:48 +00:00
Mike Silbersack
41f16f8208 Pull in the entire kmem_map size calculation from kern_malloc, rather
than the shortcircuited version I had been using, which only worked
properly on i386 & amd64.

Also, change an autoscale constant to account for the more correct
kmem_map size.

Problem noticed by:     mux
2003-07-08 18:59:21 +00:00
Jeff Roberson
0c0a98b231 - When stealing a kse in kseq_move() ignore the current kseq's min nice
value.  We want to steal any thread, even one that is not given a slice
   on its current queue.
2003-07-08 06:19:40 +00:00
Mike Silbersack
289016f2d1 Put some concrete limits on pipe memory consumption:
- Limit the total number of pipes so that we do not
  exhaust all vm objects in the kernel map.  When
  this limit is reached, a ratelimited message will
  be printed to the console.

- Put a soft limit on the amount of memory consumable
  by pipes.  Once the limit has been reached, all new
  pipes will be limited to 4K in size, rather than the
  default of 16K.

- Put a limit on the number of pages that may be used
  for high speed page flipping in order to reduce the
  amount of wired memory.  Pipe writes that occur
  while this limit is exceeded will fall back to
  non-page flipping mode.

The above values are auto-tuned in subr_param.c and
are scaled to take into account both the size of
physical memory and the size of the kernel map.

These limits help to reduce the "kernel resources exhausted"
panics that could be caused by opening a large
number of pipes.  (Pipes alone are no longer able
to exhaust all resources, but other kernel memory hogs
in league with pipes may still be able to do so.)

PR:			53627
Ideas / comments from:	hsu, tjr, dillon@apollo.backplane.com
MFC after:		1 week
2003-07-08 04:02:31 +00:00
Jeff Roberson
0ec896fd28 - Clean up an unused variable.
Submitted by:	Steve Kargl <skg@routmask.apl.washington.edu>
2003-07-07 21:08:28 +00:00
Mike Makonnen
14b5ae1a98 Make the conditional, which decides what siglist to put a signal on,
more concise and improve the comment.

Submitted by: bde
2003-07-05 08:37:40 +00:00
Mike Makonnen
e55c35c433 I was so happy I found the semi-colon from hell that I didn't
notice another typo in the same line. This typo makes libthr unuseable,
but it's effects where counter-balanced by the extra semicolon, which
made libthr remarkably useable for the past several months.
2003-07-04 23:28:42 +00:00
Jeff Roberson
749d01b011 - Parse the cpu topology map in sched_setup().
- Associate logical CPUs on the same physical core with the same kseq.
 - Adjust code that assumed there would only be one running thread in any
   kseq.
 - Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef.  This is a start
   towards HyperThreading support but it isn't quite there yet.
2003-07-04 19:59:00 +00:00
Poul-Henning Kamp
1226914c17 Use the f_vnode field to tell which file descriptors have a vnode. 2003-07-04 12:20:27 +00:00
Mike Makonnen
1069e3a6f4 It's unfair how one extraneous semi-colon can cause so much grief. 2003-07-04 11:18:07 +00:00
Mike Makonnen
71cfaac0b0 style(9)
o Remove double-spacing, and while I'm here add a couple
  of braces as well.

Requested by:	bde
2003-07-04 06:59:28 +00:00
Olivier Houchard
a10d5f02c8 In setpgrp(), don't assume a pgrp won't exist if the provided pgid is the same
as the target process' pid, it may exist if the process forked before leaving
the pgrp.
Thix fixes a panic that happens when calling setpgid to make a process
re-enter the pgrp with the same pgid as its pid if the pgrp still exists.
2003-07-04 02:21:28 +00:00
Mike Makonnen
8689793bfb kse_thr_interrupt should target the thread, specifically.
Requested by:	davidxu
2003-07-04 01:41:32 +00:00
Mike Makonnen
c197abc49a Signals sent specifically to a particular thread must
be delivered to that thread, regardless of whether it
has it masked or not.

Previously, if the targeted thread had the signal masked,
it would be put on the processes' siglist. If
another thread has the signal umasked or unmasks it before
the target, then the thread it was intended for would never
receive it.

This patch attempts to solve the problem by requiring callers
of tdsignal() to say whether the signal is for the thread or
for the process. If it is for the process, then normal processing
occurs and any thread that has it unmasked can receive it.
But if it is destined for a specific thread, it is put on
that thread's pending list regardless of whether it is currently
masked or not.

The new behaviour still needs more work, though.  If the signal
is reposted for some reason it is always posted back to the
thread that handled it because the information regarding the
target of the signal has been lost by then.

Reviewed by:	jdp, jeff, bde (style)
2003-07-03 19:09:59 +00:00
John Baldwin
f7ee15901a - Add comments about the maintenance of the per-thread list of contested
locks held by each thread.
- Fix a bug in the original BSD/OS code where a contested lock was not
  properly handed off from the old thread to the new thread when a
  contested lock with more than one blocked thread was transferred from
  one thread to another.
- Don't use an atomic operation to write the MTX_CONTESTED value to
  mtx_lock in the aforementioned special case.  The memory barriers and
  exclusion provided by sched_lock are sufficient.

Spotted by:	alc (2)
2003-07-02 16:14:09 +00:00
John Baldwin
6591b31040 Add a resource_disabled() helper function that returns true (non-zero) if
a specified resource has been disabled via a non-zero 'disabled' hint and
false otherwise.
2003-07-02 16:01:38 +00:00
Poul-Henning Kamp
d94e36521e typo fix in comment. 2003-07-02 08:01:52 +00:00
David Xu
34178711be Allow SA process unblocks a thread blocked in condition variable.
Reviewed by: deischen
2003-07-02 01:19:15 +00:00
Ian Dowse
318f2fb4bf Add a new mount flag MNT_BYFSID that can be used to unmount a file
system by specifying the file system ID instead of a path. Use this
by default in umount(8). This avoids the need to perform any vnode
operations to look up the mount point, so it makes it possible to
unmount a file system whose root vnode cannot be looked up (e.g.
due to a dead NFS server, or a file system that has become detached
from the hierarchy because an underlying file system was unmounted).
It also provides an unambiguous way to specify which file system is
to be unmunted.

Since the ability to unmount using a path name is retained only for
compatibility, that case now just uses a simple string comparison
of the supplied path against f_mntonname of each mounted file system.

Discussed on:	freebsd-arch
mdoc help from:	ru
2003-07-01 17:40:23 +00:00
Scott Long
79501b66a7 Make swi_vm be INTR_MPSAFE. On all platforms, it is only used to activate
busdma_swi().  Now that busdma_swi() uses driver-provided locking, this
should be safe.
2003-07-01 16:00:38 +00:00
David Xu
df9c6cda37 Fix typo. 2003-06-30 10:04:04 +00:00
Marcel Moolenaar
4e4422d4d4 Don't use fuword() and suword() on struct members of type int. This
happens to work on 32-bit platforms as sizeof(long)=sizeof(int), but
wrecks all kinds of havoc (garbage reads, corrupting writes and
misaligned loads/stores) on 64-bit architectures.
The fix for now is to use fuword32() and suword32() and change the
type of the applicable int fields to int32. This is to make it
explicit that we depend on these fields being 32-bit. We may want
to revisit this later.

Reviewed by: deischen
2003-06-28 19:45:15 +00:00
Jeff Roberson
7a20304f84 - Don't migrate to stopped cpus. 2003-06-28 09:09:33 +00:00
David Xu
9dde3bc999 o Change kse_thr_interrupt to allow send a signal to a specified thread,
or unblock a thread in kernel, and allow UTS to specify whether syscall
  should be restarted.
o Add ability for UTS to monitor signal comes in and removed from process,
  the flag PS_SIGEVENT is used to indicate the events.
o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with
  this flag set to wait for above signal event.
o For SA based thread, kernel masks all signal in its signal mask, let
  UTS to use kse_thr_interrupt interrupt a thread, and install a signal
  frame in userland for the thread.
o Add a tm_syncsig in thread mailbox, when a hardware trap occurs,
  it is used to deliver synchronous signal to userland, and upcall
  is schedule, so UTS can process the synchronous signal for the thread.

Reviewed by: julian (mentor)
2003-06-28 08:29:05 +00:00
Jeff Roberson
86f8ae9663 - If smp is not started yet don't try to load balance or we'll put threads
on cpus that aren't running yet.
2003-06-28 08:24:42 +00:00
David Xu
418228df24 Fix POSIX compatible bug for sigwaitinfo and sigtimedwait.
POSIX says siginfo pointer parameter can be NULL and if the
function success, it should return signal number but not zero.
The waitset it past should be negatived before it can be
used as thread signal mask.
2003-06-28 08:03:28 +00:00
Jeff Roberson
a91172ade1 - Throttle the inherited sleep and run time in sched_fork_kseg(). This
allows us to learn the behavior of a thread much more quickly after it
   starts up.
2003-06-28 06:19:56 +00:00
Jeff Roberson
e493a5d90c - Adjust the default maximum slice value to ~140ms. This has improved the
nice distribution without significantly impacting interactive response.
   As a side effect it should also allow batch processes to run for a
   slightly longer period which will positively impact their performance.
2003-06-28 06:04:47 +00:00
Peter Wemm
eabd19726f Tidy up leftover lazy_switch instrumentation that is no longer needed.
This cleans up some #ifdef hell.
2003-06-27 22:39:14 +00:00
Sean Kelly
6cda41555b Fix this to build on alpha. Build test successful.
Suggested fix from:	tjr
2003-06-27 08:35:05 +00:00
Sean Kelly
370c3cb57c - Add a software watchdog facility.
This commit has two pieces. One half is the watchdog kernel code which lives
primarily in hardclock() in sys/kern/kern_clock.c. The other half is a userland
daemon which, when run, will keep the watchdog from firing while the userland
is intact and functioning.

Approved by:	jeff (mentor)
2003-06-26 09:50:52 +00:00
Warner Losh
4f2073fb4c Fix leap second processing by the kernel time keeping routines.
Before, we would add/subtract the leap second when the system had been
up for an even multiple of days, rather than at the end of the day, as
a leap second is defined (at least wrt ntp).  We do this by
calculating the notion of UTC earlier in the loop, and passing that to
get it adjusted.  Any adjustments that ntp_update_second makes to this
time are then transferred to boot time.  We can't pass it either the
boot time or the uptime because their sum is what determines when a
leap second is needed.  This code adds an extra assignment and two
extra compare in the typical case, which is as cheap as I could made
it.

I have confirmed with this code the kernel time does the correct thing
for both positive and negative leap seconds.  Since the ntp interface
doesn't allow for +2 or -2, those cases can't be tested (and the folks
in the know here say there will never be a +2s or -2s leap event, but
rather two +1s or -1s leap events).

There will very likely be no leap seconds for a while, given how the
earth is speeding up and slowing down, so there will be plenty of time
for this fix to propigate.  UT1-UTC is currently at "about -0.4s" and
decrementing by .1s every 8 months or so.  6 * 8 is 48 months, or 4
years.

-stable has different code, but a similar bug that was introduced
about the time of the last leap second, which is why nobody has
noticed until now.

MFC After: 3 weeks
Reviewed by: phk

"Furthermore, leap seconds must die." -- Cato the Elder
2003-06-25 21:23:51 +00:00
Warner Losh
eac3c62b51 During a positive leap second, the tai_time offset should be
incremented at the start of the leap second, not after the leap second
has been inserted.  This is because at the start of the leap second,
we set the time back one second.  This setting back one second is the
moment that the offset changes.  The old code set it back after the
leap second, but that's one second too late.  The negative leap second
case is handled correctly.

Reviewed by: phk
2003-06-25 20:56:40 +00:00
Olivier Houchard
7f3bfd6651 At this point targp will always be NULL, so remove the useless if. 2003-06-25 13:28:32 +00:00
Warner Losh
4e82e5f6f1 Use UTC rather than GMT to describe time scale. latter is obsolete. 2003-06-23 20:14:08 +00:00
Robert Watson
f51e58036e Redesign the externalization APIs from the MAC Framework to
the MAC policy modules to improve robustness against C string
bugs and vulnerabilities.  Following these revisions, all
string construction of labels for export to userspace (or
elsewhere) is performed using the sbuf API, which prevents
the consumer from having to perform laborious and intricate
pointer and buffer checks.  This substantially simplifies
the externalization logic, both at the MAC Framework level,
and in individual policies; this becomes especially useful
when policies export more complex label data, such as with
compartments in Biba and MLS.

Bundled in here are some other minor fixes associated with
externalization: including avoiding malloc while holding the
process mutex in mac_lomac, and hence avoid a failure mode
when printing labels during a downgrade operation due to
the removal of the M_NOWAIT case.

This has been running in the MAC development tree for about
three weeks without problems.

Obtained from:	TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
2003-06-23 01:26:34 +00:00
Robert Watson
6b42f0a2eb Prefer the vop_rmextattr() vnode operation for removing extended
attributes from objects over vop_setextattr() with a NULL uio; if
the file system doesn't support the vop_rmextattr() method, fall
back to the vop_setextattr() method.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 23:03:07 +00:00
Robert Watson
77533ed2aa Expose vop_rmextattr as an explicit operation at the vnode operation
interface, rather than relying on a NULL uio for the deletion
operation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 22:45:24 +00:00
Robert Watson
4b090e41ff Add an explicit credential argument to alq_open() to allow the caller to
specify what credential to use when authorizing vn_open() and later
write operations, rather than curthread->td_ucred.

When writing KTR traces to an ALQ, specify the credential of the thread
generating the sysctl request.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-22 22:28:56 +00:00
Poul-Henning Kamp
3b6d965263 Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
2003-06-22 08:41:43 +00:00
Ian Dowse
adef9265ef When DDB is active, always send printf() output directly to the
console, even if there is a TIOCCONS console tty. We were already
doing this after a panic, but it's also useful when entering DDB
for some other reason too.
2003-06-22 03:20:24 +00:00
Ian Dowse
d29bf12ff8 Use a new message buffer `consmsgbuf' to forward messages to a
TIOCCONS console (e.g. xconsole) via a timeout routine instead of
calling into the tty code directly from printf(). This fixes a
number of cases where calling printf() at the wrong time (such as
with locks held) would cause a panic if xconsole is running.

The TIOCCONS message buffer is 8k in size by default, but this can
be changed with the kern.consmsgbuf_size sysctl. By default, messages
are checked for 5 times per second. The timer runs and the buffer
memory remains allocated only at times when a TIOCCONS console is
active.

Discussed on:	freebsd-arch
2003-06-22 02:54:33 +00:00
Ian Dowse
4784a46912 Replace the code for reading and writing the kernel message buffer
with a new implementation that has a mostly reentrant "addchar"
routine, supports multiple message buffers in the kernel, and hides
the implementation details from callers.

The new code uses a kind of sequence number to represend the current
read and write positions in the buffer. This approach (suggested
mainly by bde) permits the read and write pointers to be maintained
separately, which reduces the number of atomic operations that are
required. The "mostly reentrant" above refers to the way that while
it is now always safe to have any number of concurrent writers,
readers could see the message buffer after a writer has advanced
the pointers but before it has witten the new character.

Discussed on:	freebsd-arch
2003-06-22 02:18:31 +00:00
Jeff Roberson
1a7a9d0ec2 - lticks was erroneously being updated in sched_pctcpu(). This was causing
us to skip the pctcpu_update() call which lead to inaccurate cpu usage
   statistics for processes that didn't run often.
2003-06-21 02:31:49 +00:00
Jeff Roberson
665cb285a8 - Don't allow nice to have such a large effect on priority. This was
causing poor interactive performance while unnice processes were running.
   The new scheme still allows nice to have an effect on priority but it is
   not as dramatic as the effect of the interactivity score.
2003-06-21 02:22:47 +00:00
Bosko Milekic
b2b417bb41 Fix a divide-by-zero on kern.log_wakeups_per_second tunable.
Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/53557
2003-06-20 22:18:38 +00:00
Stefan Eßer
c2ef4dd48a Add comment about **vpp being special-cased in vnode_if.awk (1.38) 2003-06-20 12:24:06 +00:00
David Xu
ab78d4d641 cpu_set_upcall_kse needs to access userspace, release schedule lock
before calling it for bound thread. To avoid this problem, change
thread_schedule_upcall to not put new thread on run queue, let caller
do it, so we can tweak the new thread before setting it to run.

Reported by: pho
2003-06-20 09:12:12 +00:00
Poul-Henning Kamp
166400b7e6 Don't put callout_lock under #ifdef DIAGNOSTIC despite the fact that it
works anyway.
2003-06-20 08:39:04 +00:00
Poul-Henning Kamp
568733688b Initialize b_saveaddr when we hand out buffers 2003-06-20 08:26:38 +00:00
Poul-Henning Kamp
ce6912c420 Crude but efficient:
#ifdef DIAGNOSTIC hold a mutex while calling callout's so that we hear
about it if they sleep.
2003-06-20 08:07:15 +00:00
Poul-Henning Kamp
eaaca5deee Don't (re)initialize f_gcflag to zero.
Move initialization of DTYPE_VNODE specific field f_seqcount into
the DTYPE_VNODE specific code.
2003-06-20 08:02:30 +00:00
David Xu
062cf543fc When a STOP signal is being sent to a process, it is possible all
threads in the process have already masked the signal, so job control
is delayed. But later a thread unmasking the STOP signal should enable
job control, so in issignal(), scanning all threads in process to see
if we can direct suspend some of them, not just suspend current thread.
2003-06-20 03:36:45 +00:00
David Xu
8b56079e2b Fix typo. td should be td0. 2003-06-20 01:56:28 +00:00
Alfred Perlstein
bab88630ba Unlock the struct file lock before aquiring Giant, otherwise
we can deadlock because of lock order reversals.  This was not
caught because Witness ignores pool mutexes right now.

Diagnosis and help: truckman
Noticed by: pho
2003-06-19 18:13:07 +00:00
Mike Silbersack
b083ea5114 Add a ratelimited message of the form
"maxproc limit exceeded by uid %i, please see tuning(7) and login.conf(5)."

Which will be triggered whenever a user hits his/her maxproc limit or
the systemwide maxproc limit is reached.

MFC after:	1 week
2003-06-19 05:57:25 +00:00
Don Lewis
6084b6c9d5 FILE_LOCK() uses a pool mutex, as does the vnode v_vnlock. Since pool
mutexes are supposed to only be used as leaf mutexes, and what appear
to be separate pool mutexes could be aliased together, it is bad idea
for a thread to attempt to hold two pool mutexes at the same time.

Slightly rearrange the code in kern_open() so that FILE_UNLOCK() is
called before calling VOP_GETVOBJECT(), which will grab the v_vnlock
mutex.
2003-06-19 04:10:56 +00:00
Mike Silbersack
4d7dfc31b8 Add a rate limited message reporting when kern.maxfiles is exceeded,
reporting who did it.

Also, fix a style bug introduced in the previous change.

MFC after:	1 week
2003-06-19 04:07:12 +00:00
Don Lewis
8d5f9131fc VOP_GETVOBJECT() wants to be called with the vnode lock held. 2003-06-19 03:55:01 +00:00
Poul-Henning Kamp
2db4b023bb Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that
rather than assume that only DTYPE_VNODE is seekable.
2003-06-18 19:53:59 +00:00
Mike Silbersack
438f085b2f Reserve the last 5% of file descriptors for root use. This should allow
systems to fail more gracefully when a file descriptor exhaustion situation
occurs.

Original patch by:	David G. Andersen <dga@lcs.mit.edu>
PR:			45353
MFC after:		1 week
2003-06-18 18:57:58 +00:00
Poul-Henning Kamp
7c2d2efd58 Initialize struct fileops with C99 sparse initialization. 2003-06-18 18:16:40 +00:00
Jeff Roberson
d07ac847ef - Use a more robust mechanism for determining whether or not a kse is on a
kseq.
2003-06-17 19:49:18 +00:00
Scott Long
04d2f20f6b Drop the proc lock around SYSCTL_OUT in the no-threads case.
Submitted by:	truckman
2003-06-17 19:14:00 +00:00
Jeff Roberson
7cd0f83355 - Temporarily patch a problem where the interact score could be negative
because the run time exceeds the largest value a signed int can hold.
   The real solution involves calculating how far we are over the limit.
   To quickly solve this problem we loop removing 1/5th of the current value
   until it falls below the limit.  The common case requires no passes.
2003-06-17 10:21:34 +00:00
Jeff Roberson
4b60e3242e - Add a new function "sched_interact_update()" that scales back the sleep
and run time.
 - Scale the sleep and run time back via sched_interact_update() in more
   places.  This is to keep the statistic more accurate.
 - Charge a parent one tick for forking a child.
 - Add only the run time and not the sleep time to the parents kg when a
   thread exits.  This allows us to give a penalty for having an expensive
   thread exit but does not give a bonus for having an interactive thread
   exit.
 - Change the SLP_RUN_THROTTLE to limit us to 4/5th and not 1/2.
 - Change the SLP_RUN_MAX to two seconds.  This keeps bursty interactive
   applications like mozilla and openoffice in the interactive range even
   through expensive tasks.
 - Recalculate the slice after every sleep.  This ensures that once a task
   has been marked interactive it only has a slice of 1 at the risk of
   giving tasks that sleep for a very brief period a longer time slice.
2003-06-17 06:39:51 +00:00
Mike Silbersack
51710a4597 Hide the m_defrag* statistics under MBUF_STRESS_TEST, there seems
to be no need to see them in the general case (and they aren't
smp-safe anyway.)

Suggested by:	hmp
MFC after:	1 week
2003-06-17 02:34:40 +00:00
David Xu
4184d79115 Forgot to commit code to disable creating a bound thread in same
group again except first kse_create syscall.

Noticed by: julian
2003-06-16 23:46:41 +00:00
David Xu
075102cc4e Reset ncpus to 1 for bound thread group since there is only one
thread in such group.
Change message text from kse_rel to kserel, it is better displayed
in top.
2003-06-16 13:14:52 +00:00
Poul-Henning Kamp
e725c18c3a Get rid of the b_spc specialty field in struct buf by using an already
available caller private field.
2003-06-16 07:18:39 +00:00
Poul-Henning Kamp
2a0f8aeb52 I have not had any reports of trouble for a long time, so remove the
gentle versions of the vop_strategy()/vop_specstrategy() mismatch methods
and use vop_panic() instead.
2003-06-15 19:49:14 +00:00
Robert Watson
2bceb0f2b2 Various cr*() calls believed to be MPSAFE, since the uidinfo
code is locked down.
2003-06-15 15:57:42 +00:00
David Xu
cd4f6ebb13 1. Add code to support bound thread. when blocked, a bound thread never
schedules an upcall. Signal delivering to a bound thread is same as
   non-threaded process. This is intended to be used by libpthread to
   implement PTHREAD_SCOPE_SYSTEM thread.
2. Simplify kse_release() a bit, remove sleep loop.
2003-06-15 12:51:26 +00:00
Ian Dowse
4f1b457770 Don't overwrite the static panicstr buffer for secondary and further
panics. Before revision 1.38, we used to just point panicstr at the
format string if panicstr was NULL, but since we now use a static
buffer for the formatted panic message, we have to be careful to
only write to it during the first panic.

Pointed out by:	bde
2003-06-15 11:43:00 +00:00
Jeff Roberson
3c12473229 - Increase the ksegrp's cpu time history buffer to 250ms.
- Decrease the history buffer divisor to 2 so that we remember more of the
   old behavior.
2003-06-15 04:14:25 +00:00
David Xu
1d5a24bec6 1. Migrate TDF_UPCALLING from td_flags to td_pflags.
2. Add a flag TDF_SA, it will be used to distinguish SA
   based thread from bound thread.
2003-06-15 03:18:58 +00:00
Jeff Roberson
b41f3d22cc - Cap the growth of sleep and run time in sched_exit_kse(). 2003-06-15 02:52:29 +00:00
Jeff Roberson
210491d3d9 - Fix the maximum slice value. I accidentally checked in a value of '2'
which meant no process would run for longer than 20ms.
 - Slightly redo the interactivity scorer.  It follows the same algorithm but
   in a slightly more correct way.  Previously values above half were
   incorrect.
 - Lower the interactivity threshold to 20.  It seems that in testing non-
   interactive tasks are hardly ever near there and expensive interactive
   tasks can sometimes surpass it.  This area needs more testing.
 - Remove an unnecessary KTR.
 - Fix a case where an idle thread that had an elevated priority due to
   priority prop. would be placed back on the idle queue.
 - Delay setting NEEDRESCHED until userret() for threads that haad their
   priority elevated while in kernel.  This gives us the same context switch
   optimization as SCHED_4BSD.
 - Limit the child's slice to 1 in sched_fork_kse() so we detect its behavior
   more quickly.
 - Inhert some of the run/slp time from the child in sched_exit_ksegrp().
 - Redo some of the priority comparisons so they are more clear.
 - Throttle the frequency of sched_pctcpu_update() so that rounding errors
   do not make it invalid.
2003-06-15 02:18:29 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
Alan Cox
49a2507bd1 Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM.  At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES.  The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES.  To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new.  In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().
2003-06-14 23:23:55 +00:00
Alan Cox
89f4fca265 Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm.  They were
all identical.
2003-06-14 06:20:25 +00:00
Maxime Henrion
fca737117a Style(9). 2003-06-13 19:39:21 +00:00
Dag-Erling Smørgrav
c2935410f6 Make the VFS cache use zones instead of malloc(9). This results in a
small but noticeable increase in performance for name lookup operations.

The code uses two zones, one for short names (less than 32 characters)
and one for long names (up to NAME_MAX).  Since most file names are
fairly short, this saves a considerable amount of space that would
otherwise be wasted if we always allocated NAME_MAX bytes.  The cutoff
value of 32 characters was picked arbitrarily and may benefit from some
tweaking; it could also be made into a tunable.

Submitted by:	hmp
2003-06-13 08:46:13 +00:00
Alan Cox
8630c1173e Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.
2003-06-13 03:02:28 +00:00
Poul-Henning Kamp
7652131bee Initialize struct vfsops C99-sparsely.
Submitted by:   hmp
Reviewed by:	phk
2003-06-12 20:48:38 +00:00
Dag-Erling Smørgrav
633f506489 Document some sysctl variables.
Submitted by:	hmp
2003-06-12 19:46:51 +00:00
Scott Long
30c6f34e00 Add support to sysctl_kern_proc to return all threads in a proc, not just the
first one.  The old behaviour can be switched by specifying KERN_PROC_PROC.

Submitted by: julian, tweaks and added functionality by myself
2003-06-12 16:41:50 +00:00
Alan Cox
c10c537816 Finish the vm object locking in sendfile(2). More generally,
the vm locking in sendfile(2) is complete.
2003-06-12 05:52:09 +00:00
Alan Cox
2ab3670aad Lock the vm object when removing a page. 2003-06-11 21:23:04 +00:00
Alan Cox
f717a9d063 Lock the vm object when removing a page. 2003-06-11 16:37:33 +00:00
Dag-Erling Smørgrav
ffe92432e3 Whitespace cleanup. 2003-06-11 07:35:56 +00:00
Alan Cox
c40f7377a4 Add vm object locking. 2003-06-11 06:43:48 +00:00
David E. O'Brien
f4636c5959 Use __FBSDID(). 2003-06-11 06:34:30 +00:00
Paul Saab
1795d0cdec Don't overflow when calculating vm_kmem_size. This fixes kmem_map
too small panics on PAE machines which have odd > 4GB sizes (4.5 gig
would render a 20MB of KVA for kmem_map instead of 200MB).

Submitted by:	John Cagle <john.cagle@hp.com>, jeff
Reviewed by:	jeff, peter, scottl, lots of USENIX folks
2003-06-11 05:18:59 +00:00
David Xu
7677ce18b8 Fix error in my last commit. Correctly maintain p_maxthrwaits and unlock
sched_lock.
2003-06-11 01:08:33 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
David Xu
36407bec4f If there are signals delivered to current thread, breaks out of loop,
userret() will be called again by ast() and thread_userret() will be
called again by userret().

Reported by: tegge
2003-06-10 02:21:32 +00:00
Maxime Henrion
0ca5dc1c3e style(9). 2003-06-09 21:57:48 +00:00
John Baldwin
5499ea019d Wait for the real interval timer callout handler to finish executing if it
is currently executing when we try to remove it in exit1().  Without this,
it was possible for the callout to bogusly rearm itself and eventually
refire after the process had been free'd resulting in a panic.

PR:		kern/51964
Reported by:	Jilles Tjoelker <jilles@stack.nl>
Reviewed by:	tegge, bde
2003-06-09 21:46:22 +00:00
John Baldwin
8bccf7034e The issetugid() function is MPSAFE. 2003-06-09 21:34:19 +00:00
Alan Cox
06fa71cdcc Update the vm object and page locking in exec_map_first_page(). Mark the
one still anticipated change with XXX.  Otherwise, this function is done.
2003-06-09 19:37:14 +00:00
Alan Cox
4412dc5468 - Add vm object locking to vm_pgmoveco().
- Add a comment to vm_pgmoveco() describing what remains to be done
   for vm locking.
2003-06-09 19:23:03 +00:00
Juli Mallett
c02d762181 Attempt to fix Alpha build by renaming ident[] to kern_ident[]. 2003-06-09 18:19:33 +00:00
John Baldwin
5e26dcb560 - Add a td_pflags field to struct thread for private flags accessed only by
curthread.  Unlike td_flags, this field does not need any locking.
- Replace the td_inktr and td_inktrace variables with equivalent private
  thread flags.
- Move TDF_OLDMASK over to the private flags field so it no longer requires
  sched_lock.
2003-06-09 17:38:32 +00:00
Juli Mallett
da1186f2c7 Expose kern.ident by way of OID_AUTO.
Requested by:	phk
2003-06-09 10:54:23 +00:00
Jeff Roberson
356500a306 - Add a simple CPU load balancing algorithm. This works by executing once a
second and equalizing the load between the two most imbalanced CPU.  This
   is intended to clear up long term load imbalances that would not be handled
   by the 'pull' method in sched_choose().
 - Pull out some bits of sched_choose() into a kseq_move() function that moves
   an arbitrary thread from one kseq to another.
2003-06-09 00:39:09 +00:00
Alan Cox
fd0cc9a862 Lock the vm object when performing vm_page_grab(). 2003-06-08 07:14:30 +00:00
Jeff Roberson
b90816f188 - When a new thread is added to a kseq the load is incremented prior to
adding it to the nice tables.  Therefore, in kseq_add_nice, we should
   keep in mind that the load will be 1 if we are the only thread, and not
   0.
 - Assert that the sched lock is held in all the appropriate places.
 - Increase the scope of the sched lock in sched_pctcpu_update().
 - Hold the sched lock in sched_runnable().  It is not held by the caller.
2003-06-08 00:47:33 +00:00
Poul-Henning Kamp
84c080a85e Improve the root-dev prompt facility for printing devices which could
possibly be a root filesystem.
2003-06-07 15:46:53 +00:00
David Xu
b0bd5f38a6 thread_signal_add now is called with ps_mtx held, unlock it before
calling copyin.
2003-06-06 02:17:38 +00:00
Robert Watson
777621799b If a system call comes in requesting to retrieve an attribute named
"", temporarily map it to a call to extattr_list_vp() to provide
compatibility for older applications using the "" API to retrieve
EA lists.

Use VOP_LISTEXTATTR() to support extattr_list_vp() rather than
VOP_GETEXTATTR(..., "", ...).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Asssociates Laboratories
2003-06-05 05:55:34 +00:00
Robert Watson
a6f1342ff6 Add vop_listextattr(), similar to vop_getextattr() but without a
specific attribute name.  It will have the same semantics as the
older vop_getextattr() "retrieve the names" hack, returning
a buffer with ASCII nul-seperated names.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-05 05:53:35 +00:00
Marcel Moolenaar
11e0f8e16d Change the second (and last) argument of cpu_set_upcall(). Previously
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simply bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.

Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.

This change officially marks the ability on ia64 for 1:1 threading.

Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64
2003-06-04 21:13:21 +00:00
Poul-Henning Kamp
22ee8c4f50 Add instrumentation which tells us how much work softclock() does
per invocation.
2003-06-04 05:25:58 +00:00
Robert Watson
8bebbb1a32 Implementations of extattr_list_fd(), extattr_list_file(), and
extattr_list_link() system calls, which return a least of extended
attributes defined for a vnode referenced by a file descriptor
or path name.  Currently, we just invoke VOP_GETEXTATTR() since
it will convert a request for an empty name into a query for a
name list, which was the old (more hackish) API.  At some point
in the near future, we'll push the distinction between get and
list down to the vnode operation layer, but this provides access
to the new API for applications in the short term.

Pointed out by:	Dominic Giampaolo <dbg@apple.com>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-04 03:57:28 +00:00
Robert Watson
31d13e2a29 Regen from syscalls.master:1.149, addition of extended attribute
list system calls for fd, file, link.
2003-06-04 03:50:20 +00:00
Robert Watson
9e18f27730 Add system calls to explicitly list extended attributes on a
file/directory/link, rather than using a less explicit hack on
the extattr retrieval API:

  extattr_list_fd()
  extattr_list_file()
  extattr_list_link()

The existing API was counter-intuitive, and poorly documented.
The prototypes for these system calls are identical to
extattr_get_*(), but without a specific attribute name to
leave NULL.

Pointed out by:	Dominic Giampaolo <dbg@apple.com>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-04 03:49:31 +00:00
Robert Watson
0b95513444 Assert the vnode lock when returning successfully from vn_open_cred(). 2003-06-04 00:54:27 +00:00
Julian Elischer
2b035cbe5a Remove un-needed code.
Don't copyin() data we are about to overwrite.
Add a flag to tell userland that KSE is officially "DONE" with the
mailbox and has gone away.

Obtained from:	davidxu@
2003-06-04 00:12:57 +00:00
Bosko Milekic
479728fd77 Fix a potential bucket leak where when freeing to an empty bucket
we failed to put the bucket back into the general cache/container.

Also, fix a bad assumption.  There was a KASSERT() that aimed to
guarantee that whenever the pcpu container's mc_starved was > 0,
that whatever the bucket we were freeing to was an empty bucket,
assuming it belonged to the pcpu container cache. However, there
is at least one case where this is not true anymore; consider:
1) All containers empty, next thread to try to alloc will touch
   a pcpu container, notice it's empty, and increment the pcpu
   container's mc_starved.
2) Some other thread frees an mbuf belonging to a bucket in
   the general cache/container.  Then it frees another mbuf
   belonging to the same bucket (still in gen container).
3) Some third thread tries to allocate an mbuf from the pcpu
   container and, since empty, grabs one mbuf now available
   in the general cache and moves the non-empty bucket from
   which it took 1 mbuf and to which the thread in (2) freed
   to, and moves it to the pcpu container.
4) A final thread tries to free an mbuf belonging to the
   NON-EMPTY bucket mentionned in (2) and (3) and, since
   the pcpu container's mc_starved is > 0, but the bucket
   is obviously non-empty, it trips on the KASSERT.
This meant that one could potentially get a panic in some
cases when out of mbufs and clusters.  The problem could
be mitigated by commenting out some cv_signal() calls,
but I'm assuming that was pure coincidence and this is
the correct fix.
2003-06-03 19:19:13 +00:00
Jeff Roberson
980c75b4d8 - Remove the blocked pointer from the umtx structure.
- Use a hash of umtx queues to queue blocked threads.  We hash on pid and the
   virtual address of the umtx structure.  This eliminates cases where we
   previously held a lock across a casuptr call.

Reviwed by:	jhb (quickly)
2003-06-03 05:24:46 +00:00
Tor Egge
ad05d58087 Add tracking of process leaders sharing a file descriptor table and
allow a file descriptor table to be shared between multiple process
leaders.

PR:		50923
2003-06-02 16:05:32 +00:00
Marcel Moolenaar
bf822712f7 Remove the ia64 hackery in threadinit() that was needed to work around
the lameness of the kstack code. The EPC overhaul de-lame-ified the
kstack code by removing the need for contigmalloc(). We can now
allocate stacks using malloc(). We probably want to make the stacks
swappable as well so that we can make it MI. But that's another story.
2003-06-01 05:57:58 +00:00
Robert Watson
ef2e1ca561 Attempt to further comment and clarify System V IPC logic: document
why certain exceptions are made, note an inconsistency between
FreeBSD and some other implementations regarding IPC_M, and let
suser() generate our EPERM rather than forcing it ourselves.
Remove a carriage return that crept in in the last commit.

Reviewed by:	gordon
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-31 23:31:51 +00:00
Robert Watson
a0ccd3f6ad Attempt to marginally de-obfuscate sections of the System V IPC access
control logic.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-31 23:17:30 +00:00
Poul-Henning Kamp
b82af320cf Add "" around mutex name to make message less confusing. 2003-05-31 21:11:01 +00:00
Poul-Henning Kamp
670966596b Remove unused variable(s).
Found by:       FlexeLint
2003-05-31 20:29:34 +00:00
Poul-Henning Kamp
1e93e04fa9 Remove return after panic.
Found by:       FlexeLint
2003-05-31 20:18:23 +00:00
Poul-Henning Kamp
90471005e1 Remove needless return
Found by:       FlexeLint
2003-05-31 20:16:44 +00:00
Poul-Henning Kamp
4fe77d64a0 Add a couple of XXX comments where the intent is not clear.
Found by:       FlexeLint
2003-05-31 20:13:58 +00:00
Poul-Henning Kamp
74f1af0191 Remove unused variable(s).
Remove break after goto

Found by:       FlexeLint
2003-05-31 20:11:33 +00:00
Poul-Henning Kamp
b1921a6f33 Remove return after panic.
Found by:       FlexeLint
2003-05-31 20:09:42 +00:00
Poul-Henning Kamp
a62f80f8e0 Remove unused variable and now unbalanced call to splbio();
Found by:       FlexeLint
2003-05-31 20:09:01 +00:00
Marcel Moolenaar
a063facbf6 Fix ia32 compat on ia64. Recent ia64 MD changes caused the garbage on
the stack to be changed in a way incompatible with elf32_map_insert()
where we used data_buf without initializing it for when the partial
mapping resulting in a misaligned image (typical when the page size
implied by the image is not the same as the page size in use by the
kernel). Since data_buf is passed by reference to vm_map_find(), the
compiler cannot warn about it.

While here, move all local variables to the top of the function.
2003-05-31 19:55:05 +00:00
Poul-Henning Kamp
850cb24ef8 "break" rather than fall through to a break in the default clause.
Found by:       FlexeLint
2003-05-31 16:53:16 +00:00
Poul-Henning Kamp
8313328657 Introduce {be,le}_uuid_{enc,dec}() functions for explicitly encoding
and decoding UUID's in big endian and little endian binary format.
2003-05-31 16:47:07 +00:00
Poul-Henning Kamp
17a1391990 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
Peter Wemm
d5167abf3c Add __amd64__ to the ifdefs that introduce the "pcicfg" spinlock to
witness.

Approved by:  re (safe amd64 support)
2003-05-31 06:42:37 +00:00
Maxime Henrion
193f2edbf9 When loading a module that contains a sysctl which is already compiled
in the kernel, the sysctl_register() call would fail, as expected.
However, when unloading this module again, the kernel would then panic
in sysctl_unregister().  Print a message error instead.

Submitted by:	Nicolai Petri <nicolai@catpipe.net>
Reviewed by:	imp
Approved by:	re@ (jhb)
2003-05-29 21:19:18 +00:00
David Malone
0f7e5f778a Add an INVARIENTS only check to make sure Giant is held if mbuf
allocation is attempted with M_TRYWAIT.

Reviewed by:	bmilekic
Approved by:	re (scottl)
2003-05-29 18:38:24 +00:00
David Malone
de1cab2b60 Grab giant in sendit rather than kern_sendit because sockargs may
allocate mbufs with M_TRYWAIT, which may require Giant.

Reviewed by:	bmilekic
Approved by:	re (scottl)
2003-05-29 18:36:26 +00:00
Ian Dowse
ad6adb4f18 In cluster_wbuild(), initialise b_iocmd to BIO_WRITE before calling
buf_start() to avoid triggering a panic in softdep_disk_io_initiation()
if b_iocmd happened to be BIO_READ. The later initialisation of
b_iocmd in cluster_wbuild() could probably be moved to before the
buf_start() call, but this patch keeps the change as simple as
possible.

This is reported to fix occasional "softdep_disk_io_initiation: read"
panics, especially on NFS servers.

Reported by:	Nick Hilliard <nick@netability.ie>
Tested by:	Nick Hilliard <nick@netability.ie>
Approved by:	re (rwatson)
2003-05-28 13:22:10 +00:00
Peter Wemm
a9a0bbad19 Copy the va_list in sbuf_vprintf() before passing it to vsnprintf(),
because we could fail due to a small buffer and loop and rerun.  If this
happens, then the vsnprintf() will have already taken the arguments off
the va_list.  For i386 and others, this doesn't matter because the
va_list type is a passed as a copy.  But on powerpc and amd64, this is
fatal because the va_list is a reference to an external structure that
keeps the vararg state due to the more complicated argument passing system.
On amd64, arguments can be passed as follows:
First 6 int/pointer type arguments go in registers, the rest go on
  the memory stack.
Float and double are similar, except using SSE registers.
long double (80 bit precision) are similar except using the x87 stack.
Where the 'next argument' comes from depends on how many have been
processed so far and what type it is.  For amd64, gcc keeps this state
somewhere that is referenced by the va_list.

I found a description that showed the va_copy was required here:
http://mirrors.ccs.neu.edu/cgi-bin/unixhelp/man-cgi?va_end+9
The single unix spec doesn't mention va_copy() at all.

Anyway, the problem was that the sysctl kern.geom.conf* nodes would panic
due to walking off the end of the va_arg lists in vsnprintf.  A better fix
would be to have sbuf_vprintf() use a single pass and call kvprintf()
with a callback function that stored the results and grew the buffer
as needed.

Approved by:	re (scottl)
2003-05-25 19:03:08 +00:00
Jeff Roberson
0003d1b74e - Create a new lock, umtx_lock, for use instead of the proc lock for
protecting the umtx queues.  We can't use the proc lock because we need
   to hold the lock across calls to casuptr, which can fault.

Approved by:	re
2003-05-25 18:18:32 +00:00
Jeff Roberson
30fd5d085d - Reset the free ent to NULL if we have consumed the last free entry. This
fixes a problem where we would overwrite old data if we ran out of free
   entries.

Submitted by:	sam
Approved by:	re (scottl)
2003-05-25 08:48:42 +00:00
Alan Cox
2e05d89828 Make the maximum number of vnodes a function of both the physical memory
size and the kernel's heap size, specifically, vm_kmem_size.  This
function allows a maximum of 40% of the vm_kmem_size to be used for
vnodes and vm objects.  This is a conservative bound based upon recent
problem reports.  (In other words, a slight increase in this percentage
may be safe.)

Finally, machines with less than ~3GB of RAM should be unaffected
by this change, i.e., the maximum number of vnodes should remain
the same.  If necessary, machines with 3GB or more of RAM can increase
the maximum number of vnodes by increasing vm_kmem_size.

Desired by:	scottl
Tested by:	jake
Approved by:	re (rwatson,scottl)
2003-05-23 19:54:02 +00:00
Julian Elischer
faaa20f639 When we are spilling threads out of the run queue during panic, make sure we
keep the thread state variable consistent with its real state.
i.e. Don't say it's on the run queue when it isn't.

Also clarify the associated comment.

Turns a double panic back to a single panic :-/

Approved by:	re@ (jhb)
2003-05-21 18:53:25 +00:00
Marcel Moolenaar
f2c49dd248 Revamp of the syscall path, exception and context handling. The
prime objectives are:
o  Implement a syscall path based on the epc inststruction (see
   sys/ia64/ia64/syscall.s).
o  Revisit the places were we need to save and restore registers
   and define those contexts in terms of the register sets (see
   sys/ia64/include/_regset.h).

Secundairy objectives:
o  Remove the requirement to use contigmalloc for kernel stacks.
o  Better handling of the high FP registers for SMP systems.
o  Switch to the new cpu_switch() and cpu_throw() semantics.
o  Add a good unwinder to reconstruct contexts for the rare
   cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o  The EPC syscall doesn't preserve registers it does not need
   to preserve and places the arguments differently on the stack.
   This affects libc and truss.
o  The address of the kernel page directory (kptdir) had to
   be unstaticized for use by the nested TLB fault handler.
   The name has been changed to ia64_kptdir to avoid conflicts.
   The renaming affects libkvm.
o  The trapframe only contains the special registers and the
   scratch registers. For syscalls using the EPC syscall path
   no scratch registers are saved. This affects all places where
   the trapframe is accessed. Most notably the unaligned access
   handler, the signal delivery code and the debugger.
o  Context switching only partly saves the special registers
   and the preserved registers. This affects cpu_switch() and
   triggered the move to the new semantics, which additionally
   affects cpu_throw().
o  The high FP registers are either in the PCB or on some
   CPU. context switching for them is done lazily. This affects
   trap().
o  The mcontext has room for all registers, but not all of them
   have to be defined in all cases. This mostly affects signal
   delivery code now. The *context syscalls are as of yet still
   unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be notived as an improvement if it's noticed at all.

Approved by: re@ (jhb)
2003-05-16 21:26:42 +00:00
Don Lewis
1e9bc9f889 Detect that a vnode has been reclaimed while vflush() was waiting to lock
the vnode and restart the loop.  Vflush() is vulnerable since it does not
hold a reference to the vnode and it holds no other locks while waiting
for the vnode lock.  The vnode will no longer be on the list when the
loop is restarted.

Approved by:	re (rwatson)
2003-05-16 19:46:51 +00:00
David E. O'Brien
8d542cb56d Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and
PT_DETACH ptrace(2) requests from functioning as advertised in the
manual page.  As described in kern/35175, the PT_DETACH request will,
under certain circumstances, pass an unwanted signal on to the traced
process upan detaching from it.  The PT_CONTINUE request will
sometimes fail if you make it pass a signal that has "properties" that
differ from the properties of the signal that origionally caused the
traced process to be stopped.  Since PT_KILL is nothing than
PT_CONTINUE with SIGKILL, it is broken too.  In the PT_KILL case, this
leads to an unkillable process.

PR:		44011
Submitted by:	Mark Kettenis <kettenis@chello.nl>
Approved by:	re(jhb)
2003-05-16 01:34:23 +00:00
Robert Watson
c1dca9ab07 VOP_PATHCONF() requires a vnode lock; this patch adds locking to
fpathconf(). The lock is held for direct calls to VOP_PATHCONF() in
pathconf() already.

Approved by:	re (jhb)
Pointed out by:	DEBUG_VFS_LOCKS
2003-05-15 21:13:08 +00:00
Bosko Milekic
11583f6c93 Make the mb_alloc low-watermark sysctl-tunable read-only and make
netstat(1) not display it for now because its effects are not yet
completely implemented and we're about to cut 5.2-RELEASE.
This is temporary.

Approved by: re (scottl, rwatson)
2003-05-15 19:05:28 +00:00
Paul Saab
13d56a9a90 p_sigignore moved into struct sigacts. move one which was missed.
Approved by:	re (scottl)
2003-05-14 00:03:55 +00:00
John Baldwin
90af4afacb - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
John Baldwin
25b4d3a8a6 In setitimer(2), if the it_value of the new itimer value is clear, then
don't add the current time to it, but leave it as clear so that when the
timer is disabled, the it_value is always clear.

Reviewed by:	bde
Approved by:	re (rwatson)
2003-05-13 19:21:46 +00:00
Alan Cox
099e981aa1 Optimize the use of splay in gbincore(). During a "make buildworld" the
desired buffer is found at one of the roots more than 60% of the time.
Thus, checking both roots before performing either splay eliminates
unnecessary splays on the first tree splayed.

Approved by:	re (jhb)
2003-05-13 04:36:02 +00:00
Poul-Henning Kamp
87b1831f1d Bail out if there were not two loadable sections. Add XXX comment about
one other issue.

Approved by:	re/rwatson.
2003-05-12 15:08:10 +00:00
Robert Watson
1964fb9ba2 Remove bogus locking from DDB's "show lockedvnods" command: using
synchronization primitives from inside DDB is generally a bad idea,
and in this case it frequently results in panics due to DDB commands
being executed from the sio fast interrupt context on a serial
console.  Replace the locking with a note that a lack of locking
means that DDB may get see inconsistent views of the mount and vnode
lists, which could also result in a panic.  More frequently,
though, this avoids a panic than causes it.

Discussed with ages ago:	bde
Approved by:			re (scottl)
2003-05-12 14:37:47 +00:00
Poul-Henning Kamp
1282e9acea Don't pass NULL pointer to memset if we are compiled with DIAGNOSTIC
Approved by:	re/rwatson
2003-05-12 05:09:56 +00:00
Bosko Milekic
969bab3efb Make m_freem() just use m_free() instead of duplicating the code. The
reason for the duplication was that m_freem() was meant to eventually
be optimized to hold the lock of the cache being freed to as long as
possible across frees but the difficulty of implementing said
optimization right now is too high, given that in some cases (see MAC
and non-cluster external buffers), we need to call into other subsytems,
something not permissible when the cache lock is held.

This change minimizes code duplication while keeping at least the
atomic mbuf+cluster free optimization.

Suggested by: luigi
2003-05-10 18:08:23 +00:00
John Baldwin
b1bf1c3a98 Remove Giant from kern_sigsuspend() and osigsuspend() as these should now
be MP safe.

Approved by:	re (scottl)
2003-05-09 19:11:32 +00:00
Robert Watson
b2aef57123 Rename MAC_MAX_POLICIES to MAC_MAX_SLOTS, since the variables and
constants in question refer to the number of label slots, not the
maximum number of policies that may be loaded.  This should reduce
confusion regarding an element in the MAC sysctl MIB, as well as
make it more clear what the affect of changing the compile-time
constants is.

Approved by:	re (jhb)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-08 19:49:42 +00:00
Robert Watson
41a17fe326 Clean up locking for the MAC Framework:
(1) Accept that we're now going to use mutexes, so don't attempt
    to avoid treating them as mutexes.  This cleans up locking
    accessor function names some.

(2) Rename variables to _mtx, _cv, _count, simplifying the naming.

(3) Add a new form of the _busy() primitive that conditionally
    makes the list busy: if there are entries on the list, bump
    the busy count.  If there are no entries, don't bump the busy
    count.  Return a boolean indicating whether or not the busy
    count was bumped.

(4) Break mac_policy_list into two lists: one with the same name
    holding dynamic policies, and a new list, mac_static_policy_list,
    which holds policies loaded before mac_late and without the
    unload flag set.  The static list may be accessed without
    holding the busy count, since it can't change at run-time.

(5) In general, prefer making the list busy conditionally, meaning
    we pay only one mutex lock per entry point if all modules are
    on the static list, rather than two (since we don't have to
    lower the busy count when we're done with the framework).  For
    systems running just Biba or MLS, this will halve the mutex
    accesses in the network stack, and may offer a substantial
    performance benefits.

(6) Lay the groundwork for a dynamic-free kernel option which
    eliminates all locking associated with dynamically loaded or
    unloaded policies, for pre-configured systems requiring
    maximum performance but less run-time flexibility.

These changes have been running for a few weeks on MAC development
branch systems.

Approved by:	re (jhb)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-05-07 17:49:24 +00:00
Alan Cox
658ad5fff5 Lock the vm_object when performing vm_pager_deallocate(). 2003-05-06 02:45:28 +00:00
John Baldwin
01de25134f Tweak the clearing of TDF_DEADLKTREAT so that we only bother grabbing the
lock and clearing the flag if it was clear when uiomove() was called.
2003-05-05 21:27:29 +00:00
John Baldwin
854dc8c2a1 Mostly sort the includes. 2003-05-05 21:26:25 +00:00
John Baldwin
18440c7fe7 Lock the proc lock around calls to tdsignal() in the sigwait() family of
syscalls.
2003-05-05 21:18:10 +00:00
John Baldwin
6711f10fb6 Make issignal() private to kern_sig.c since it is only called from cursig()
and cursig() is now a function rather than a macro.
2003-05-05 21:16:28 +00:00
John Baldwin
e668d8d834 Remove TD_ON_RUNQ() from a check to make sure Giant is not held when
calling mi_switch().  The kernel would panic on an earlier KASSERT() in
mi_switch() if TD_ON_RUNQ() was true.
2003-05-05 21:12:36 +00:00
David Malone
710c5645af Split sendit into two parts. The first part, still called sendit, that
does the copyin stuff and then calls the second part kern_sendit to do
the hard work. Don't bother holding Giant during the copyin phase.

The intent of this is to allow the Linux emulator to impliment send*
syscalls without using the stackgap.
2003-05-05 20:33:38 +00:00
Martin Blapp
f130dcf22a Change the semantics of sysv shm emulation to take a additional
argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}.
The BSD semantics didn't allow the usage of shared segment after
being marked for removal through IPC_RMID.

The patch involves the following functions:
  - shmat
  - shmctl
  - shm_find_segment_by_shmid
  - shm_find_segment_by_shmidx
  - linux_shmat
  - linux_shmctl

Submitted by:	Orlando Bassotto <orlando.bassotto@ieo-research.it>
Reviewed by:	marcel
2003-05-05 09:22:58 +00:00
Poul-Henning Kamp
8cb72d6174 Add two KASSERTS which trigger if free(9) would drag the "memuse" statistic
for a malloc bucket under zero.  This typically happens if you malloc(9)
from one bucket and free to another.
2003-05-05 08:32:53 +00:00
Poul-Henning Kamp
c0c0300f51 Use le32dec() instead of le32toh() because we are not guaranteed to have
a word aligned input.
2003-05-05 07:22:35 +00:00
Alan Cox
bff99f0d12 - Revert kern/vfs_subr.c revision 1.444. The vm_object's size isn't
trustworthy for vnode-backed objects.
 - Restore the old behavior of vm_object_page_remove() when the end
   of the given range is zero.  Add a comment to vm_object_page_remove()
   regarding this behavior.

Reported by:	iedowse
2003-05-03 08:09:24 +00:00
Alan Cox
12352fdcaa Lock access to the vm_object's flags in vop_stdcreatevobject(). 2003-05-02 19:33:21 +00:00
Julian Elischer
43fdafb1e1 Fix typo in last commit 2003-05-02 06:18:55 +00:00
Mike Silbersack
d563b41e1f Add the M_FREELIST flag, which is used to detect whenever a
double free of a mbuf occurs and cause an immediate panic, rather
than allowing free list corruption to occur.

This code is trapped under INVARIANTS, so it should not cause any
change in default performance.

Reviewed by:	a bunch of people on -net
MFC after:	1 week
2003-05-02 03:43:40 +00:00
Julian Elischer
13652e9578 remove old and inaccurate XXX comment. 2003-05-02 01:02:20 +00:00
Julian Elischer
b1ac98d8b2 Move the flag that indicates an idle thread from the KSE to the thread.
It was always referenced via the thread anyhow.

Reviewed by:	jhb (a LOOOOONG time ago)
2003-05-02 00:33:12 +00:00
John Baldwin
52c3844c7a Remove Giant from the setuid(), seteuid(), setgid(), setegid(),
setgroups(), setreuid(), setregid(), setresuid(), and setresgid() syscalls
as well as the cred_update_thread() function.
2003-05-01 21:21:42 +00:00
John Baldwin
7d447c956b Initialize and destroy the struct proc mutex in the proc zone's init and
fini routines instead of in fork() and wait().  This has the nice side
benefit that the proc lock of any process on the allproc list is always
valid and sched_lock doesn't have to be used to test against PRS_NEW
anymore.
2003-05-01 21:16:38 +00:00
John Baldwin
f2957f6b9a Garbage collect unused TDF_INMSLEEP flag. 2003-05-01 17:05:24 +00:00
Dag-Erling Smørgrav
87ccef7b77 Instead of recording the Unix time in a process when it starts, record the
uptime.  Where necessary, convert it back to Unix time by adding boottime
to it.  This fixes a potential problem in the accounting code, which would
compute the elapsed time incorrectly if the Unix time was stepped during
the lifetime of the process.
2003-05-01 16:59:23 +00:00
David Xu
c6523b663f Fix compiling problem, p_tracee is in my local repository for
threaded process debugging, not ready for this time.
2003-05-01 12:16:06 +00:00
David Xu
1ecc645634 Drop Giant lock before suspended, pick up it after resumed.
thread_suspend_check() is used in exit1() which still needs
Giant lock.
2003-05-01 07:29:25 +00:00
Alan Cox
ab7b0ae578 Lock an update to a vm_object's ref_count. 2003-05-01 03:51:05 +00:00
Alan Cox
ebba1b25f9 Lock accesses to the vm_object's ref_count and resident_page_count. 2003-05-01 03:10:38 +00:00
Peter Wemm
cb1f265c60 AMD64 uses the new-style cpu_switch()/cpu_throw() calling conventions. 2003-04-30 21:45:03 +00:00
John Baldwin
a14e118939 Forgot to remove Giant around call to kern_sigaction() in
freebsd4_sigaction() in revision 1.232.
2003-04-30 19:45:13 +00:00
John Baldwin
428eb576a5 Axe a stale comment. 2003-04-30 19:41:04 +00:00
Mark Murray
51da11a27a Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.
2003-04-30 12:57:40 +00:00
David Xu
5c29a450ae Increase some default values. 2003-04-30 01:18:29 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
Mike Barcroft
9ddb795450 style(9) 2003-04-28 18:32:19 +00:00
Alan Cox
01dfc1deae Finish the vm_object locking for this file, including holding the vm_object
lock when accessing the vm_object's flags or calling vm_page_lookup().
2003-04-28 05:40:45 +00:00
David Xu
5073e68fa3 unlock sched_lock at right time. 2003-04-27 04:32:40 +00:00
Alan Cox
ecde4b3218 Various changes to vm_object_page_remove():
- Eliminate an odd, special-case feature:
   if start == end == 0 then all pages are removed.  Only one caller
   used this feature and that caller can trivially pass the object's
   size.
 - Assert that the vm_object is locked on entry; don't bother testing
   for a NULL vm_object.
 - Style: Fix lines that are longer than 80 characters.
2003-04-26 23:41:30 +00:00
Alan Cox
c829b9d0fc - Lock the vm_object on entry to vm_object_terminate(). 2003-04-26 19:36:19 +00:00
Alan Cox
1ca5895341 - Convert vm_object_pip_wait() from using tsleep() to msleep().
- Make vm_object_pip_sleep() static.
 - Lock the vm_object when performing vm_object_pip_wait().
2003-04-26 18:33:18 +00:00
Alan Cox
af3e0bb202 - Lock the vm_object when performing vm_page_alloc() in allocbuf(). 2003-04-26 07:42:24 +00:00
Poul-Henning Kamp
3f6ee876c1 Update the "last malloc failure timestamp" also for simulated
malloc errors.
2003-04-25 21:49:24 +00:00
John Baldwin
a70a2b741b Remove Giant from getpgid() and getsid() and tweak the logic to more
closely match that of 4.x.
2003-04-25 20:09:31 +00:00
John Baldwin
17b8a8a77a Push down Giant around calls to proc_rwmem() in kern_ptrace. kern_ptrace()
should now be MP safe.
2003-04-25 20:02:16 +00:00
John Baldwin
25d6dc0606 Push Giant down into kern_sigaction() instead of locking it around calls
to kern_sigaction() in the various callers of the function.
2003-04-25 20:01:19 +00:00
John Baldwin
64cc6a13e7 - Push down Giant around vnode operations in ktrace().
- Mark the ktrace() and utrace() syscalls as being MP safe.
- Validate the facs argument to ktrace() prior to doing any vnode
  operations or acquiring any locks.
- Share lock the proctree lock over the entire section that calls
  ktrsetchildren() and ktrops().  We already did this for process groups.
  Doing it for the process case closes a small race where a process might
  go away after we look it up.  As a result of this, ktrstchildren() now
  just asserts that the proctree lock is locked rather than acquiring the
  lock itself.
- Add some missing comments to #else and #endif.
2003-04-25 19:59:35 +00:00
Daniel Eischen
1328e1c4be Add an argument to get_mcontext() which specified whether the
syscall return values should be cleared.  The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets.  Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code.  The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.

Glanced at by:  jake
Reviewed by:    bde (months ago)
2003-04-25 01:50:30 +00:00
Tim J. Robbins
913fc94d2b Include altkstack pages in the RSS regardless of whether the process
is swapped out. Pointed out by jhb.
2003-04-25 00:20:40 +00:00
Dag-Erling Smørgrav
013466aa50 It seems that 1 was not a magic value as I thought, but a coincidence.
Instead of applying the adjustment to processes with a start time of 1,
apply it to all processes with a start time of less than 3600.

None of this would be necessary if the start times were recorded in ticks
instead of seconds and microseconds.
2003-04-24 12:12:06 +00:00
Tim J. Robbins
ceff7f2a48 Do a better job of calculating the RSS for swapped-out processes:
don't include the kernel stacks of swapped-out threads in the page count,
but do include the alternate kernel stack. jhb provided some helpful
comments on this.

PR:		49102
2003-04-24 11:03:04 +00:00
Tim J. Robbins
38dd7dee8a Free mount credentials (mnt_cred) when freeing the mount struct
in failure cases to avoid leaking struct ucreds, and ultimately
leaking struct uidinfo references.
2003-04-24 08:16:06 +00:00
Alan Cox
b6e48e0372 - Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
   whether its caller has locked the vm_object.  (This is a temporary
   measure to bootstrap vm_object locking.)
2003-04-24 04:31:25 +00:00
Dag-Erling Smørgrav
1f7440d9f6 When filling out a kinfo_proc structure, if we come across a process
whose p_stats->p_start has the magic value 1, replace it with boottime.
Some users were apparently confused by the fact that ps(1) reported a
start time in early 1970 for system processes.
2003-04-24 03:37:59 +00:00
John Baldwin
cf60731b01 Remove Giant from osigblock(), osigsetmask(), and kern_sigaltstack(). 2003-04-23 19:49:18 +00:00
John Baldwin
5eac9e2dcb The signotify() sanity check in userret() doesn't need Giant anymore. 2003-04-23 18:51:55 +00:00
John Baldwin
2056d0a168 Add lock assertions for various proc/thread/kse/ksegroup fields to the
scheduler functions.
2003-04-23 18:51:05 +00:00
John Baldwin
5afe0c9947 - Reorganize osigstack() to do the copyin first, grab the proc lock once,
do all the various sigstack dances, unlock the proc lock, and finally do
  the copyout.  This more closely resembles the behavior of
  kern_sigaltstack() and closes a small race.
- Remove Giant from osigstack as it is no longer needed.
2003-04-23 18:50:25 +00:00
John Baldwin
4d923fe3f5 Remove Giant from [gs]etpriority(). 2003-04-23 18:48:55 +00:00
John Baldwin
112afcb232 - Protect p_numthreads with the sched_lock.
- Protect p_singlethread with both the sched_lock and the proc lock.
- Protect p_suspcount with the proc lock.
2003-04-23 18:46:51 +00:00
David E. O'Brien
2603007ace Add /dev to the Alpha manual mount root example. 2003-04-23 05:02:40 +00:00
John Baldwin
9752f794c7 - Move PS_PROFIL and its new cousin PS_STOPPROF back over to p_flag and
rename them appropriately.  Protect both flags with both the proc lock
  and the sched_lock.
- Protect p_profthreads with the proc lock.
- Remove Giant from profil(2).
2003-04-22 20:54:04 +00:00
John Baldwin
0b5318c81a - Assert that the proc lock and sched_lock are held in sched_nice().
- For the 4BSD scheduler, this means that all callers of the static
  function resetpriority() now always hold sched_lock, so don't lock
  sched_lock explicitly in that function.
2003-04-22 20:50:38 +00:00
John Baldwin
a15cc35909 Lock both the proc lock and sched_lock when calling sched_nice since
kg_nice is now protected by both.  Being protected by both means that
other places in the kernel that want to read kg_nice only need one of the
two locks.
2003-04-22 20:45:38 +00:00
John Baldwin
eeec6bab2e Prefer the proc lock to sched_lock when testing PS_INMEM now that it is
safe to do so.
2003-04-22 20:01:56 +00:00
John Baldwin
828e7683bf Protect p_swtime with the sched_lock. 2003-04-22 19:48:25 +00:00
John Baldwin
a6f37ac9d6 - Mark the kse_purge_group() and kse_purge() definitions static to match
their prototypes.
- Remove sched_lock locking from kse_purge() as all callers already lock
  the sched_lock before calling it.
- Hold the proc lock slightly longer to protect P_SHOULDSTOP().
2003-04-22 19:47:55 +00:00
Warner Losh
01a9b4348f Create a new function, device_is_attached(), that is like
device_is_alive() that tells us if the device has successfully
attached.  device_is_alive just tells us that the device has
successfully probed.
2003-04-21 18:19:08 +00:00
David Xu
11b20c685b Fix lock order reversal problem. 2003-04-21 14:42:04 +00:00
David Xu
1ecb38a365 Introduce two flags to control upcall behaviour:
o KMF_NOUPCALL
	Ask kse_release to not return to userland upcall entry, but instead
	direct returns to userland by using current thread's stack and return
	address on stack. This flags is intended to be used by UTS in critical
	region to wait another UTS thread to leave critical region, by using
	kse_release with this flag to avoid spinnng and burning CPU. Also this
	flags can be used by UTS to poll completed context when there is nothing
	to do in userland and needn't restart from its entry like normal upcall.

o KMF_NOCOMPLETED
	Ask kernel to not bring completed thread contexts back to userland when
	doing upcall, this flags is intend to be used with above flag when an
	upcall thread is in critical region and can not process completed contexts
	at that time.

Tested by: deischen
2003-04-21 07:27:59 +00:00
Warner Losh
e22b0bf4b8 Fix /dev/devctl's implementation of poll. We should only be setting
the poll bits when there's actually something in the queue.
Otherwise, select always returned '2' when there were no items to be
read, and '3' when there were.  This would preclude being able to read
in a threaded (libc_r) program, as well as checking to see if there
were pending events or not.
2003-04-21 05:58:51 +00:00
Alan Cox
2b7e071e89 - Lock the vm_object when performing vm_object_pip_add(). 2003-04-20 07:29:50 +00:00
Alan Cox
097d4338db Lock the vm_object in vfs_busy_pages(). 2003-04-20 00:17:05 +00:00
Alan Cox
0fa05eae77 - Lock the vm_object when performing vm_object_pip_subtract().
- Assert that the vm_object lock is held in vm_object_pip_subtract().
2003-04-19 22:11:41 +00:00
Alan Cox
0d420ad3e6 - Lock the vm_object when performing vm_object_pip_wakeupn().
- Assert that the vm_object lock is held in vm_object_pip_wakeupn().
 - Add a new macro VM_OBJECT_LOCK_ASSERT().
2003-04-19 21:15:44 +00:00
Alan Cox
ea08145b76 Lock the jumbo_vm_object when performing vm_page_alloc(). 2003-04-19 19:13:25 +00:00
David Xu
95bee4c365 Test next upcall time correctly. 2003-04-19 06:16:04 +00:00
David Xu
06ce69a720 Unbreak sigaltstack syscall. sigonstack is now a function and
want proc lock be held.
2003-04-19 05:04:06 +00:00
David Xu
588257e810 Use correct thread pointer. 2003-04-19 04:39:10 +00:00
John Baldwin
8b94a0616d - Make sigonstack() a regular function instead of an inline and add a proc
lock assertion to it.
- SIGPENDING() no longer needs sched_lock, so only grab sched_lock to set
  the TDF_NEEDSIGCHK and TDF_ASTPENDING flags in signotify().
- Add a proc lock assertion to tdsigwakeup().
- Since we always set TDF_OLDMASK while holding the proc lock, the proc
  lock is sufficient protection to check its state in postsig() and we only
  need sched_lock when clearing the actual flag.
2003-04-18 20:59:05 +00:00
John Baldwin
889a6b5845 Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().
2003-04-18 20:20:00 +00:00
John Baldwin
e77daab1af Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol
so that it can be used by binary emulators.
2003-04-18 20:18:44 +00:00
John Baldwin
08865ba1d1 Add a couple of sched_lock asserts. 2003-04-18 20:17:47 +00:00
John Baldwin
02e878d97c - Add a static function pgadjustjobc() to adjust the job control count for
a process group.
- Call pgadjustjobc() twice in fixjobc() to avoid code duplication and
  improve readability.
- Use the proc lock to protect P_SHOULDSTOP() instead of sched_lock.
- Check to see if a process is PRS_NEW with sched_lock before trying to
  lock its proc lock since the lock may not be constructed yet.
2003-04-18 20:17:05 +00:00
Robert Watson
2d3db0b823 Update NAI copyright to 2003, missed in earlier commits and merges. 2003-04-18 19:57:37 +00:00
Alan Cox
49281fbf68 Update locking around vm_object_page_remove() to use the new macros. 2003-04-18 16:39:03 +00:00
Jeff Roberson
7cd650a972 - Set the ke_cpu field in sched_add() for interrupt and realtime threads
since they are going on the current cpu and not their previously assigned
   cpu.
 - sched_runnable() should only return true in the SMP case if the other
   processor has more than one thread that is runnable.  We can not steal
   curthread.
 - Change kseq_print() to accept the cpuid instead of a kseq pointer.  This
   makes use of this function in ddb much easier.
2003-04-18 05:24:10 +00:00
Julian Elischer
d3a0bd78a8 Add a thread_unlink() and use it.
It could also be used twice in kern_thr.c but that's owned by jeff
so I'l let him change it when he's next there.
2003-04-18 00:16:13 +00:00
John Baldwin
cd4ed3b5b0 - kthread's don't have p_textvp set to anything, so replace code that
dealt with that possibility with a KASSERT().
- No need to set P_SYSTEM, kthread_create() does that for us.
2003-04-17 22:37:48 +00:00
John Baldwin
213b19e9fb - Use a local struct proc variable to improve readability.
- Use a local variable to close a minor race when determining if the wmesg
    printed out needs a prefix such as when a thread is blocked on a lock.
2003-04-17 22:36:40 +00:00
John Baldwin
f5d5cb3c7c Tweak locking in the PS_XCPU handler to hold the sched_lock while reading
p_runtime.
2003-04-17 22:33:04 +00:00
John Baldwin
b68e08498f The sched_lock is not needed while clearing two of the P_STOPPED bits in
p_flag.  Also, the proc lock can't be recursed, so simplify an older proc
lock assertion.
2003-04-17 22:31:54 +00:00
John Baldwin
b5a2bad175 Don't assume that p_session hasn't changed out from under us after unlocking
the process and session.  Instead, cache a true reference to the session
when we do the hold and release our reference on that session.  This avoids
the need for the proc lock when dropping the reference.
2003-04-17 22:30:43 +00:00
John Baldwin
f385f7156a Lock the sched_lock while setting TDF_INPANIC. 2003-04-17 22:29:23 +00:00
John Baldwin
27dad03c97 Use TD_IS_RUNNING() instead of thread_running() in the adaptive mutex
code.
2003-04-17 22:28:58 +00:00
John Baldwin
0bfc4d1445 fork1() already sets PS_INMEM, so don't set it again. This lets us push
sched_lock down slightly so that it isn't needed in the RFSTOPPED case.
2003-04-17 22:28:28 +00:00
John Baldwin
69c4ee54ff - The prison mutex cannot possibly protect pointers to the prison it
protects, so don't bother locking it while we assign it to a ucred's
  cr_prison.
- Fully construct the new credential for a process before assigning it to
  p_ucred.
2003-04-17 22:26:53 +00:00
John Baldwin
e674d80790 Add some locking in for a few proc and thread fields. 2003-04-17 22:25:35 +00:00
John Baldwin
bb0e8070fd - Push Giant down into the fork1() function a small bit.
- Set p_acflag earlier while already hold the proc lock in fork1().
- Mark the realitexpire() callout MPSAFE for new processes.  It was already
  marked safe for proc0 a long while ago.
2003-04-17 22:24:59 +00:00