8004 Commits

Author SHA1 Message Date
Robert Watson
6ce8940626 Attempt to slightly refine the print out from "show alllocks" -- list
the process and thread numbers/names on the same line rather than on
separate lines, and print the thread pointer not just the tid.
2004-12-27 10:47:08 +00:00
Alexander Kabaev
aa6f98d12f Do not vput(9) unlocked vnode and do not VREF it with the sole purpose
of vputting it back immediately.

Complained by:	DEBUG_VFS_LOCKS
2004-12-27 05:17:11 +00:00
Jeff Roberson
2ebf8eb132 - Unintentionally checked in a debugging panic. Remove that. 2004-12-26 23:21:48 +00:00
Jeff Roberson
36996b3b7c - Remove a 4BSD specific hack since this will work on ULE too. 2004-12-26 22:56:51 +00:00
Jeff Roberson
598b368d6c - Fix a long standing problem where an ithread would not honor sched_pin().
- Remove the sched_add wrapper that used sched_add_internal() as a backend.
   Its only purpose was to interpret one flag and turn it into an int.  Do
   the right thing and interpret the flag in sched_add() instead.
 - Pass the flag argument to sched_add() to kseq_runq_add() so that we can
   get the SRQ_PREEMPT optimization too.
 - Add a KEF_INTERNAL flag.  If KEF_INTERNAL is set we don't adjust the SLOT
   counts, otherwise the slot counts are adjusted as soon as we enter
   sched_add() or sched_rem() rather than when the thread is actually placed
   on the run queue.  This greatly simplifies the handling of slots.
 - Remove the explicit prevention of migration for ithreads on non-x86
   platforms.  This was never shown to have any real benefit.
 - Remove the unused class argument to KSE_CAN_MIGRATE().
 - Add ktr points for thread migration events.
 - Fix a long standing bug on platforms which don't initialize the cpu
   topology.  The ksg_maxid variable was never correctly set on these
   platforms which caused the long term load balancer to never inspect
   more than the first group or processor.
 - Fix another bug which prevented the long term load balancer from working
   properly.  If stathz != hz we can't expect sched_clock() to be called
   on the exact tick count that we're anticipating.
 - Rearrange sched_switch() a bit to reduce indentation levels.
2004-12-26 22:56:08 +00:00
Robert Watson
b6dd9ef2fe Add "show alllocks" command to DDB, which dumps a list of processes
and threads currently holding sleep mutexes (and spin mutexes for
curthread).  This can be quite useful in looking for a lock condition
summary for a system, as it avoids manually iterating through threads
and processes to find all the interesting locks.

NB: "alllocks" is up there with "lockedvnods" for a bad argument for
show.

MFC after:	2 weeks
2004-12-26 22:52:24 +00:00
Jeff Roberson
6a98702001 - Run sched_userret() after thread_userret(). Before, sched_userret() would
lower the priority of the returning thread to a user priority before
   calling into thread_userret() which would call wakeup() which in turn would
   cause the returning thread to eventually context switch rather than
   completing its slice.  Allowing this thread to complete its slice first
   yields a 15% performance improvement in super-smack on my dual opteron with
   4BSD.
2004-12-26 07:30:35 +00:00
Jeff Roberson
907bdbc288 - Wrap the thread count adjustment in sched_load_add() and sched_load_rem()
so that we may place some ktr entries nearby.
 - Define other KTR_SCHED tracepoints so that we may graph the operation
   of the scheduler.
2004-12-26 00:16:24 +00:00
Jeff Roberson
81d47d3f4b - Remove earlier KTR_ULE tracepoints.
- Define new KTR_SCHED points so that we can graph the operation of the
   scheduler.
2004-12-26 00:15:33 +00:00
Jeff Roberson
85da7a569b - Define KTR points for KTR_SCHED. 2004-12-26 00:14:21 +00:00
David Xu
c180db2bce Make _umtx_op() as more general interface, the final parameter needn't be
timespec pointer, every parameter will be interpreted by its opcode.
2004-12-25 13:02:50 +00:00
David Xu
8b37fbabb4 1. introduce umtx_owner to get an owner of a umtx.
2. add const qualifier to umtx_timedlock and umtx_timedwait.
3. add missing blackets in umtx do_unlock_and_wait.
2004-12-25 12:49:35 +00:00
David Xu
3dd213f160 Add umtxq_lock/unlock around umtx_signal, fix debug kernel compiling,
let umtx_lock returns EINTR when it returns ERESTART, this lets
userland have chance to back off mtx lock code when needed.
2004-12-24 11:59:20 +00:00
David Xu
a08c214a72 1. Fix race condition between umtx lock and unlock, heavy testing
on SMP can explore the bug.
2. Let umtx_wake returns number of threads have been woken.
2004-12-24 11:30:55 +00:00
Robert Watson
0fddf92d72 Assert the sem lock in sem_ref() and sem_rel(), as it is required to
safely manipulate the reference count.
2004-12-23 02:22:47 +00:00
Robert Watson
38e6a58c77 Remove temporary debugging printf that was used to detect the presence
of a race that had previously caused a panic in order to determine if
the fix was for the right problem.  It was.

MFC after:	2 weeks
2004-12-23 01:19:27 +00:00
Robert Watson
1ef121cf6b In sonewconn(), the s/if/while/ change to wait for room at the tail of
the accept queue is a feature, not a bug/issue, so remove the XXXRW
from the comment.
2004-12-23 01:16:21 +00:00
Robert Watson
ba65391172 Remove an XXXRW indicating atomic operations might be used as a
substitute for a global mutex protecting the socket count and
generation number.

The observation that soreceive_rcvoob() can't return an mbuf
chain is a property, not a bug, so remove the XXXRW.

In sorflush, s/existing/previous/ for code when describing prior
behavior.

For SO_LINGER socket option retrieval, remove an XXXRW about why
we hold the mutex: this is correct and not dubious.

MFC after:	2 weeks
2004-12-23 01:07:12 +00:00
Robert Watson
81b5dbecd4 In soalloc(), simplify the mac_init_socket() handling to remove
unnecessary use of a global variable and simplify the return case.
While here, use ()'s around return values.

In sodealloc(), remove a comment about why we bump the gencnt and
decrement the socket count separately.  It doesn't add
substantially to the reading, and clutters the function.

MFC after:	2 weeks
2004-12-23 00:59:43 +00:00
Alan Cox
7abe2ac214 Add send buffer locking to uipc_send(). Without this locking a race can
occur between a reader and a writer that results in a panic upon close,
e.g.,
	"panic: sbflush_locked: cc 4 || mb 0xffffff0052afa400 || mbcnt 0"

Reviewed by: rwatson@
MFC after: 2 weeks
2004-12-22 20:28:46 +00:00
Poul-Henning Kamp
40b5a6f2c6 Include uio.h
Check O_NONBLOCK instead if IO_NDELAY
Don't include vnode.h
2004-12-22 17:37:14 +00:00
Poul-Henning Kamp
72e8dfe5a0 Hide/remove various printfs, now that root mounting doesn't seem to explode
on people.
2004-12-20 21:59:25 +00:00
Poul-Henning Kamp
118253ca24 fix a misleading sleep identifier. 2004-12-20 21:38:13 +00:00
Poul-Henning Kamp
e87047b437 We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.
2004-12-20 21:34:29 +00:00
David Xu
839f811c6a 1. msleep returns EWOULDBLOCK not ETIMEDOUT, use EWOULDBLOCK instead.
2. Eliminate a possible lock leak in timed wait loop.
2004-12-18 13:43:16 +00:00
David Xu
50586e8b6b 1. make umtx sharable between processes, the way is two or more processes
call mmap() to create a shared space, and then initialize umtx on it,
   after that, each thread in different processes can use the umtx same
   as threads in same process.
2. introduce a new syscall _umtx_op to support timed lock and condition
   variable semantics. also, orignal umtx_lock and umtx_unlock inline
   functions now are reimplemented by using _umtx_op, the _umtx_op can
   use arbitrary id not just a thread id.
2004-12-18 12:52:44 +00:00
Sam Leffler
a37c415e66 fix m_append for case where additional mbufs are required 2004-12-15 19:04:07 +00:00
Poul-Henning Kamp
662d80dc23 Fix a deadlock I introduced this morning.
Mostly from:	tegge
2004-12-14 20:48:40 +00:00
Jeff Roberson
7842f65e7f - Garbage collect several unused members of struct kse and struce ksegrp.
As best as I can tell, some of these were never used.
2004-12-14 10:53:55 +00:00
Jeff Roberson
8ffb8f5558 - In kseq_choose(), don't recalculate slice values for processes with a
nice of 0.  Doing so can cause an infinite loop because they should be
   running, but a nice -20 process could prevent them from doing so.
 - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority
   elevated due to priority propagation.  If a thread has had its priority
   elevated, we assume that it must go on the current queue and it must
   get a slice.
 - In sched_userret() if our priority was elevated and we shouldn't have
   a timeslice, yield here until we should.

Found/Tested by:	glebius
2004-12-14 10:34:27 +00:00
Poul-Henning Kamp
d986dbb448 Add a new kind of reference count (fd_holdcnt) to struct filedesc
which holds on to just the data structure and the mutex.  (The
existing refcount (fd_refcnt) holds onto the open files in the
descriptor.)

The fd_holdcnt is protected by fdesc_mtx, fd_refcnt by FILEDESC_LOCK.

Add fdhold(struct proc *) which gets a hold on the filedescriptors of
the specified proc..

Add fddrop(struct filedesc *) which drops the fd_holdcnt and if zero
destroys the mutex and frees the memory.

Initialize the fd_holdcnt to one in fdinit().  Normal operations on
the filedesc structure will not change it.

In fdfree() use fddrop() to dispose of the mutex and structure.  Hold
the FILEDESC_LOCK() until we have cleaned out the contents and carefully
set the fields to null values during cleanup.

Use fdhold()/fddrop() in mountcheckdirs() and sysctl_kern_file().
2004-12-14 09:09:51 +00:00
Poul-Henning Kamp
30abaa53df Make fdesc_mtx private to kern_descrip.c now that the flock has come home. 2004-12-14 08:44:51 +00:00
Poul-Henning Kamp
12b18fdab4 Move the checkdirs() function from vfs_mount.c to kern_descrip.c and
call it mountcheckdirs().
2004-12-14 08:23:18 +00:00
Poul-Henning Kamp
c113083c5a Add new function fdunshare() which encapsulates the necessary light magic
for ensuring that a process' filedesc is not shared with anybody.

Use it in the two places which previously had private implmentations.

This collects all fd_refcnt handling in kern_descrip.c
2004-12-14 07:20:03 +00:00
Jeff Roberson
3ef6ac3361 - If delivering a signal will result in killing a process that has a
nice value above 0, set it to 0 so that it may proceed with haste.
   This is especially important on ULE, where adjusting the priority
   does not guarantee that a thread will be granted a greater time slice.
2004-12-13 16:45:57 +00:00
Jeff Roberson
2d59a44dc0 - Take up a 'slot' while we're on the assigned queue, waiting to be
posted to another processor.  Otherwise, kern_switch() gets confused
   and tries to sched_add(NULL).
2004-12-13 13:09:33 +00:00
Pawel Jakub Dawidek
bf4843166f Add bioq_insert_head() function.
OK'd by:	phk
2004-12-13 12:57:21 +00:00
Alan Cox
db24060c25 Correct the handling of two unusual cases by the zero-copy receive path,
specifically, vm_pgmoveco():
1. If vm_pgmoveco() sleeps on a busy page, it must redo the look up
because the page may have been freed.
2. If the receive buffer is copy-on-write due to, for example, a fork,
then although the first vm object in the shadow chain may not contain
a page there may still be one from a backing object that is mapped.
Thus, a pmap_remove() is required for the new page rather than the
backing object's page to been seen by the application.

Also, add some comments to vm_pgmoveco() and update some assertions.

Tested by: ken@
2004-12-13 06:24:14 +00:00
Poul-Henning Kamp
1ab58cc2df Copy the entire stats structure. Let compiler decide how. 2004-12-11 22:13:02 +00:00
Poul-Henning Kamp
e40da1f149 Fix whitespace.
Spotted by:	njl
2004-12-11 20:41:32 +00:00
Poul-Henning Kamp
494ea31a7d Remove the /dev/dev -> / symlink after we are done with it. 2004-12-11 12:48:37 +00:00
Alan Cox
c73e3e9223 Remove unneeded code from the zero-copy receive path.
Discussed with: gallatin@
Tested by: ken@
2004-12-10 04:49:13 +00:00
Max Laier
f8aabcb680 Start the protocol timeouts only after all domains have been initialized
completely. For some reason (that I am still curious about) we started to no
longer manage to finish the initialization before the timeouts run the first
time leading to panics when using uninitialized mutex etc.

The root of this problem is that we currently first link a domain to the
domains list and only later initialize the domain's protocols. This should
be reworked in the future, but with the current API it is not possible in
all situations. We settle with this lazy fix for now.

Tested by:	gnn, ru, myself
2004-12-09 11:47:30 +00:00
Sam Leffler
4873d1754f add m_append utility function to be used in forthcoming changes 2004-12-08 05:42:02 +00:00
Alan Cox
1c4dbedac4 Tidy up the zero-copy receive path: Remove an unneeded argument to
uiomoveco() and userspaceco().
2004-12-08 05:25:08 +00:00
Nate Lawson
8844d5efa6 Add the devclass_get_count(9) function and man page. It gets a count of
the number of devices in a devclass and is a subset of
devclass_get_devices(9).

Reviewed by:	imp, dfr
2004-12-08 02:39:56 +00:00
Stephan Uphoff
5656474145 Propagate TDF_NEEDRESCHED to replacement thread in sched_switch().
Reviewed by:    julian, jhb (in October)
Approved by:    sam (mentor)
MFC after:      4 weeks
2004-12-07 18:17:24 +00:00
Poul-Henning Kamp
20a92a18f1 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
Poul-Henning Kamp
46d2b4184d Instead of complaining about it, just silently filter out MNT_ROOTFS.
This fixes the "fsck /" problem various people have reported overnight.
2004-12-07 06:58:42 +00:00
Poul-Henning Kamp
8d8883caaf make "ffs" and alias for "ufs" when it comes to filesystem names. 2004-12-06 22:22:57 +00:00