Commit Graph

8014 Commits

Author SHA1 Message Date
Pawel Jakub Dawidek
46003fb337 Be consistent and always use the form 'return (value);' instead of 'return value;'.
We had (before this change) 84 lines where it was style(9)-clean and 15 lines
where it was not.
2004-12-31 14:52:53 +00:00
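
As a minimal illustration of the style(9) form referred to above (the function
below is hypothetical, not taken from the change itself):

    static int
    example_access(int dr, int dw, int de)
    {

        if (dr < 0 || dw < 0 || de < 0)
            return (EINVAL);        /* style(9): parenthesize the value */
        return (0);
    }
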
John Baldwin
50aaa791ba Fix a typo and two whitespace nits. 2004-12-30 22:17:00 +00:00
John Baldwin
f5c157d986 Rework the interface between priority propagation (lending) and the
schedulers a bit to ensure more correct handling of priorities and fewer
priority inversions:
- Add two functions to the sched(9) API to handle priority lending:
  sched_lend_prio() and sched_unlend_prio().  The turnstile code uses these
  functions to ask the scheduler to lend a thread a set priority and to
  tell the scheduler when it thinks it is ok for a thread to stop borrowing
  priority.  The unlend case is slightly complex in that the turnstile code
  tells the scheduler what the minimum priority of the thread needs to be
  to satisfy the requirements of any other threads blocked on locks owned
  by the thread in question.  The scheduler then decides whether the thread
  can go back to normal mode (if its normal priority is high enough to
  satisfy the pending lock requests) or whether it should continue to use
  the priority specified in the sched_unlend_prio() call.  This involves adding
  a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag
  for priority elevation.
- Schedulers now refuse to lower the priority of a thread that is currently
  borrowing another thread's priority.
- If a scheduler changes the priority of a thread that is currently sitting
  on a turnstile, it will call a new function turnstile_adjust() to inform
  the turnstile code of the change.  This function resorts the thread on
  the priority list of the turnstile if needed, and if the thread ends up
  at the head of the list (due to having the highest priority) and its
  priority was raised, then it will propagate that new priority to the
  owner of the lock it is blocked on.

Some additional fixes specific to the 4BSD scheduler include:
- Common code for updating the priority of a thread when the user priority
  of its associated kse group changes has been consolidated into a new
  static function resetpriority_thread().  One change to this function is that
  it will now only adjust the priority of a thread if it already has a
  time sharing priority, thus preserving any boosts from a tsleep() until
  the thread returns to userland.  Also, resetpriority() no longer calls
  maybe_resched() on each thread in the group. Instead, the code calling
  resetpriority() is responsible for calling resetpriority_thread() on
  any threads that need to be updated.
- schedcpu() now uses resetpriority_thread() instead of just calling
  sched_prio() directly after it updates a kse group's user priority.
- sched_clock() now uses resetpriority_thread() rather than writing
  directly to td_priority.
- sched_nice() now updates all the priorities of the threads after the
  group priority has been adjusted.

Discussed with:	bde
Reviewed by:	ups, jeffr
Tested on:	4bsd, ule
Tested on:	i386, alpha, sparc64
2004-12-30 20:52:44 +00:00
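
A rough sketch of the lend/unlend decision described in this entry, using the
thread fields and the TDF_BORROWING flag named above; the priority setter is a
hypothetical helper and the real scheduler bodies differ:

    void sched_set_priority(struct thread *td, u_char prio);   /* hypothetical */

    void
    sched_lend_prio(struct thread *td, u_char prio)
    {

        td->td_flags |= TDF_BORROWING;
        sched_set_priority(td, prio);
    }

    void
    sched_unlend_prio(struct thread *td, u_char prio)
    {
        u_char base_pri;

        /*
         * 'prio' is the minimum priority still required by threads
         * blocked on locks owned by td.  If td's normal priority is
         * good enough (numerically lower priorities are better), stop
         * borrowing; otherwise keep lending the requested priority.
         */
        base_pri = td->td_base_pri;
        if (prio >= base_pri) {
            td->td_flags &= ~TDF_BORROWING;
            sched_set_priority(td, base_pri);
        } else
            sched_lend_prio(td, prio);
    }
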
John Baldwin
99b808f461 Whitespace fix. 2004-12-30 20:30:58 +00:00
John Baldwin
63710c4d35 Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.
2004-12-30 20:29:58 +00:00
John Baldwin
9e6c867ccc Call tty_close() at the very end of ttyclose(), since tty_close() may end up
freeing the tty structure if it drops the last reference to it, and a NULL
dereference could otherwise occur.

Glanced at by:	phk
2004-12-30 19:24:49 +00:00
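
A minimal sketch of the ordering constraint (the real cleanup steps are elided
and the signature is simplified):

    static void
    ttyclose_sketch(struct tty *tp)
    {

        /* ... flush queues, clear state, wake up any waiters ... */

        /*
         * Call tty_close() last: it may drop the final reference and
         * free 'tp', so 'tp' must not be touched afterwards.
         */
        tty_close(tp);
    }
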
Robert Watson
b36aab857b Make the sysctls kern.ipc.msgmnb and kern.ipc.msgtql into tunables as
is the case for most other sysctls in the System V IPC message queue
implementation.

PR:		75541
Submitted by:	Sergiy Vyshnevetskiy <serg at vostok dot net>
MFC after:	2 weeks
2004-12-30 13:56:34 +00:00
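
The usual pattern for a sysctl that is also a loader tunable looks roughly like
this (the default value shown is a placeholder, not necessarily the real
msgmnb default):

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    SYSCTL_DECL(_kern_ipc);                 /* parent node declaration */

    static int msgmnb = 2048;               /* placeholder default */

    /* Settable from the loader at boot... */
    TUNABLE_INT("kern.ipc.msgmnb", &msgmnb);
    /* ...and still adjustable at run time via sysctl. */
    SYSCTL_INT(_kern_ipc, OID_AUTO, msgmnb, CTLFLAG_RW, &msgmnb, 0,
        "Maximum number of bytes in a message queue");
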
David Xu
cc1000ac5b Make umtx_wait and umtx_wake behave more like Linux futexes; the interface
is more general than before. It also lets me implement cancellation points
in the thread library. In theory, umtx_lock and umtx_unlock can now
be implemented using umtx_wait and umtx_wake, with all atomic operations
done in userland without the kernel's casuptr() function.
2004-12-30 02:56:17 +00:00
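
In rough terms, the wait/wake pair lets a userland lock be built along these
lines (a sketch with assumed wrapper names for the kernel operations; the real
library code differs):

    #include <sys/types.h>
    #include <machine/atomic.h>

    /* Assumed wrappers around the kernel wait/wake operations. */
    int umtx_wait(volatile u_long *p, u_long expect);
    int umtx_wake(volatile u_long *p, int nwake);

    void
    my_lock(volatile u_long *lock, u_long id)
    {

        /* Uncontested case: a single userland compare-and-swap. */
        while (atomic_cmpset_long(lock, 0, id) == 0)
            umtx_wait(lock, *lock);     /* sleep until the word changes */
    }

    void
    my_unlock(volatile u_long *lock, u_long id)
    {

        if (atomic_cmpset_long(lock, id, 0))
            umtx_wake(lock, 1);         /* wake one waiter, if any */
    }
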
Alan Cox
956d03da83 Eliminate (now) unnecessary acquisition and release of the global page
queues lock.
2004-12-29 04:49:10 +00:00
John Baldwin
83ae089aab - Up the WITNESS_COUNT macro from 200 to 1024 to support the growing number
of lock types in the kernel.  This results in an increase of witness
  data usage from ~145k to ~280k on i386 for kernels with
  'options WITNESS'.
- Remove the unused witness malloc bucket.

Submitted by:	Michal Mertl mime at traveller dot cz (1)
2004-12-28 21:21:27 +00:00
Robert Watson
6ce8940626 Attempt to slightly refine the print out from "show alllocks" -- list
the process and thread numbers/names on the same line rather than on
separate lines, and print the thread pointer not just the tid.
2004-12-27 10:47:08 +00:00
Alexander Kabaev
aa6f98d12f Do not vput(9) an unlocked vnode, and do not VREF it with the sole purpose
of vputting it back immediately.

Complained by:	DEBUG_VFS_LOCKS
2004-12-27 05:17:11 +00:00
Jeff Roberson
2ebf8eb132 - Unintentionally checked in a debugging panic. Remove that. 2004-12-26 23:21:48 +00:00
Jeff Roberson
36996b3b7c - Remove a 4BSD specific hack since this will work on ULE too. 2004-12-26 22:56:51 +00:00
Jeff Roberson
598b368d6c - Fix a long standing problem where an ithread would not honor sched_pin().
- Remove the sched_add wrapper that used sched_add_internal() as a backend.
   Its only purpose was to interpret one flag and turn it into an int.  Do
   the right thing and interpret the flag in sched_add() instead.
 - Pass the flag argument from sched_add() down to kseq_runq_add() so that we can
   get the SRQ_PREEMPT optimization too.
 - Add a KEF_INTERNAL flag.  If KEF_INTERNAL is set we don't adjust the SLOT
   counts, otherwise the slot counts are adjusted as soon as we enter
   sched_add() or sched_rem() rather than when the thread is actually placed
   on the run queue.  This greatly simplifies the handling of slots.
 - Remove the explicit prevention of migration for ithreads on non-x86
   platforms.  This was never shown to have any real benefit.
 - Remove the unused class argument to KSE_CAN_MIGRATE().
 - Add ktr points for thread migration events.
 - Fix a long standing bug on platforms which don't initialize the cpu
   topology.  The ksg_maxid variable was never correctly set on these
   platforms which caused the long term load balancer to never inspect
   more than the first group or processor.
 - Fix another bug which prevented the long term load balancer from working
   properly.  If stathz != hz we can't expect sched_clock() to be called
   on the exact tick count that we're anticipating.
 - Rearrange sched_switch() a bit to reduce indentation levels.
2004-12-26 22:56:08 +00:00
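
Illustrative only, for the sched_add() item above: the flags are interpreted in
sched_add() itself and passed straight through to the run-queue layer (the
helper names below are assumed):

    void
    sched_add(struct thread *td, int flags)
    {
        struct kse *ke;

        ke = td->td_kse;
        if ((flags & SRQ_PREEMPTED) == 0)
            maybe_preempt_elsewhere(ke);        /* hypothetical helper */
        /* Hand the flags down so the run queue can optimize too. */
        kseq_runq_add(kseq_of(ke), ke, flags);  /* hypothetical lookup */
    }
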
Robert Watson
b6dd9ef2fe Add "show alllocks" command to DDB, which dumps a list of processes
and threads currently holding sleep mutexes (and spin mutexes for
curthread).  This can be quite useful in looking for a lock condition
summary for a system, as it avoids manually iterating through threads
and processes to find all the interesting locks.

NB: "alllocks" is up there with "lockedvnods" for a bad argument for
show.

MFC after:	2 weeks
2004-12-26 22:52:24 +00:00
Jeff Roberson
6a98702001 - Run sched_userret() after thread_userret(). Before, sched_userret() would
lower the priority of the returning thread to a user priority before
   calling into thread_userret() which would call wakeup() which in turn would
   cause the returning thread to eventually context switch rather than
   completing its slice.  Allowing this thread to complete its slice first
   yields a 15% performance improvement in super-smack on my dual opteron with
   4BSD.
2004-12-26 07:30:35 +00:00
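
The ordering change amounts to something like this in the return-to-user path
(a simplified sketch, not the actual userret()):

    void
    userret_sketch(struct thread *td, struct trapframe *frame)
    {

        /* May wake other threads; do it while we still hold our slice. */
        thread_userret(td, frame);
        /* Only now drop back to a user priority. */
        sched_userret(td);
    }
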
Jeff Roberson
907bdbc288 - Wrap the thread count adjustment in sched_load_add() and sched_load_rem()
so that we may place some ktr entries nearby.
 - Define other KTR_SCHED tracepoints so that we may graph the operation
   of the scheduler.
2004-12-26 00:16:24 +00:00
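
A wrapper of that shape might look like the following (the counter name and
trace text are illustrative; CTR1/KTR_SCHED are the standard ktr(9) macros):

    static int sched_tdcnt;         /* runnable-thread count (sketch) */

    static void
    sched_load_add(void)
    {

        sched_tdcnt++;
        CTR1(KTR_SCHED, "global load: %d", sched_tdcnt);
    }

    static void
    sched_load_rem(void)
    {

        sched_tdcnt--;
        CTR1(KTR_SCHED, "global load: %d", sched_tdcnt);
    }
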
Jeff Roberson
81d47d3f4b - Remove earlier KTR_ULE tracepoints.
- Define new KTR_SCHED points so that we can graph the operation of the
   scheduler.
2004-12-26 00:15:33 +00:00
Jeff Roberson
85da7a569b - Define KTR points for KTR_SCHED. 2004-12-26 00:14:21 +00:00
David Xu
c180db2bce Make _umtx_op() a more general interface: the final parameter need not be a
timespec pointer; every parameter is interpreted according to its opcode.
2004-12-25 13:02:50 +00:00
David Xu
8b37fbabb4 1. introduce umtx_owner to get the owner of a umtx.
2. add const qualifier to umtx_timedlock and umtx_timedwait.
3. add missing brackets in umtx do_unlock_and_wait.
2004-12-25 12:49:35 +00:00
David Xu
3dd213f160 Add umtxq_lock/unlock around umtx_signal and fix debug kernel compilation.
Let umtx_lock return EINTR when it would otherwise return ERESTART; this gives
userland a chance to back off in the mutex lock code when needed.
2004-12-24 11:59:20 +00:00
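
The error-code remapping is essentially this pattern (a sketch; the sleep
helper is hypothetical):

    static int
    do_lock_sketch(struct thread *td, struct umtx *umtx, int timo)
    {
        int error;

        error = umtx_sleep_sketch(td, umtx, timo);  /* hypothetical */
        /*
         * Report EINTR instead of ERESTART so userland sees the
         * interruption and can back off, rather than the syscall
         * being transparently restarted.
         */
        if (error == ERESTART)
            error = EINTR;
        return (error);
    }
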
David Xu
a08c214a72 1. Fix a race condition between umtx lock and unlock; heavy testing
on SMP can expose the bug.
2. Let umtx_wake return the number of threads that have been woken.
2004-12-24 11:30:55 +00:00
Robert Watson
0fddf92d72 Assert the sem lock in sem_ref() and sem_rel(), as it is required to
safely manipulate the reference count.
2004-12-23 02:22:47 +00:00
Robert Watson
38e6a58c77 Remove temporary debugging printf that was used to detect the presence
of a race that had previously caused a panic in order to determine if
the fix was for the right problem.  It was.

MFC after:	2 weeks
2004-12-23 01:19:27 +00:00
Robert Watson
1ef121cf6b In sonewconn(), the s/if/while/ change to wait for room at the tail of
the accept queue is a feature, not a bug/issue, so remove the XXXRW
from the comment.
2004-12-23 01:16:21 +00:00
Robert Watson
ba65391172 Remove an XXXRW indicating atomic operations might be used as a
substitute for a global mutex protecting the socket count and
generation number.

The observation that soreceive_rcvoob() can't return an mbuf
chain is a property, not a bug, so remove the XXXRW.

In sorflush, s/existing/previous/ for code when describing prior
behavior.

For SO_LINGER socket option retrieval, remove an XXXRW about why
we hold the mutex: this is correct and not dubious.

MFC after:	2 weeks
2004-12-23 01:07:12 +00:00
Robert Watson
81b5dbecd4 In soalloc(), simplify the mac_init_socket() handling to remove
unnecessary use of a global variable and simplify the return case.
While here, use ()'s around return values.

In sodealloc(), remove a comment about why we bump the gencnt and
decrement the socket count separately.  It doesn't add
substantially to the reading, and clutters the function.

MFC after:	2 weeks
2004-12-23 00:59:43 +00:00
Alan Cox
7abe2ac214 Add send buffer locking to uipc_send(). Without this locking a race can
occur between a reader and a writer that results in a panic upon close,
e.g.,
	"panic: sbflush_locked: cc 4 || mb 0xffffff0052afa400 || mbcnt 0"

Reviewed by: rwatson@
MFC after: 2 weeks
2004-12-22 20:28:46 +00:00
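
The general shape of the locking is the usual sockbuf pattern (a sketch of the
idiom, not the actual uipc_send() change):

    static void
    sb_append_sketch(struct sockbuf *sb, struct mbuf *m)
    {

        SOCKBUF_LOCK(sb);
        sbappend_locked(sb, m);     /* counters stay consistent for readers */
        SOCKBUF_UNLOCK(sb);
    }
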
Poul-Henning Kamp
40b5a6f2c6 Include uio.h
Check O_NONBLOCK instead of IO_NDELAY
Don't include vnode.h
2004-12-22 17:37:14 +00:00
Poul-Henning Kamp
72e8dfe5a0 Hide/remove various printfs, now that root mounting doesn't seem to explode
on people.
2004-12-20 21:59:25 +00:00
Poul-Henning Kamp
118253ca24 fix a misleading sleep identifier. 2004-12-20 21:38:13 +00:00
Poul-Henning Kamp
e87047b437 We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign vp->v_op to devfs_specops; we know it already has that
value.

Make devfs_specops private to devfs.
2004-12-20 21:34:29 +00:00
David Xu
839f811c6a 1. msleep returns EWOULDBLOCK, not ETIMEDOUT; check for EWOULDBLOCK instead.
2. Eliminate a possible lock leak in the timed wait loop.
2004-12-18 13:43:16 +00:00
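
A timed wait loop of that shape might look like this (a sketch with assumed
names; note the EWOULDBLOCK check and that the mutex is released on every exit
path):

    static int
    timed_wait_sketch(void *chan, struct mtx *mtx, int timo, volatile int *done)
    {
        int error;

        mtx_lock(mtx);
        error = 0;
        while (*done == 0 && error == 0)
            error = msleep(chan, mtx, PCATCH, "uwait", timo);
        mtx_unlock(mtx);                    /* no lock leak */
        if (error == EWOULDBLOCK)           /* msleep's timeout value */
            error = ETIMEDOUT;              /* translate for the caller */
        return (error);
    }
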
David Xu
50586e8b6b 1. make umtx sharable between processes: two or more processes
call mmap() to create a shared region and then initialize a umtx in it;
   after that, threads in the different processes can use the umtx just as
   threads in the same process do.
2. introduce a new syscall, _umtx_op, to support timed lock and condition
   variable semantics. Also, the original umtx_lock and umtx_unlock inline
   functions are now reimplemented using _umtx_op; _umtx_op can use an
   arbitrary id, not just a thread id.
2004-12-18 12:52:44 +00:00
Sam Leffler
a37c415e66 fix m_append for the case where additional mbufs are required 2004-12-15 19:04:07 +00:00
Poul-Henning Kamp
662d80dc23 Fix a deadlock I introduced this morning.
Mostly from:	tegge
2004-12-14 20:48:40 +00:00
Jeff Roberson
7842f65e7f - Garbage collect several unused members of struct kse and struct ksegrp.
As best as I can tell, some of these were never used.
2004-12-14 10:53:55 +00:00
Jeff Roberson
8ffb8f5558 - In kseq_choose(), don't recalculate slice values for processes with a
nice of 0.  Doing so can cause an infinite loop because they should be
   running, but a nice -20 process could prevent them from doing so.
 - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority
   elevated due to priority propagation.  If a thread has had its priority
   elevated, we assume that it must go on the current queue and it must
   get a slice.
 - In sched_userret() if our priority was elevated and we shouldn't have
   a timeslice, yield here until we should.

Found/Tested by:	glebius
2004-12-14 10:34:27 +00:00
Poul-Henning Kamp
d986dbb448 Add a new kind of reference count (fd_holdcnt) to struct filedesc
which holds on to just the data structure and the mutex.  (The
existing refcount (fd_refcnt) holds onto the open files in the
descriptor.)

The fd_holdcnt is protected by fdesc_mtx, fd_refcnt by FILEDESC_LOCK.

Add fdhold(struct proc *) which gets a hold on the filedescriptors of
the specified proc.

Add fddrop(struct filedesc *) which drops the fd_holdcnt and if zero
destroys the mutex and frees the memory.

Initialize the fd_holdcnt to one in fdinit().  Normal operations on
the filedesc structure will not change it.

In fdfree() use fddrop() to dispose of the mutex and structure.  Hold
the FILEDESC_LOCK() until we have cleaned out the contents and carefully
set the fields to null values during cleanup.

Use fdhold()/fddrop() in mountcheckdirs() and sysctl_kern_file().
2004-12-14 09:09:51 +00:00
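
In outline, the hold/drop pair works like this (a sketch following the commit
message; the mutex field name and malloc type are assumed):

    struct filedesc *
    fdhold(struct proc *p)
    {
        struct filedesc *fdp;

        mtx_lock(&fdesc_mtx);
        fdp = p->p_fd;
        if (fdp != NULL)
            fdp->fd_holdcnt++;
        mtx_unlock(&fdesc_mtx);
        return (fdp);
    }

    void
    fddrop(struct filedesc *fdp)
    {
        int hold;

        mtx_lock(&fdesc_mtx);
        hold = --fdp->fd_holdcnt;
        mtx_unlock(&fdesc_mtx);
        if (hold == 0) {
            mtx_destroy(&fdp->fd_mtx);      /* field name assumed */
            free(fdp, M_FILEDESC);          /* malloc type assumed */
        }
    }
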
Poul-Henning Kamp
30abaa53df Make fdesc_mtx private to kern_descrip.c now that the flock has come home. 2004-12-14 08:44:51 +00:00
Poul-Henning Kamp
12b18fdab4 Move the checkdirs() function from vfs_mount.c to kern_descrip.c and
call it mountcheckdirs().
2004-12-14 08:23:18 +00:00
Poul-Henning Kamp
c113083c5a Add new function fdunshare() which encapsulates the necessary light magic
for ensuring that a process' filedesc is not shared with anybody.

Use it in the two places which previously had private implementations.

This collects all fd_refcnt handling in kern_descrip.c
2004-12-14 07:20:03 +00:00
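
The encapsulated check is roughly the following (a sketch; fdcopy() and
fdfree() are the existing helpers, locking simplified):

    void
    fdunshare(struct proc *p, struct thread *td)
    {
        struct filedesc *tmp;

        FILEDESC_LOCK(p->p_fd);
        if (p->p_fd->fd_refcnt > 1) {
            FILEDESC_UNLOCK(p->p_fd);
            tmp = fdcopy(p->p_fd);          /* private copy */
            fdfree(td);                     /* drop the shared one */
            p->p_fd = tmp;
        } else
            FILEDESC_UNLOCK(p->p_fd);
    }
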
Jeff Roberson
3ef6ac3361 - If delivering a signal will result in killing a process that has a
nice value above 0, set it to 0 so that it may proceed with haste.
   This is especially important on ULE, where adjusting the priority
   does not guarantee that a thread will be granted a greater time slice.
2004-12-13 16:45:57 +00:00
Jeff Roberson
2d59a44dc0 - Take up a 'slot' while we're on the assigned queue, waiting to be
posted to another processor.  Otherwise, kern_switch() gets confused
   and tries to sched_add(NULL).
2004-12-13 13:09:33 +00:00
Pawel Jakub Dawidek
bf4843166f Add bioq_insert_head() function.
OK'd by:	phk
2004-12-13 12:57:21 +00:00
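
The new function is the head-insertion counterpart of bioq_insert_tail(); a
minimal sketch (the real version may also maintain sorting state):

    void
    bioq_insert_head(struct bio_queue_head *head, struct bio *bp)
    {

        TAILQ_INSERT_HEAD(&head->queue, bp, bio_queue);
    }
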
Alan Cox
db24060c25 Correct the handling of two unusual cases by the zero-copy receive path,
specifically, vm_pgmoveco():
1. If vm_pgmoveco() sleeps on a busy page, it must redo the look up
because the page may have been freed.
2. If the receive buffer is copy-on-write due to, for example, a fork,
then although the first vm object in the shadow chain may not contain
a page there may still be one from a backing object that is mapped.
Thus, a pmap_remove() is required so that the new page, rather than the
backing object's page, is seen by the application.

Also, add some comments to vm_pgmoveco() and update some assertions.

Tested by: ken@
2004-12-13 06:24:14 +00:00
Poul-Henning Kamp
1ab58cc2df Copy the entire stats structure. Let the compiler decide how. 2004-12-11 22:13:02 +00:00
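
That is, a plain structure assignment instead of an explicit bcopy()
(illustrative type; the real code operates on its own stats structure):

    static void
    stats_copy_sketch(struct devstat *dst, const struct devstat *src)
    {

        /* Whole-struct assignment; the compiler picks the copy strategy. */
        *dst = *src;
    }
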
Poul-Henning Kamp
e40da1f149 Fix whitespace.
Spotted by:	njl
2004-12-11 20:41:32 +00:00