Commit Graph

308 Commits

Author SHA1 Message Date
John Baldwin
8c4b6380c7 Move the initialization of the devmtx into the mutex_init() function
called during early init before cninit().

Tested on:	i386, alpha, sparc64
Reviewed by:	phk, imp
Reported by:	Divacky Roman xdivac02 at stud dot fit dot vutbr dot cz
MFC after:	1 week
2005-10-18 18:27:44 +00:00
John Baldwin
83cece6fa1 - Add an assertion to panic if one tries to call mtx_trylock() on a spin
mutex.
- Don't panic if a spin lock is held too long inside _mtx_lock_spin() if
  panicstr is set (meaning that we are already in a panic).  Just keep
  spinning forever instead.
2005-09-02 20:21:49 +00:00
Paul Saab
1126349ae7 Ignore mutex asserts when we're dumping as well. This allows me
to panic a system from DDB when INVARIANTS is compiled into the
kernel on a scsi system.
2005-07-30 05:54:30 +00:00
John Baldwin
122eceef61 Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables.  This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after:	3 days
Tested on:	i386, alpha, sparc64
Compiled on:	ia64, powerpc, amd64
Kernel toolchain busted on:	arm
2005-07-15 18:17:59 +00:00
Gleb Smirnoff
4f20185860 Add additional newline to debug.mutex.prof.stats header, so that
column names are printed exactly above the columns.
2005-04-08 14:14:09 +00:00
John Baldwin
c6a37e8413 Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions.  They no longer have any affect on
interrupts.  This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit().  This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock.  For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections.  Note that I've also taken this
opportunity to push a few things into MD code rather than MI.  For example,
critical_fork_exit() no longer exists.  Instead, MD code ensures that new
threads have the correct state when they are created.  Also, we no longer
try to fixup the idlethreads for APs in MI code.  Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by:	grehan, cognet, arch@, others
Tested on:	i386, alpha, sparc64, powerpc, arm, possibly more
2005-04-04 21:53:56 +00:00
John Baldwin
33fb8a386e Rework the optimization for spinlocks on UP to be slightly less drastic and
turn it back on.  Specifically, the actual changes are now less intrusive
in that the _get_spin_lock() and _rel_spin_lock() macros now have their
contents changed for UP vs SMP kernels which centralizes the changes.
Also, UP kernels do not use _mtx_lock_spin() and no longer include it.  The
UP versions of the spin lock functions do not use any atomic operations,
but simple compares and stores which allow mtx_owned() to still work for
spin locks while removing the overhead of atomic operations.

Tested on:	i386, alpha
2005-01-05 21:13:27 +00:00
John Baldwin
2ff0e645d1 Refine the turnstile and sleep queue interfaces just a bit:
- Add a new _lock() call to each API that locks the associated chain lock
  for a lock_object pointer or wait channel.  The _lookup() functions now
  require that the chain lock be locked via _lock() when they are called.
- Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup
  the associated queue structure internally via _lookup() rather than
  accepting a pointer from the caller.  For turnstiles, this means that
  the actual lookup of the turnstile in the hash table is only done when
  the thread actually blocks rather than being done on each loop iteration
  in _mtx_lock_sleep().  For sleep queues, this means that sleepq_lookup()
  is no longer used outside of the sleep queue code except to implement an
  assertion in cv_destroy().
- Change sleepq_broadcast() and sleepq_signal() to require that the chain
  lock is already required.  For condition variables, this lets the
  cv_broadcast() and cv_signal() functions lock the sleep queue chain lock
  while testing the waiters count.  This means that the waiters count
  internal to condition variables is no longer protected by the interlock
  mutex and cv_broadcast() and cv_signal() now no longer require that the
  interlock be held when they are called.  This lets consumers of condition
  variables drop the lock before waking other threads which can result in
  fewer context switches.

MFC after:	1 month
2004-10-12 18:36:20 +00:00
Stephan Uphoff
b9a80acadb Force MUTEX_WAKE_ALL.
A race condition in single thread wakeup may break priority inheritance.

Tested   by: pho
Reviewed by: jhb,julian
Approved by: sam (mentor)
MFC: ASAP
2004-10-12 16:28:18 +00:00
Scott Long
9923b511ed Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined.  Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c).  This
is a possible MT5 candidate.
2004-09-02 18:59:15 +00:00
John-Mark Gurney
000968010a add options MPROF_BUFFERS and MPROF_HASH_SIZE that adjust the sizes of
the mutex profiling buffers.  Document them in the man page and in NOTES.
Ensure _HASH_SIZE is larger than _BUFFERS with a cpp error.
2004-08-19 06:38:26 +00:00
John Baldwin
bdcfcf5bc4 Cache the value of curthread in the _get_sleep_lock() and _get_spin_lock()
macros and pass the value to the associated _mtx_*() functions to avoid
more curthread dereferences in the function implementations.  This provided
a very modest perf improvement in some benchmarks.

Suggested by:	rwatson
Tested by:	scottl
2004-08-04 20:18:45 +00:00
Maxime Henrion
9f1b87f106 Instead of calling ia32_pause() conditionally on __i386__ or __amd64__
being defined, define and use a new MD macro, cpu_spinwait().  It only
expands to something on i386 and amd64, so the compiled code should be
identical.

Name of the macro found by:	jhb
Reviewed by:	jhb
2004-08-03 18:44:27 +00:00
Robert Watson
a9abdce44a Add "options ADAPTIVE_GIANT" which causes Giant to also be treated in
an adaptive fashion when adaptive mutexes are enabled.  The theory
behind non-adaptive Giant is that Giant will be held for long periods
of time, and therefore spinning waiting on it is wasteful.  However,
in MySQL benchmarks which are relatively Giant-free, running Giant
adaptive makes an observable difference on SMP (5% transaction rate
improvement).  As such, make adaptive behavior on Giant an option so
it can be more widely benchmarked.
2004-07-27 16:34:48 +00:00
Peter Wemm
b09cb1027b #ifdef __i386__ -> __i386__ || __amd64__ 2004-07-20 02:15:10 +00:00
Pawel Jakub Dawidek
ece2d9891e Now we have NO_ADAPTIVE_MUTEXES option, so use it here too.
Missed by:	scottl
2004-07-18 23:27:14 +00:00
Scott Long
701f140800 Enable ADAPTIVE_MUTEXES by default by changing the sense of the option to
NO_ADAPTIVE_MUTEXES.  This option has been enabled by default on amd64 for
quite some time, and has been extensively tested on i386 and sparc64.  It
shows measurable performance gains in many circumstances, and few negative
effects.  It would be nice in t he future if adaptive mutexes actually went
to sleep after a certain amount of spinning, but that will require quite a
bit more testing.
2004-07-18 15:59:03 +00:00
Marcel Moolenaar
2d50560abc Update for the KDB framework:
o  Make debugging code conditional upon KDB instead of DDB.
o  Call kdb_enter() instead of Debugger().
o  Call kdb_backtrace() instead of db_print_backtrace() or backtrace().

kern_mutex.c:
o  Replace checks for db_active with checks for kdb_active and make
   them unconditional.

kern_shutdown.c:
o  s/DDB_UNATTENDED/KDB_UNATTENDED/g
o  s/DDB_TRACE/KDB_TRACE/g
o  Save the TID of the thread doing the kernel dump so the debugger
   knows which thread to select as the current when debugging the
   kernel core file.
o  Clear kdb_active instead of db_active and do so unconditionally.
o  Remove backtrace() implementation.

kern_synch.c:
o  Call kdb_reenter() instead of db_error().
2004-07-10 21:36:01 +00:00
John Baldwin
0c0b25ae91 Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
  determine if a thread about to be added to a run queue should be
  preempted to directly.  If it is not safe to preempt or if the new
  thread does not have a high enough priority, then the function returns
  false and sched_add() adds the thread to the run queue.  If the thread
  should be preempted to but the current thread is in a nested critical
  section, then the flag TDF_OWEPREEMPT is set and the thread is added
  to the run queue.  Otherwise, mi_switch() is called immediately and the
  thread is never added to the run queue since it is switch to directly.
  When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
  then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
  setrunqueue() now does all the correct work.  This also removes the
  do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
  supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
  chance to run if the architecture supports native preemption since
  the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
  architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
  preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by:	scottl (with his re@ hat)
2004-07-02 20:21:44 +00:00
John Baldwin
bf0acc273a - Change mi_switch() and sched_switch() to accept an optional thread to
switch to.  If a non-NULL thread pointer is passed in, then the CPU will
  switch to that thread directly rather than calling choosethread() to pick
  a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
  TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
  requiring all callers of mi_switch() to know to do this if they can be
  called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
  the middle of the function prototypes and up above into their own
  section.
2004-07-02 19:09:50 +00:00
John Baldwin
535eb30962 Add a new kernel option MUTEX_WAKE_ALL that changes the mutex unlock code
to awaken all waiters when a contested mutex is released instead of just
the highest priority waiter.  If the various threads are awakened in
sequence then each thread may acquire and release the lock in question
without contention resulting in fewer expensive unlock and lock
operations.  This old behavior of waking just the highest priority is
still used if this option is specified.  Making the algorithm conditional
on a kernel option will allows us to benchmark both cases later and
determine which one should be used by default.

Requested by:	tanimura-san
2004-04-06 19:12:24 +00:00
Robert Watson
94ffb20d72 Add a reset sysctl for mutex profiling: zeros all of the mutex
profiling buffers and hash table.  This makes it a lot easier to
do multiple profiling runs without rebooting or performing
gratuitous arithmetic.  Sysctl is named debug.mutex.prof.reset.

Reviewed by:	jake
2004-01-28 22:11:53 +00:00
John Baldwin
8d768e7676 Rework witness_lock() to make it slightly more useful and flexible.
- witness_lock() is split into two pieces: witness_checkorder() and
  witness_lock().  Witness_checkorder() determines if acquiring a specified
  lock at the time it is called would result in a lock order.  It
  optionally adds a new lock order relationship as well.  witness_lock()
  updates witness's data structures to assume that a lock has been acquired
  by stick a new lock instance in the appropriate lock instance list.
- The mutex and sx lock functions now call checkorder() prior to trying to
  acquire a lock and continue to call witness_lock() after the acquire is
  completed.  This will let witness catch a deadlock before it happens
  rather than trying to do so after the threads have deadlocked (i.e. never
  actually report it).
- A new function witness_defineorder() has been added that adds a lock
  order between two locks at runtime without having to acquire the locks.
  If the lock order cannot be added it will return an error.  This function
  is available to programmers via the WITNESS_DEFINEORDER() macro which
  accepts either two mutexes or two sx locks as its arguments.
- A few simple wrapper macros were added to allow developers to call
  witness_checkorder() anywhere as a way of enforcing locking assertions
  in code that might acquire a certain lock in some situations.  The
  macros are: witness_check_{mutex,shared_sx,exclusive_sx} and take an
  appropriate lock as the sole argument.
- The code to remove a lock instance from a lock list in witness_unlock()
  was unnested by using a goto to vastly improve the readability of this
  function.
2004-01-28 20:39:57 +00:00
Jeff Roberson
29bcc4514f - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL.  Assert that one of these is set in mi_switch() and propery
   adjust the rusage statistics.  This is to simplify the large number of
   users of this interface which were previously all required to adjust the
   proper counter prior to calling mi_switch().  This also facilitates more
   switch and locking optimizations.
 - Change all callers of mi_switch() to pass the appropriate paramter and
   remove direct references to the process statistics.
2004-01-25 03:54:52 +00:00
Robert Watson
8dc10be885 Add some basic support for measuring sleep mutex contention to the
mutex profiling code.  As with existing mutex profiling, measurement
is done with respect to mtx_lock() instances in the code, as opposed
to specific mutexes.  In particular, measure two things:

(1) Lock contention.  How often did this mtx_lock() call get made and
    have to sleep (or almost sleep) waiting for the lock.  This helps
    identify the "victims" of contention.

(2) Hold contention.  How often, while the lock was held by a thread
    as a result of this mtx_lock(), did another thread try to acquire
    the same mutex.  This helps identify the causes of contention.

I'm currently exploring adding measurement of "time waited for the
lock", but the current implementation has proven useful to me so far
so I figured I'd commit it so others could try it out.  Note that this
increases the size of mutexes when MUTEX_PROFILING is enabled, so you
might find you need to further bump UMA_BOOT_PAGES.  Fixes welcome.

The once over:	des, others
2004-01-25 01:59:27 +00:00
John Baldwin
eac097962f - Allow mtx_trylock() to recurse on a recursive mutex. Attempts to recurse
on a non-recursive mutex will fail but will not trigger any assertions.
- Add an assertion to mtx_lock() that one never recurses on a non-recursive
  mutex.  This is mostly useful for the non-WITNESS case.

Requested by:	deischen, julian, others (1)
2004-01-05 23:09:51 +00:00
John Baldwin
961a7b244d Add an implementation of turnstiles and change the sleep mutex code to use
turnstiles to implement blocking isntead of implementing a thread queue
directly.  These turnstiles are somewhat similar to those used in Solaris 7
as described in Solaris Internals but are also different.

Turnstiles do not come out of a fixed-sized pool.  Rather, each thread is
assigned a turnstile when it is created that it frees when it is destroyed.
When a thread blocks on a lock, it donates its turnstile to that lock to
serve as queue of blocked threads.  The queue associated with a given lock
is found by a lookup in a simple hash table.  The turnstile itself is
protected by a lock associated with its entry in the hash table.  This
means that sched_lock is no longer needed to contest on a mutex.  Instead,
sched_lock is only used when manipulating run queues or thread priorities.
Turnstiles also implement priority propagation inherently.

Currently turnstiles only support mutexes.  Eventually, however, turnstiles
may grow two queue's to support a non-sleepable reader/writer lock
implementation.  For more details, see the comments in sys/turnstile.h and
kern/subr_turnstile.c.

The two primary advantages from the turnstile code include: 1) the size
of struct mutex shrinks by four pointers as it no longer stores the
thread queue linkages directly, and 2) less contention on sched_lock in
SMP systems including the ability for multiple CPUs to contend on different
locks simultaneously (not that this last detail is necessarily that much of
a big win).  Note that 1) means that this commit is a kernel ABI breaker,
so don't mix old modules with a new kernel and vice versa.

Tested on:	i386 SMP, sparc64 SMP, alpha SMP
2003-11-11 22:07:29 +00:00
John Baldwin
4110951861 If a spin lock is held for too long and WITNESS is enabled, then call
witness_display_spinlock() to see if we can find out where the current
owner of the spin lock last acquired the lock.
2003-07-31 18:52:18 +00:00
John Baldwin
47b722c1af When complaining about a sleeping thread owning a mutex, display the
thread's pid to make debugging easier for people who don't want to have to
use the intended tool for these panics (witness).

Indirectly prodded by:	kris
2003-07-30 20:42:15 +00:00
John Baldwin
f7ee15901a - Add comments about the maintenance of the per-thread list of contested
locks held by each thread.
- Fix a bug in the original BSD/OS code where a contested lock was not
  properly handed off from the old thread to the new thread when a
  contested lock with more than one blocked thread was transferred from
  one thread to another.
- Don't use an atomic operation to write the MTX_CONTESTED value to
  mtx_lock in the aforementioned special case.  The memory barriers and
  exclusion provided by sched_lock are sufficient.

Spotted by:	alc (2)
2003-07-02 16:14:09 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Poul-Henning Kamp
b82af320cf Add "" around mutex name to make message less confusing. 2003-05-31 21:11:01 +00:00
John Baldwin
27dad03c97 Use TD_IS_RUNNING() instead of thread_running() in the adaptive mutex
code.
2003-04-17 22:28:58 +00:00
Julian Elischer
060563ec50 Move the _oncpu entry from the KSE to the thread.
The entry in the KSE still exists but it's purpose will change a bit
when we add the ability to lock a KSE to a cpu.
2003-04-10 17:35:44 +00:00
Tim J. Robbins
f949f795aa Remove unused mtx_lock_giant(), mtx_unlock_giant(), related globals
and sysctls.
2003-03-23 11:26:11 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
John Baldwin
75d468ee12 Axe the useless MTX_SLEEPABLE flag. mutexes are not sleepable locks.
Nothing used this flag and WITNESS would have panic'd during mtx_init()
if anything had.
2003-03-11 20:02:57 +00:00
John Baldwin
1106937d99 Remove safety belt: it is now ok to do a mtx_trylock() on a mutex you
already own.  The mtx_trylock() will fail however.  Enhance the comment
at the top of the try lock function to explain this.

Requested by:	jlemon and his evil netisr locking
2003-03-04 21:32:25 +00:00
John Baldwin
5fa8dd90f9 Miscellaneous cleanups to _mtx_lock_sleep():
- Declare some local variables at the top of the function instead of in a
  nested block.
- Use mtx_owned() instead of masking off bits from mtx_lock manually.
- Read the value of mtx_lock into 'v' as a separate line rather than inside
  an if statement for clarity.  This code is hairy enough as it is.
2003-03-04 20:32:41 +00:00
John Baldwin
6b869595c5 Properly assert that mtx_trylock() is not called on a mutex we already
owned.  Previously the KASSERT would only trigger if we successfully
acquired a lock that we already held.  However, _obtain_lock() fails to
acquire locks that we already hold, so the KASSERT was never checked in
the case it was supposed to fail.
2003-03-04 20:30:30 +00:00
Mike Makonnen
0bd5f7979d Unbreak mutex profiling (at least for me).
o Always check for null when dereferencing the filename component.
	o Implement a try-and-backoff method for allocating memory to
	  dump stats to avoid a spin-lock -> sleep-lock mutex lock order
	  panic with WITNESS.

Approved by:	des, markm (mentor)
Not objected:	jhb
2003-02-25 22:28:46 +00:00
Dag-Erling Smørgrav
ecf031c9ad There's absolutely no need for a struct-within-a-struct, so move the
counters out of the inner struct and remove it.
2003-01-21 20:33:27 +00:00
Poul-Henning Kamp
fa669ab7b8 Disable the kernacc() check in mtx_validate() until such time that kernacc
does not require Giant.

This means that we may miss panics on a class of mutex programming bugs,
but only if running with a Chernobyl setting of debug-flags.

Spotted by:	Pete Carah <pete@ns.altadena.net>
2002-10-25 08:40:20 +00:00
Dag-Erling Smørgrav
f2c1ea8152 Whitespace cleanup. 2002-10-23 10:26:54 +00:00
Robert Drehmel
d08926b1f6 Change the `mutex_prof' structure to use three variables contained
in an anonymous structure as counters, instead of an array with
preprocessor-defined names for indices.  Remove the associated XXX-
comment.
2002-10-22 16:06:28 +00:00
Dag-Erling Smørgrav
6d0369001a Reduce the overhead of the mutex statistics gathering code, try to produce
shorter lines in the report, and clean up some minor style issues.
2002-10-21 18:48:28 +00:00
Jeff Roberson
b43179fbe8 - Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
 - Replace direct manipulation of scheduler data with hooks provided by the
   new api.
 - Remove KSE specific state modifications and single runq assumptions from
   kern_switch.c

Reviewed by:	-arch
2002-10-12 05:32:24 +00:00
John Baldwin
551cf4e150 Rename the mutex thread and process states to use a more generic 'LOCK'
name instead.  (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of
TD_ON_MUTEX())  Eventually a turnstile abstraction will be added that
will be shared with mutexes and other types of locks.  SLOCK/TDI_LOCK will
be used internally by the turnstile code and will not be specific to
mutexes.  Making the change now ensures that turnstiles can be dropped
in at a later date without affecting the ABI of userland applications.
2002-10-02 20:31:47 +00:00
Julian Elischer
2735483034 uh, commit all of the patch 2002-09-29 23:28:58 +00:00
Julian Elischer
e081731767 commit the version I actually tested..
Submitted by:	davidxu
2002-09-29 23:23:25 +00:00
Julian Elischer
9eb1fdea37 Implement basic KSE loaning. This stops a hread that is blocked in BOUND mode
from stopping another thread from completing a syscall, and this allows it to
release its resources etc. Probably more related commits to follow (at least
one I know of)

Initial concept by: julian, dillon
Submitted by:	davidxu
2002-09-29 23:04:34 +00:00
Julian Elischer
71fad9fdee Completely redo thread states.
Reviewed by:	davidxu@freebsd.org
2002-09-11 08:13:56 +00:00
John Baldwin
0d975d6341 Add some KASSERT()'s to ensure that we don't perform spin mutex ops on
sleep mutexes and vice versa.  WITNESS normally should catch this but
not everyone uses WITNESS so this is a fallback to catch nasty but easy
to do bugs.
2002-09-03 18:25:16 +00:00
Ian Dowse
02bd1bcd2a Add a new KTR type KTR_CONTENTION, and use it in the mutex code to
log the start and end of periods during which mtx_lock() is waiting
to acquire a sleep mutex. The log message includes the file and
line of both the waiter and the holder.

Reviewed by:	jhb, jake
2002-08-26 18:39:38 +00:00
John Baldwin
ce39e722ec Disable optimization of spinlocks on UP kernels w/o debugging for now
since it breaks mtx_owned() on spin mutexes when used outside of
mtx_assert().  Unfortunately we currently use it in the i386 MD code
and in the sio(4) driver.

Reported by:	bde
2002-07-27 16:54:23 +00:00
Dag-Erling Smørgrav
b61860ad2d Add mtx_ prefixes to the fields used for mutex profiling, and fix a bug
where the profiling code would report the release point instead of the
acquisition point.

Requested by:	bde
2002-07-03 01:50:27 +00:00
Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
John Baldwin
6a95e08f2f Replace thread_runnable() with thread_running() as the latter is more
accurate.

Suggested by:	julian
2002-06-04 22:36:24 +00:00
John Baldwin
7fcca6096f Optimize the adaptive mutex spin a bit. Use a simple while loop with
simple reads (and on IA32, a "pause" instruction for each interation of the
loop) to spin until either the mutex owner field changes, or the lock owner
stops executing.

Suggested by:	tanimura
Tested on:	i386
2002-06-04 21:53:48 +00:00
John Baldwin
5853d37d3b Add a private thread_runnable() macro to make the code more readable and
make the KSE diff easier to maintain.
2002-06-04 21:50:02 +00:00
Dag-Erling Smørgrav
db586c8b7c Make the counters uintmax_ts, and use %ju rather than %llu. 2002-05-23 03:08:42 +00:00
John Baldwin
6b8c698908 Rename pause() to ia32_pause() so it doesn't conflict with the pause()
function defined in <unistd.h>.  I didn't #ifdef _KERNEL it because the
mutex implementation in libpthread will probably need this.
2002-05-22 20:32:39 +00:00
John Baldwin
0228ea4e0b Rename cpu_pause() to pause(). Originally I was going to make this an
MI API with empty cpu_pause() functions on other arch's, but this
functionality is definitely unique to IA-32, so I decided to leave it
as i386-only and wrap it in #ifdef's.  I should have dropped the cpu_
prefix when I made that decision.

Requested by:	bde
2002-05-22 13:19:22 +00:00
John Baldwin
703fc290fb Add appropriate IA32 "pause" instructions to improve performanec on
Pentium 4's and newer IA32 processors.  The "pause" instruction has been
verified by Intel to be a NOP on all currently existing IA32 processors
prior to the Pentium 4.
2002-05-21 22:26:35 +00:00
John Baldwin
0e54ddadd9 Fix an old cut 'n' paste bug inherited from BSD/OS: don't increment 'i'
twice once we are in the long wait stage of spinning on a spin mutex.
2002-05-21 21:27:05 +00:00
John Baldwin
e6302957fe Whitespace fixup, properly indent the body of an else clause. 2002-05-21 21:13:27 +00:00
John Baldwin
2498cf8c42 Add code to make default mutexes adaptive if the ADAPTIVE_MUTEXES kernel
option is used (not on by default).

- In the case of trying to lock a mutex, if the MTX_CONTESTED flag is set,
  then we can safely read the thread pointer from the mtx_lock member while
  holding sched_lock.  We then examine the thread to see if it is currently
  executing on another CPU.  If it is, then we keep looping instead of
  blocking.
- In the case of trying to unlock a mutex, it is now possible for a mutex
  to have MTX_CONTESTED set in mtx_lock but to not have any threads
  actually blocked on it, so we need to handle that case.  In that case,
  we just release the lock as if MTX_CONTESTED was not set and return.
- We do not adaptively spin on Giant as Giant is held for long times and
  it slows SMP systems down to a crawl (it was taking several minutes,
  like 5-10 or so for my test alpha and sparc64 SMP boxes to boot up when
  they adaptively spinned on Giant).
- We only compile in the code to do this for SMP kernels, it doesn't make
  sense for UP kernels.

Tested on:	i386, alpha, sparc64
2002-05-21 20:47:11 +00:00
John Baldwin
e8fdcfb57a Optimize spin mutexes for UP kernels without debugging to just enter and
exit critical sections.  We only contest on a spin mutex on an SMP kernel
running on an SMP machine.
2002-05-21 20:34:28 +00:00
John Baldwin
0c88508a78 Change mtx_init() to now take an extra argument. The third argument is
the generic lock type for use with witness.  If this argument is NULL then
the lock name is used as the lock type.  Add a macro for a lock type name
for network driver locks.
2002-04-04 20:52:27 +00:00
Dag-Erling Smørgrav
e633070431 Revert to open hashing. It makes the code simpler, and works farily well
even when the number of records approaches the size of the hash table.
Besides, the previous implementation (using linear probing) was broken :)

Also, use the newly introduced MTX_SYSINIT.
2002-04-02 23:26:32 +00:00
John Baldwin
c53c013bae - Move the MI mutexes sched_lock and Giant from being declared in the
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
  needed for mutexes such as setting up thread0's contested lock list
  and initializing MI mutexes.  Change the various MD startup routines
  to call this function instead of duplicating all the code themselves.

Tested on:	alpha, i386
2002-04-02 22:19:16 +00:00
John Baldwin
7feefcd6ce Spelling police. 2002-04-02 20:44:30 +00:00
Andrew R. Reiter
c27b56999e - Add MTX_SYSINIT and SX_SYSINIT as macro glue for allowing sx and mtx
locks to be able to setup a SYSINIT call.  This helps in places where
  a lock is needed to protect some data, but the data is not truly
  associated with a subsystem that can properly initialize it's lock.
  The macros use the mtx_sysinit() and sx_sysinit() functions,
  respectively, as the handler argument to SYSINIT().

Reviewed by: alfred, jhb, smp@
2002-04-02 16:05:43 +00:00
Dag-Erling Smørgrav
b784ffe91a Instead of get_cyclecount(9), use nanotime(9) to record acquisition and
release times.  Measurements are made and stored in nanoseconds but
presented in microseconds, which should be sufficient for the locks for
which we actually want this (those that are held long and / or often).
Also, rename some variables and structure members to unit-agnostic names.
2002-04-02 14:42:01 +00:00
Dag-Erling Smørgrav
6c35e80948 Mutex profiling code, conditional on the MUTEX_PROFILING option. Adds the
following sysctl variables:

  debug.mutex.prof.enable	    enable / disable profiling
  debug.mutex.prof.acquisitions	    number of mutex acquisitions recorded
  debug.mutex.prof.records	    number of acquisition points recorded
  debug.mutex.prof.maxrecords	    max number of acquisition points
  debug.mutex.prof.rejected	    number of rejections (due to full table)
  debug.mutex.prof.hashsize	    hash size
  debug.mutex.prof.collisions	    number of hash collisions
  debug.mutex.prof.stats	    profiling statistics

The code records four numbers for each acquisition point (identified by
source file name and line number): longest time held, total time held,
number of non-recursive acquisitions, average time held.  The measurements
are in clock cycles (as returned by get_cyclecount(9)); this may cause
measurements on some SMP systems to be unreliable.  This can probably be
worked around by replacing get_cyclecount(9) by some incarnation of
nanotime(9).

This work was derived from initial patches by eivind.
2002-04-02 00:01:49 +00:00
Jeff Roberson
f22a4b62f5 Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag.  Remove the dup_list and dup_ok code from subr_witness.  Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface.  The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by:	jhb
2002-03-27 09:23:41 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Peter Wemm
114730b0a8 Tidy up some unused variables 2002-02-20 21:25:44 +00:00
Matthew Dillon
735da6de88 Add kern_giant_ucred to instrument Giant around ucred related operations
such a getgid(), setgid(), etc...
2002-02-18 17:51:47 +00:00
Julian Elischer
2c1007663f In a threaded world, differnt priorirites become properties of
different entities.  Make it so.

Reviewed by:	jhb@freebsd.org (john baldwin)
2002-02-11 20:37:54 +00:00
John Baldwin
18fc2ba9ff Use the mtx_owner() macro in one spot in _mtx_lock_sleep() to make the
code easier to read.
2002-02-09 00:12:53 +00:00
John Baldwin
bf07c922ac Bump the limits for determining if we've held a spinlock too long as they
seem to be too short for the 500 Mhz DS20 I'm testing on.  The rather
arbitrary numbers are rather bogus anyways.  We should probably have
variables for these limits that are calibrated in the MD startup code
somehow.
2002-01-15 14:20:33 +00:00
John Baldwin
c86b6ff551 Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe.  Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer.  This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs.  Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called.  (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha.  I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine.  PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken.  Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by:	peter
Tested on:	i386, alpha
2002-01-05 08:47:13 +00:00
John Baldwin
7e1f6dfe9d Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
  prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
  count and a per-thread critical section saved state set when entering
  a critical section while at nesting level 0 and restored when exiting
  to nesting level 0.  This moves the saved state out of spin mutexes so
  that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
  cpu_critical_enter/exit.  MI code such as device drivers and spin
  mutexes use the MI wrappers.  Note that since the MI wrappers store
  the state in the current thread, they do not have any return values or
  arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
  assigned to curthread->td_savecrit during fork_exit().

Tested on:	i386, alpha
2001-12-18 00:27:18 +00:00
John Baldwin
ba48b69a13 Remove definition of witness and comment stating that this file implements
witness.  Witness moved off to subr_witness.c a while ago.
2001-11-15 19:08:55 +00:00
Matthew Dillon
d23f5958bc Add mtx_lock_giant() and mtx_unlock_giant() wrappers for sysctl management
of Giant during the Giant unwinding phase, and start work on instrumenting
Giant for the file and proc mutexes.

These wrappers allow developers to turn on and off Giant around various
subsystems.  DEVELOPERS SHOULD NEVER TURN OFF GIANT AROUND A SUBSYSTEM JUST
BECAUSE THE SYSCTL EXISTS!  General developers should only considering
turning on Giant for a subsystem whos default is off (to help track down
bugs).  Only developers working on particular subsystems who know what
they are doing should consider turning off Giant.

These wrappers will greatly improve our ability to unwind Giant and test
the kernel on a (mostly) subsystem by subsystem basis.   They allow Giant
unwinding developers (GUDs) to emplace appropriate subsystem and structural
mutexes in the main tree and then request that the larger community test
the work by turning off Giant around the subsystem(s), without the larger
community having to mess around with patches.  These wrappers also allow
GUDs to boot into a (more likely to be) working system in the midst of
their unwinding work and to test that work under more controlled
circumstances.

There is a master sysctl, kern.giant.all, which defaults to 0 (off).  If
turned on it overrides *ALL* other kern.giant sysctls and forces Giant to
be turned on for all wrapped subsystems.  If turned off then Giant around
individual subsystems are controlled by various other kern.giant.XXX sysctls.

Code which overlaps multiple subsystems must have all related subsystem Giant
sysctls turned off in order to run without Giant.
2001-10-26 20:48:04 +00:00
John Baldwin
7ada587697 The mtx_init() and sx_init() functions bzero'd locks before handing them
off to witness_init() making the check for double intializating a lock by
testing the LO_INITIALIZED flag moot.  Workaround this by checking the
LO_INITIALIZED flag ourself before we bzero the lock structure.
2001-10-20 01:22:42 +00:00
John Baldwin
21377ce065 Remove superflous parens after de-macroizing. 2001-09-26 00:05:18 +00:00
John Baldwin
dde96c9933 Since we no longer inline any debugging code in the mutex operations, move
all the debugging code into the function versions of the mutex operations
in kern_mutex.c.  This reduced the __mtx_* macros to simply wrappers of
the _{get,rel}_lock_* macros, so the __mtx_* macros were also abolished in
favor of just calling the _{get,rel}_lock_* macros.  The tangled hairy mass
of macros calling macros is at least a bit more sane now.
2001-09-22 21:19:55 +00:00
John Baldwin
a44f918bf9 Fix a bug in propagate priority: the kse group pointer wasn't being
updated in the loop so the new thread always seemd to have the same
priority as the original thread and no actual priorities were changed.
2001-09-19 22:52:59 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Bosko Milekic
76dcbd6f9f Force a commit on kern_mutex.c to explain reason for last commit but while
I'm at it also add a comment in mtx_validate() explaining the purpose
of the last change.

Basically, this fixes booting kernels compiled with MUTEX_DEBUG. What used
to happen is before we setidt from init386() [still using BTX idt], we
called mtx_init() on several mutex locks, notably Giant and some others.
This is a problem for MUTEX_DEBUG because it enables mtx_validate() which
calls kernacc(), some of which in turn requires Giant.
Fix by calling kernacc() from mtx_validate() only if (!cold).
2001-08-24 23:00:59 +00:00
Bosko Milekic
ab07087e16 *** empty log message *** 2001-08-24 22:53:45 +00:00
John Baldwin
5cb0fbe47e If we have already panic'd then don't bother enforcing mutex asserts as
things are pretty much shot already and all panic'ing does is hurt our
chances of getting a dump.

Inspired by:	sheldonh
2001-07-31 17:45:50 +00:00
John Baldwin
c4f7a18726 Count the context switch when blocking on a mutex as a voluntary context
switch.  Count the context switch when preempting the current thread to let
a higher priority thread blocked on a mutex we just released run as an
involuntary context switch.

Reported by:	bde
2001-06-25 18:29:32 +00:00
John Baldwin
2d96f0b145 - Move state about lock objects out of struct lock_object and into a new
struct lock_instance that is stored in the per-process and per-CPU lock
  lists.  Previously, the lock lists just kept a pointer to each lock held.
  That pointer is now replaced by a lock instance which contains a pointer
  to the lock object, the file and line of the last acquisition of a lock,
  and various flags about a lock including its recursion count.
- If we sleep while holding a sleepable lock, then mark that lock instance
  as having slept and ignore any lock order violations that occur while
  acquiring Giant when we wake up with slept locks.  This is ok because of
  Giant's special nature.
- Allow witness to differentiate between shared and exclusive locks and
  unlocks of a lock.  Witness will now detect the case when a lock is
  acquired first in one mode and then in another.  Mutexes are always
  locked and unlocked exclusively.  Witness will also now detect the case
  where a process attempts to unlock a shared lock while holding an
  exclusive lock and vice versa.
- Fix a bug in the lock list implementation where we used the wrong
  constant to detect the case where a lock list entry was full.
2001-05-04 17:15:16 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
John Baldwin
7141f2ad46 Exit and re-enter the critical section while spinning for a spinlock so
that interrupts can come in while we are waiting for a lock.
2001-04-17 03:34:52 +00:00
Mark Murray
f0b60d7560 Handle a rare but fatal race invoked sometimes when SIGSTOP is
invoked.
2001-04-13 09:29:34 +00:00
John Baldwin
192846463a Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects.  Each lock class specifies a
  name and set of flags (or properties) shared by all locks of a given
  type.  Currently there are three lock classes: spin mutexes, sleep
  mutexes, and sx locks.  A lock object specifies properties of an
  additional lock along with a lock name and all of the extra stuff needed
  to make witness work with a given lock.  This abstract lock stuff is
  defined in sys/lock.h.  The lockmgr constants, types, and prototypes have
  been moved to sys/lockmgr.h.  For temporary backwards compatability,
  sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
  locks held.  By making this per-cpu, we do not have to jump through
  magic hoops to deal with sched_lock changing ownership during context
  switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
  proc->p_sleeplocks, which is a list of held sleep locks including sleep
  mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
  level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
  - MTX_NOWITNESS - specifies that this lock should be ignored by witness.
    This is used for the mutex that blocks a sx lock for example.
  - MTX_QUIET - this is not new, but you can pass this to mtx_init() now
    and no events will be logged for this lock, so that one doesn't have
    to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag.  Use this flag to export
  a mtx_initialized() macro that can be safely called from drivers.  Also,
  we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
  performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
  more accurate file and line numbers.
2001-03-28 09:03:24 +00:00
John Baldwin
6283b7d01b - Switch from using save/disable/restore_intr to using critical_enter/exit
and change the u_int mtx_saveintr member of struct mtx to a critical_t
  mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
  an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules.  Change all the
  _mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
  parameters and rename them to use a prefix of two underscores.  Inside
  of kern_mutex.c, generate wrapper functions for
  _mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
  that are called from modules.  The macros mtx_{un,}lock_{spin,}_flags()
  are mapped to the __mtx_* macros inside of the kernel to inline the
  usual case of mutex operations and map to the internal _mtx_* functions
  in the module case so that modules will use WITNESS and KTR logging if
  the kernel is compiled with support for it.
2001-03-28 02:40:47 +00:00
John Baldwin
5db078a9be Fix mtx_legal2block. The only time that it is bad to block on a mutex is
if we hold a spin mutex, since we can trivially get into deadlocks if we
start switching out of processes that hold spinlocks.  Checking to see if
interrupts were disabled was a sort of cheap way of doing this since most
of the time interrupts were only disabled when holding a spin lock.  At
least on the i386.  To fix this properly, use a per-process counter
p_spinlocks that counts the number of spin locks currently held, and
instead of checking to see if interrupts are disabled in the witness code,
check to see if we hold any spin locks.  Since child processes always
start up with the sched lock magically held in fork_exit(), we initialize
p_spinlocks to 1 for child processes.  Note that proc0 doesn't go through
fork_exit(), so it starts with no spin locks held.

Consulting from:	cp
2001-03-09 07:24:17 +00:00
John Baldwin
1b43703b47 - Add an extra check in priority_propagation() for UP systems to ensure we
don't end up back at ourselves which would indicate deadlock.
- Add the proc lock to the witness dup_list as we may hold more than one
  process lock at a time.
- Don't assert a mutex is owned in _mtx_unlock_sleep() as that is too late.
  We do the checks in the macros instead.
2001-03-07 02:45:15 +00:00
Julian Elischer
a96dcd84d2 Shuffle netgraph mutexes a bit and hold a reference on a node
from the function that is calling the destructor.
2001-02-28 18:49:09 +00:00
Jake Burkholder
5b270b2a55 Sigh. Try to get priorities sorted out. Don't bother trying to
update native priority, it is diffcult to get right and likely
to end up horribly wrong.  Use an honestly wrong fixed value
that seems to work; PUSER for user threads, and the interrupt
priority for ithreads.  Set it once when the process is created
and forget about it.

Suggested by:	bde
Pointy hat:	me
2001-02-28 02:53:44 +00:00
Jake Burkholder
be15bfc091 Initialize native priority to PRI_MAX. It was usually 0 which made a
process's priority go through the roof when it released a (contested)
mutex.  Only set the native priority in mtx_lock if hasn't already
been set.

Reviewed by:	jhb
2001-02-26 23:27:35 +00:00
Jake Burkholder
a10f496636 Remove brackets around variables in a function that used to be
a macro.
2001-02-25 16:18:13 +00:00
Julian Elischer
7433466190 Move netgraph spimlock order entries out of
the #ifdef SMP section. They need to be there for UP too.
2001-02-25 04:56:23 +00:00
John Baldwin
1103f3b05b Grrr, s/INVARIANTS_SUPPORT/INVARIANT_SUPPORT/. 2001-02-24 21:29:32 +00:00
John Baldwin
15ec816acc - Axe RETIP() as it was very i386 specific and unwieldy. Instead, use the
passed in filename and line number in the KTR tracepoint message.
- Even though it is #if 0'd code, change the code to detect that a process
  is an interrupt thread to check p->p_ithd against NULL rather than
  checking non-existant process flags from BSD/OS.
- Use '%p' to print pointers in KTR log messages instead of assuming
  sizeof(int) == sizeof(void *).
- Don't set p_mtxname to NULL when releasing a mutex.  It doesn't hurt
  to leave it set (we don't clear w_mesg for example) and at least at
  one time in the past, there used to be race conditions in the kernel
  that would result in setting this to NULL causing the kernel to
  dereference NULL.
- Make the _mtx_assert() function be compiled in if INVARIANTS_SUPPORT is
  defined rather than if INVARIANTS is defined so that a KLD compiled
  with INVARIANTS that uses mtx_assert() can be used with a kernel that
  just has INVARIANT_SUPPORT compiled in.
2001-02-24 19:36:13 +00:00
Julian Elischer
33338e7370 Add knowledge of the netgraph spinlocks into the Witness code.
Well, at least I think that's how it's done.
2001-02-24 14:29:47 +00:00
John Baldwin
25d209f260 - Use the NOCPU constant.
- Move the ithread spin locks before sched lock and clk in preparation for
  future commits to the ithread code.
2001-02-22 02:12:54 +00:00
Bosko Milekic
2786342687 Change all instances of CURPROC' and CURTHD' to `curproc,' in order
to stay consistent.

Requested by: bde
2001-02-12 03:15:43 +00:00
Jake Burkholder
d5a08a6065 Implement a unified run queue and adjust priority levels accordingly.
- All processes go into the same array of queues, with different
  scheduling classes using different portions of the array.  This
  allows user processes to have their priorities propogated up into
  interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
  32.  We used to have 4 separate arrays of 32 queues each, so this
  may not be optimal.  The new run queue code was written with this
  in mind; changing the number of run queues only requires changing
  constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter.  This
  is intended to be used to create per-cpu run queues.  Implement
  wrappers for compatibility with the old interface which pass in
  the global run queue structure.
- Group the priority level, user priority, native priority (before
  propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
  symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
  This was used to detect when a process' priority had lowered and
  it should yield.  We now effectively yield on every interrupt.
- Activate propogate_priority().  It should now have the desired
  effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
  idle loop.  It interfered with propogate_priority() because
  the idle process needed to do a non-blocking acquire of Giant
  and then other processes would try to propogate their priority
  onto it.  The idle process should not do anything except idle.
  vm_page_zero_idle() will return in the form of an idle priority
  kernel thread which is woken up at apprioriate times by the vm
  system.
- Update struct kinfo_proc to the new priority interface.  Deliberately
  change its size by adjusting the spare fields.  It remained the same
  size, but the layout has changed, so userland processes that use it
  would parse the data incorrectly.  The size constraint should really
  be changed to an arbitrary version number.  Also add a debug.sizeof
  sysctl node for struct kinfo_proc.
2001-02-12 00:20:08 +00:00
Bosko Milekic
5746a1d866 - Place back STR string declarations for lock/unlock strings used for KTR_LOCK
tracing in order to avoid duplication.
- Insert some tracepoints back into the mutex acq/rel code, thus ensuring
  that we can trace all lock acq/rel's again.
- All CURPROC != NULL checks are MPASS()es (under MUTEX_DEBUG) because they
  signify a serious mutex corruption.
- Change up some KASSERT()s to MPASS()es, and vice-versa, depending on the
  type of problem we're debugging (INVARIANTS is used here to check that
  the API is being used properly whereas MUTEX_DEBUG is used to ensure that
  something general isn't happening that will have bad impact on mutex
  locks).

Reminded by: jhb, jake, asmodai
2001-02-11 02:54:16 +00:00
John Baldwin
c75e5182ce Unify the two sleep lock order lists to enforce the process lock ->
uidinfo lock locking order.
2001-02-09 20:52:02 +00:00
John Baldwin
e910ba59fc - Change the 'witness_list' ddb command to 'show mutexes'. Note that this
will only display sleep mutexes held by the current process.
- Clean up some nits in the witness_display() function and add a ddb
  command 'show witness' that dumps the hierarchy and order lists to the
  console.
- Use queue(3) macros where appropriate.
- Resort the spin lock order list so that "com" is before "sched_lock".
  Also, add appropriate #ifdef's around SMP and i386-specific mutexes.
- Add two new mutexes used to protect the ithread lists and tables to the
  order list.

Requested by:	bde (1)
2001-02-09 15:19:41 +00:00
Bosko Milekic
9ed346bab0 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
John Baldwin
d38b8dbfc8 Add a new ddb command 'witness_list' that lists the mutexes held by
curproc.

Requested by:	peter
2001-01-27 07:51:34 +00:00
Jason Evans
1b367556b5 Convert all simplelocks to mutexes and remove the simplelock implementations. 2001-01-24 12:35:55 +00:00
John Baldwin
8484de7555 - Don't use a union and fun tricks to shave one extra pointer off of struct
mtx right now as it makes debugging harder.  When we are in optimizing
  mode, we can revisit this.
- Fix the KTR trace messages to use %p rather than 0x%p to avoid duplicate
  0x's in KTR output.
- During witness_fixup, release Giant so that witness doesn't get confused.
  Also, grab all_mtx while walking the list of mutexes.
- Remove w_sleep and w_recurse.  Instead, perform checks on mutexes using
  the mutex's mtx_flags field.
- Allow debug.witness_ddb and debug.witness_skipspin to be set from the
  loader.
- Add Giant to the front of existing order_list entries to help ensure
  Giant is always first.
- Add an order entry for the various proc locks.  Note that this only
  helps keep proc in order mostly as the allproc and proctree mutexes are
  only obtained during a lockmgr operation on the specified mutex.
2001-01-24 10:57:01 +00:00
Jason Evans
56771ca74b Print correct file name and line number in mtx_assert().
Noticed by:	jake
2001-01-22 05:56:55 +00:00
Jason Evans
0cde2e34af Move most of sys/mutex.h into kern/kern_mutex.c, thereby making the mutex
inline functions non-inlined.  Hide parts of the mutex implementation that
should not be exposed.

Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).

Submitted by:	peter
2001-01-21 22:34:43 +00:00
Jason Evans
527c2fd277 Make the order of the static initializer for all_mtx match the order of
fields in struct mtx.

Found by:	jake
2001-01-21 11:05:02 +00:00
Jason Evans
d1c1b8413e Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.

This change is necessary in order to avoid some circular bootstrapping
dependencies.
2001-01-21 07:52:20 +00:00
Jake Burkholder
c1ef8aac9e - Make npx_intr INTR_MPSAFE and move acquiring Giant into the
function itself.
- Remove a hack to allow acquiring Giant from the npx asm trap
  vector.
2001-01-20 02:30:58 +00:00
Bosko Milekic
08812b3925 Implement MTX_RECURSE flag for mtx_init().
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.

The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.

The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.

Reviewed by: jhb
2001-01-19 01:59:14 +00:00
Jake Burkholder
ef73ae4b0c Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.
2001-01-10 04:43:51 +00:00
John Baldwin
562e4ffe86 - Add a new flag MTX_QUIET that can be passed to the various mtx_*
functions.  If this flag is set, then no KTR log messages are issued.
  This is useful for blocking excessive logging, such as with the internal
  mutex used by the witness code.
- Use MTX_QUIET on all of the mtx_enter/exit operations on the internal
  mutex used by the witness code.
- If we are in a panic, don't do witness checks in witness_enter(),
  witness_exit(), and witness_try_enter(), just return.
2000-12-13 21:53:42 +00:00
Jake Burkholder
92cf772d8d - Add code to detect if a system call returns with locks other than Giant
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
  panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
  a system call, which were missing for alpha and ia64.
2000-12-12 01:14:32 +00:00
John Baldwin
428b4b5562 Oops, the witness mutex is a spin lock, so use MTX_SPIN in the call to
mtx_init().  Since the witness code ignores its internal mutex, this
doesn't result in any functional change.
2000-12-12 00:37:18 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
John Baldwin
6936206ebd Split the WITNESS and MUTEX_DEBUG options apart so that WITNESS does not
depend on MUTEX_DEBUG.  The MUTEX_DEBUG option turns on extra assertions
and checks to verify that mutexes themselves are implemented properly.
The WITNESS option uses extra checks and diagnostics to verify that other
code is using mutexes properly.
2000-12-01 00:10:59 +00:00
John Baldwin
1bd0eefb4c Fix up priority propagation:
- Use a better test for determining when a process is running.
- Convert some checks to assertions.
- Remove unnecessary tests.
- Save the priority before acquiring a mutex rather than in msleep(9).
2000-11-30 00:51:16 +00:00
John Baldwin
86327ad8a4 Set p_mtxname when blocking on a mutex and clear it when waking up. 2000-11-29 20:17:15 +00:00
John Baldwin
f404050e44 Use an atomic operation with an appropriate memory barrier when releasing
a contested sleep mutex in the case that at least two processes are blocked
on the contested mutex.
2000-11-29 18:41:19 +00:00
John Baldwin
8f838cb563 The sched_lock mutex goes after the sio mutex in the locking order since
a software interrupt can be scheduled in the sio interrupt handler while
the sio mutex is held.
2000-11-29 18:38:14 +00:00
John Baldwin
bbc7a98a31 Save the line number and filename of the last mtx_enter operation for
spin locks.  We already do this for sleep locks.
2000-11-29 18:37:01 +00:00
Alfred Perlstein
0931dcefb3 Move the #define of _KERN_MUTEX_C_ so that it's before any system headers
are included.  System headers can include sys/mutex.h and then certain
macros do not get defined.

Reviewed by: jake
2000-11-26 21:14:17 +00:00
Jake Burkholder
a5d5c61c12 Add uidinfo hash and uidinfo struct to the witness order list. 2000-11-26 15:05:46 +00:00
Jake Burkholder
fa2fbc3dac - Protect the callout wheel with a separate spin mutex, callout_lock.
- Use the mutex in hardclock to ensure no races between it and
  softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
  CALLOUT_MPSAFE, which specifies that a callout handler does not
  need giant.  There is still no way to set this flag when
  regstering a callout.

Reviewed by:	-smp@, jlemon
2000-11-19 06:02:32 +00:00
Jake Burkholder
7da6f97772 - Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by:	John Hay <jhay@icomtek.csir.co.za>
		jhb
2000-11-17 18:09:18 +00:00
John Baldwin
20cdcc5b73 Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch().  The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by:	witness
2000-11-16 02:16:44 +00:00
John Baldwin
9c36c934a1 Include the right headers to get the DDB #define and the db_active variable. 2000-11-15 22:08:16 +00:00
John Baldwin
59f857e4ea Declare the 'witness_spin_check' properly as a per-CPU variable in the
non-SMP case.
2000-11-15 22:02:05 +00:00
John Baldwin
ecbd8e3710 Don't perform witness checks in witness_enter() during a panic. 2000-11-15 22:00:31 +00:00
John Baldwin
0fe4e534b1 Minor whitespace nit in a comment. 2000-11-10 21:21:20 +00:00
John Baldwin
a5a96a1978 - Use MUTEX_DECLARE() and MTX_COLD for the WITNESS code's internal mutex so
it can function before malloc(9) is up and running.
- Add two new options WITNESS_DDB and WITNESS_SKIPSPIN.  If WITNESS_SKIPSPIN
  is enabled, then spin mutexes are ignored by the WITNESS code.  If
  WITNESS_DDB is turned on and DDB is compiled into the kernel, then the
  kernel will drop into DDB when either a lock hierarchy violation occurs
  or mutexes are held when going to sleep.
- Add some new sysctls:
  debug.witness_ddb is a read-write sysctl that corresponds to WITNESS_DDB.
     The kernel option merely changes the default value to on at boot.
  debug.witness_skipspin is a read-only sysctl that one can use to determine
     if the kernel was compiled with WITNESS_SKIPSPIN.
- Wipe out the BSD/OS-specific lock order lists.  We get to build our own
  lists now as we add mutexes to the kernel.
2000-10-27 02:59:30 +00:00
John Baldwin
3127162743 Quite some warnings. 2000-10-25 04:37:54 +00:00
John Baldwin
b67a3e6e85 Propogate the 'const'ness of mutex descriptions to the witness code to
quiet warnings.
2000-10-20 22:45:01 +00:00
John Baldwin
78f0da0373 Actually enable the witness code if the WITNESS kernel option is enabled. 2000-10-20 21:58:11 +00:00
John Baldwin
f5271ebc2f Doh. Fix a 64-bit-ism by using uintptr_t for a temporary lock variable
instead of int.
2000-10-20 20:24:40 +00:00
John Baldwin
36412d79b4 - Make the mutex code almost completely machine independent. This greatly
reducues the maintenance load for the mutex code.  The only MD portions
  of the mutex code are in machine/mutex.h now, which include the assembly
  macros for handling mutexes as well as optionally overriding the mutex
  micro-operations.  For example, we use optimized micro-ops on the x86
  platform #ifndef I386_CPU.
- Change the behavior of the SMP_DEBUG kernel option.  In the new code,
  mtx_assert() only depends on INVARIANTS, allowing other kernel developers
  to have working mutex assertiions without having to include all of the
  mutex debugging code.  The SMP_DEBUG kernel option has been renamed to
  MUTEX_DEBUG and now just controls extra mutex debugging code.
- Abolish the ugly mtx_f hack.  Instead, we dynamically allocate
  seperate mtx_debug structures on the fly in mtx_init, except for mutexes
  that are initiated very early in the boot process.   These mutexes
  are declared using a special MUTEX_DECLARE() macro, and use a new
  flag MTX_COLD when calling mtx_init.  This is still somewhat hackish,
  but it is less evil than the mtx_f filler struct, and the mtx struct is
  now the same size with and without mutex debugging code.
- Add some micro-micro-operation macros for doing the actual atomic
  operations on the mutex mtx_lock field to make it easier for other archs
  to override/optimize mutex ops if needed.  These new tiny ops also clean
  up the code in some places by replacing long atomic operation function
  calls that spanned 2-3 lines with a short 1-line macro call.
- Don't call mi_switch() from mtx_enter_hard() when we block while trying
  to obtain a sleep mutex.  Calling mi_switch() would bogusly release
  Giant before switching to the next process.  Instead, inline most of the
  code from mi_switch() in the mtx_enter_hard() function.  Note that when
  we finally kill Giant we can back this out and go back to calling
  mi_switch().
2000-10-20 07:26:37 +00:00
John Baldwin
606f8eb27a Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just
use struct mtx, struct witness, and struct witness_blessed.

Requested by:	bde
2000-09-14 20:15:16 +00:00
Jason Evans
5340642a2e Style cleanups. No functional changes. 2000-09-09 23:18:48 +00:00
Jason Evans
46bf3fe5a6 Add file and line arguments to WITNESS_ENTER() and WITNESS_EXIT, since
__FILE__ and __LINE__ don't get expanded usefully in inline functions.

Add const to all witness*() arguments that are filenames.
2000-09-09 22:43:22 +00:00
Jason Evans
12473b76dc Rename mtx_enter(), mtx_try_enter(), and mtx_exit() and wrap them with cpp
macros that expand to pass filename and line number information.  This is
necessary since we're using inline functions instead of macros now.

Add const to the filename pointers passed througout the mtx and witness
code.
2000-09-08 21:48:06 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00