Commit Graph

273 Commits

Author SHA1 Message Date
Alan Cox
0e2056ee7f Remove the vm page queue free mutex from the CDEV order. 2007-02-07 05:43:31 +00:00
Mike Pritchard
af7a34173d The change to the vm_page_queue_freelist lock from a spin lock to a
sleep lock missed the witness code, and the system will panic
immediately on boot if WITNESS is enabled.

Changed the witness definition to the new type.
2007-02-06 05:51:55 +00:00
Konstantin Belousov
e6a4f4cd40 Record kqueue -> struct mount mtx -> vnode interlock lock order to
catch the places where reverse lock order is instantiated.

OKed by:	jeff
2007-02-02 09:02:18 +00:00
Suleiman Souhlal
e8ac01c56a Remove hptlock from the static witness table, now that it's a regular sleep
mutex.
2007-01-16 22:56:28 +00:00
Kip Macy
7c0435b933 MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile
wait (time waited to acquire) and hold times for *all* kernel locks. If
the architecture has a system synchronized TSC, the profiling code will
use that - thereby minimizing profiling overhead. Large chunks of profiling
code have been moved out of line, the overhead measured on the T1 for when
it is compiled in but not enabled is < 1%.

Approved by: scottl (standing in for mentor rwatson)
Reviewed by: des and jhb
2006-11-11 03:18:07 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Scott Long
988129b824 Introduce a spinlock for synchronizing access to the video output hardware
in syscons.  This replaces a simple access semaphore that was assumed to be
protected by Giant but often was not.  If two threads that were otherwise
SMP-safe called printf at the same time, there was a high likelyhood that
the semaphore would get corrupted and result in a permanently frozen video
console.  This is similar to what is already done in the serial console
drivers.
2006-09-13 15:48:15 +00:00
Suleiman Souhlal
bec31a8fee The "taskqueue_fast" spinlocks were renamed to "fast_taskqueue" in
subr_taskqueue.c:r1.32

Reported by:	rdivacky
2006-08-26 11:21:25 +00:00
John Baldwin
de833b7c0c Use db_lookup_thread() to lookup the thread for the passed in address
and change 'show locks' to only list the locks for a given thread
rather than for all the threads in the process containing a specified
thread.
2006-04-25 20:24:23 +00:00
Marius Strobl
fa63296aba Remove last vestiges of sab(4). 2006-04-25 19:43:53 +00:00
Marcel Moolenaar
07c8931358 Add the scc_hwmtx spin mutex, defined by scc(4). 2006-04-07 22:15:54 +00:00
John Baldwin
6b81555744 Axe KTR_ALQ_MASK now that KTR_WITNESS is off unless you hack an #ifdef
in subr_witness.c.  I did add a comment in subr_witness.c noting that
KTR_WITNESS is incompatible with KTR_ALQ.
2006-01-25 14:57:23 +00:00
John Baldwin
2b604e82b2 - Add a new KTR_SUBSYS in place of KTR_SPARE1 to serve as a subsystem
placeholder similar to KTR_DEV.  Explain the use of KTR_DEV and
  KTR_SUBSYS in a comment as well.
- Retire KTR_WITNESS and instead have KTR_WITNESS default to off but use
  KTR_SUBSYS if it is enabled.
2006-01-24 22:23:45 +00:00
John Baldwin
83a81bcb14 Add a new file (kern/subr_lock.c) for holding code related to struct
lock_obj objects:
- Add new lock_init() and lock_destroy() functions to setup and teardown
  lock_object objects including KTR logging and registering with WITNESS.
- Move all the handling of LO_INITIALIZED out of witness and the various
  lock init functions into lock_init() and lock_destroy().
- Remove the constants for static indices into the lock_classes[] array
  and change the code outside of subr_lock.c to use LOCK_CLASS to compare
  against a known lock class.
- Move the 'show lock' ddb function and lock_classes[] array out of
  kern_mutex.c over to subr_lock.c.
2006-01-17 16:55:17 +00:00
John Baldwin
3c6decc327 Trim another pointer from struct lock_object (and thus from struct mtx and
struct sx).  Instead of storing a direct pointer to a our lock_class
struct in lock_object, reserve 4 bits in the lo_flags field to serve as an
index into a global lock_classes array that contains pointers to the lock
classes.  Only debugging code such as WITNESS or INVARIANTS checks and KTR
logging need to access the lock_class member, so this shouldn't add any
overhead to production kernels.  It might add some slight overhead to
kernels using those debug options however.

As with the previous set of changes to lock_object, this is going to
completely obliterate the kernel ABI, so be sure to recompile all your
modules.
2006-01-06 18:07:32 +00:00
John Baldwin
b0e9883e2f Teach WITNESS_SAVE() and WITNESS_RESTORE() to work with spin locks instead
of only sleep locks.
2005-12-29 20:54:25 +00:00
John Baldwin
0a46ed7d56 Fix a deadlock I introduced with the recently added printf to warn about
spin locks that are not in the static order list.  It is not safe to call
printf while holding the witness spin mutex since the console drivers that
back printf may need to use their own spin locks which would try to talk
to witness when they were locked.  Given this, it is possible for one
CPU to lock a console driver lock (such as sio) which then tries to lock
the witness lock while another CPU is doing the printf while holding the
witness lock.  Fix this by moving the printf outside of the witness lock.
All other printf's in witness are already correct.

MFC after:	3 days
2005-12-29 20:53:01 +00:00
John Baldwin
5d2162b2f8 Tweak witness handling of lock object to shave 2 pointers off of each
lock object (and thus off of each mutex and sx lock):
- Rename the all_locks list to pending_locks and only put locks initialized
  before SI_SUB_WITNESS on the list so that the SI_SUB_WITNESS can add them
  to witness once it starts up.
- Now that pending_locks is only used during early startup, change it from
  a TAILQ to an STAILQ.  This removes a pointer from the STAILQ_ENTRY in
  struct lock_object.
- Since the pending_locks list is only used during the single-threaded
  early boot it no longer needs to be protected by a mutex, so remove
  all_mtx.
- Since the lo_list member of struct lock_object is now only used during
  early boot before witness is running, collapse lo_list and lo_witness
  into a union.  This shaves the second pointer off of struct lock_object.
- Axe lock_cur_cnt and lock_max_cnt.

With these changes, struct mtx shrinks from 36 to 28 bytes on 32-bit
platforms and from 72 to 56 bytes on 64-bit platforms.  Note that this
commit will completely and utterly destroy the kernel ABI, so no MFC.

Tested on:	alpha, amd64, i386, sparc64
2005-12-05 20:45:24 +00:00
John Baldwin
e0f66ef861 Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces.  struct intr_event holds the list
  of interrupt handlers associated with interrupt sources.
  struct intr_thread contains the data relative to an interrupt thread.
  Currently we still provide a 1:1 relationship of events to threads
  with the exception that events only have an associated thread if there
  is at least one threaded interrupt handler attached to the event.  This
  means that on x86 we no longer have 4 bazillion interrupt threads with
  no handlers.  It also means that interrupt events with only INTR_FAST
  handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
  intr_foo naming convention.  This did require renaming the powerpc
  MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
  powerpc.  This means that multiple INTR_FAST handlers can attach to the
  same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
  to the same interrupt.  Sharing INTR_FAST handlers may not always be
  desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
  either.  Drivers can always still use INTR_EXCL to ask for an interrupt
  exclusively.  The way this sharing works is that when an interrupt
  comes in, all the INTR_FAST handlers are executed first, and if any
  threaded handlers exist, the interrupt thread is scheduled afterwards.
  This type of layout also makes it possible to investigate using interrupt
  filters ala OS X where the filter determines whether or not its companion
  threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
  is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
  dumping their state.  It also has a '/v' verbose switch which dumps
  info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
  handler is removed because it would suck for things like ppbus(8)'s
  braindead behavior.  The code is present, though, it is just under
  #if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
  event into a separate function so that ithread_loop() becomes more
  readable.  Previously this code was all in the middle of ithread_loop()
  and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
  with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
  curthread's proc against idlethread's proc. (Not really related to intr
  changes)

Tested on:	alpha, amd64, i386, sparc64
Tested on:	arm, ia64 (older version of patch by cognet and marcel)
2005-10-25 19:48:48 +00:00
John Baldwin
cf23efc12a Don't panic if a spin lock is initialized that isn't in our static order
list.  Just warn about it instead.

Requested by:	scottl
MFC after:	1 day
2005-10-24 20:14:24 +00:00
John Baldwin
971d0ad835 Spell hierarchy correctly in comments.
Submitted by:	Wojciech A. Koszek dunstan at freebsd dot czest dot pl
2005-10-24 15:57:27 +00:00
John Baldwin
2a2b58faa4 Add entry for the spin mutex used by the hptmv(4) driver.
MFC after: 	1 day
Tested by:	Philip Kizer pckizer at nostrum dot com
2005-10-20 14:49:59 +00:00
John Baldwin
d27acf445e Add the spin lock used by the binary nvidia driver to the static lock
order list so that WITNESS and the driver play together nicely.

Tested by:	Harald Schmalzbauer
MFC after:	3 days
2005-09-26 18:30:12 +00:00
John Baldwin
b27dbfbf4a - Enforce an implicit lock order that Giant cannot be locked while holding
any other non-sleepable lock.  In plain English: Giant comes before all
  other mutexes.
- Add some extra description to the lock order reversal printf's to indicate
  when a reversal is triggered by a hard-coded implicit rule.

Requested by:	truckman (2)
MFC after:	1 week
2005-09-15 19:07:14 +00:00
Don Lewis
908b3deb2b Relocate witness_levelall(), witness_leveldescendents(), and
witness_displaydescendants() so that they are protected by
"#ifdef DDB/#endif" to unbreak kernels not using "option DDB".

MFC after:	3 weeks
2005-09-11 07:57:06 +00:00
John Baldwin
acc0265cc2 - Add some comments to some of the static lock orders. Don't explicitly
link proctree and allproc to Giant since that order is already implicitly
  enforced.
- Use a goto to handle the case where we want to enforce a reversal before
  calling isitmydescendant() in witness_checkorder() so that the logic is
  easier to follow and so that it is easier to add more forced-reversal
  cases in the future.

MFC after:	 3 days
2005-09-02 20:23:49 +00:00
Don Lewis
4053cae340 Track all lock relationships instead of pruning direct relationships
if an indirect relationship exists (keep both A->B->C and A->C).
This allows witness_checkorder() to use isitmychild() instead of
the much more expensive isitmydescendant() to check for valid lock
ordering.

Don't do an expensive tree walk to update the w_level values when
the tree is updated.  Only update the w_level values when using the
debugger to display the tree.

Nuke the experimental "witness_watch > 1" mode that only compared
w_level for the two locks.  This information is no longer maintained
at run time, and the use of isitmychild() in witness_checkorder
should bring performance close enough to the acceptable level that
this hack is not needed.

Report witness data structure allocation statistics under the
debug.witness sysctl.

Reviewed by:	jhb
MFC after:	30 days
2005-08-25 03:47:37 +00:00
Robert Watson
ae018704a1 Add an order between UDP inpcb locks and the IPv4 multicast address
list lock, as there has been a report that an alternative lock order
is getting introduced.  This should help ferret it out.

Reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
2005-08-09 13:27:50 +00:00
Robert Watson
dd5a318ba3 Introduce in_multi_mtx, which will protect IPv4-layer multicast address
lists, as well as accessor macros.  For now, this is a recursive mutex
due code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.

For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.

Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().

Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.

Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.

Eliminate spl's associated with IPv4 multicast addresses, portions of
IGMP that weren't previously expunged by IGMP locking.

Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		10 days
2005-08-03 19:29:47 +00:00
Marius Strobl
fce21e7e25 After some input from bde@ and rereading the datasheet use a MTX_SPIN
mutex instead of a MTX_DEF one in order to defer preemption while
reading the date and time registers. If we don't manage to read them
within the time slot where we are guaranteed that no updates occur we
might actually read them during an update in which case the output is
undefined.
2005-06-04 23:24:50 +00:00
Jeff Roberson
17314e6286 - Define the real lock order with cdev and a few vm/vfs related locks. This
can be removed once cdev no longer calls free() with the cdev lock held.
2005-04-22 22:43:31 +00:00
Jeff Roberson
57f66be038 - Check LO_DUPOK as well as LOP_DUPOK when determining whether we should
warn about duplicate acquires.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 22:39:46 +00:00
Vinod Kashyap
f0c1dee27f The latest release of the FreeBSD driver (twa) for
3ware's 9xxx series controllers.  This corresponds to
the 9.2 release (for FreeBSD 5.2.1) on the 3ware website.

Highlights of this release are:

1. The driver has been re-architected to use a "Common Layer"
    (all tw_cl* files), which is a consolidation of all OS-independent
    parts of the driver.  The FreeBSD OS specific portions of the
    driver go into an "OS Layer" (all tw_osl* files).
    This re-architecture is to achieve better maintainability, consistency
    of behavior across OS's, and better portability to new OS's (drivers
    for new OS's can be written by just adding an OS Layer that's specific
    to the OS, by complying to a "Common Layer Programming Interface" API.

2. The driver takes advantage of multiple processors.

3. The driver has a new firmware image bundled, the new features of which
   include Online Capacity Expansion and multi-lun support, among others.
   More details about 3ware's 9.2 release can be found here:
   http://www.3ware.com/download/Escalade9000Series/9.2/9.2_Release_Notes_Web.pdf

Since the Common Layer is used across OS's, the FreeBSD specific include
path for header files (/sys/dev/twa) is not part of the #include pre-processor
directive in any of the source files.  For being able to integrate twa into
the kernel despite this, Makefile.<arch> has been changed to add the include
path to CFLAGS.

Reviewed by: scottl
2005-04-12 22:07:11 +00:00
Pawel Jakub Dawidek
c19618dd7d CDEV lock should be before 'system map' lock.
Hardcode this order to help track down reported LOR.

LOR reported by:	Thierry Herbelot <thierry@herbelot.com>
LOR info:		http://sources.zabbadoz.net/freebsd/lor.html#080
2005-04-09 13:32:01 +00:00
Pawel Jakub Dawidek
cd104dd3c1 Add a missing terminator.
Confirmed by:	rwatson
2005-04-09 11:31:31 +00:00
Robert Watson
53358cc907 Document, via WITNESS, that the NFS server mutex falls ahead of the socket
buffer mutexes.
2005-03-09 21:38:53 +00:00
Bill Paul
58a6edd121 When you call MiniportInitialize() for an 802.11 driver, it will
at some point result in a status event being triggered (it should
be a link down event: the Microsoft driver design guide says you
should generate one when the NIC is initialized). Some drivers
generate the event during MiniportInitialize(), such that by the
time MiniportInitialize() completes, the NIC is ready to go. But
some drivers, in particular the ones for Atheros wireless NICs,
don't generate the event until after a device interrupt occurs
at some point after MiniportInitialize() has completed.

The gotcha is that you have to wait until the link status event
occurs one way or the other before you try to fiddle with any
settings (ssid, channel, etc...). For the drivers that set the
event sycnhronously this isn't a problem, but for the others
we have to pause after calling ndis_init_nic() and wait for the event
to arrive before continuing. Failing to wait can cause big trouble:
on my SMP system, calling ndis_setstate_80211() after ndis_init_nic()
completes, but _before_ the link event arrives, will lock up or
reset the system.

What we do now is check to see if a link event arrived while
ndis_init_nic() was running, and if it didn't we msleep() until
it does.

Along the way, I discovered a few other problems:

- Defered procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL.
  ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation
  wrong.)

- Similarly, the NDIS interrupt handler, which is essentially a
  DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask()
  has been fixed accordingly.

- MiniportQueryInformation() and MiniportSetInformation() run at
  DISPATCH_LEVEL, and each request must complete before another
  can be submitted. ndis_get_info() and ndis_set_info() have been
  fixed accordingly.

- Turned the sleep lock that guards the NDIS thread job list into
  a spin lock. We never do anything with this lock held except manage
  the job list (no other locks are held), so it's safe to do this,
  and it's possible that ndis_sched() and ndis_unsched() can be
  called from DISPATCH_LEVEL, so using a sleep lock here is
  semantically incorrect. Also updated subr_witness.c to add the
  lock to the order list.
2005-03-07 03:05:31 +00:00
Robert Watson
5324bda309 When DDB is not defined, don't implement witness_thread_has_locks() and
witness_proc_has_locks(), as they are unused, which results in a compiler
error.  This problem was introduced with the implementation of "show
alllocks".

Spotted by:	Artem Kuchin <matrix at itlegion dot ru>
2005-01-22 21:14:21 +00:00
John Baldwin
83ae089aab - Up the WITNESS_COUNT macro from 200 to 1024 to support the growing number
of lock types in the kernel.  This results in an increase of witness
  data usage from ~145k to ~280k on i386 for kernels with
  'options WITNESS'.
- Remove the unused witness malloc bucket.

Submitted by:	Michal Mertl mime at traveller dot cz (1)
2004-12-28 21:21:27 +00:00
Robert Watson
6ce8940626 Attempt to slightly refine the print out from "show alllocks" -- list
the process and thread numbers/names on the same line rather than on
separate lines, and print the thread pointer not just the tid.
2004-12-27 10:47:08 +00:00
Robert Watson
b6dd9ef2fe Add "show alllocks" command to DDB, which dumps a list of processes
and threads currently holding sleep mutexes (and spin mutexes for
curthread).  This can be quite useful in looking for a lock condition
summary for a system, as it avoids manually iterating through threads
and processes to find all the interesting locks.

NB: "alllocks" is up there with "lockedvnods" for a bad argument for
show.

MFC after:	2 weeks
2004-12-26 22:52:24 +00:00
John-Mark Gurney
e211cad004 clean up some tunables that should of been removed a while ago... 2004-11-09 06:46:14 +00:00
Robert Watson
cc34aa2094 Add entropy harvest mutex to hard-coded spin lock witness lock order,
remove previous entropy harvesting mutex names as they are no longer
present.  Commit to this file was ommitted when randomdev_soft.c:1.5
was made.

Feet shot:	Robert Huff <roberthuff at rcn dot com>
2004-10-11 08:26:18 +00:00
Brian Feldman
41f57cbc8d Don't "implicitly order all sleep locks before spin locks" in witness
when the spin lock in question isn't -- it's the critical_enter() that
KDB set.  No more panic in DDB for console -> syscons -> tty -> knote
operations.
2004-10-09 08:16:37 +00:00
Robert Watson
030c6fb156 Hard code witness lock order for BPF locks. 2004-09-09 05:01:37 +00:00
John-Mark Gurney
80e6bbe95b make witness it's own sysctl branch instead of using _ to do this. I have
left the old tunables in to give people a few days to transition their
loader.conf and sysctl.conf's over to the new names..

MFC after:	5 days
2004-09-06 23:27:28 +00:00
John Baldwin
0e5a07e533 Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and
spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi
and spin-wait code snippets so that you can't get into the situation of
one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap
shootdown each of which are waiting on each other.  With this change, only
one of the CPUs would do an IPI and spin-wait at a time.
2004-08-04 20:31:19 +00:00
Robert Watson
3ed994c6c3 Add netatalk mutexes to hard-coded WITNESS lock order. 2004-07-25 20:16:51 +00:00
Marcel Moolenaar
eba21ad501 Update for the KDB framework:
o  Make debugging code conditional upon KDB instead of DDB.
o  s/WITNESS_DDB/WITNESS_KDB/g
o  s/witness_ddb/witness_kdb/g
o  Rename the debug.witness_ddb sysctl to debug.witness_kdb.
o  Call kdb_backtrace() instead of backtrace().
o  Call kdb_enter() instead Debugger().
o  Assert kdb_active instead of db_active.
2004-07-10 21:42:16 +00:00
John Baldwin
776b99ee1a Check the lock lists to see if they are empty directly rather than
assigning a pointer to the list and then dereferencing the pointer as a
second step.  When the first spin lock is acquired, curthread is not in
a critical section so it may be preempted and would end up using another
CPUs lock list instead of its own.

When this code was in witness_lock() this sequence was safe as curthread
was in a critical section already since witness_lock() is called after the
lock is acquired.

Tested by:	Daniel Lang dl at leo.org
2004-07-09 17:46:27 +00:00
Robert Watson
cce9e3f104 Introduce socket and UNIX domain socket locks into hard-coded lock
order definition for witness.  Send lock before receive lock, and
socket locks after accept but  before select:

  filedesc -> accept -> so_snd -> so_rcv -> sellck

All routing locks after send lock:

  so_rcv -> radix node head

All protocol locks before socket locks:

  unp -> so_snd
  udp -> udpinp -> so_snd
  tcp -> tcpinp -> so_snd
2004-06-13 00:23:03 +00:00
John Baldwin
ba8b26f960 - Comment out NULL, NULL barrier for Unix domain sockets section as the
double NULL entries signal Witness to stop processing the array of
  order entries meaning none of the spin locks are added resulting in
  panics on boot.
- Add a missing NULL, NULL terminator to the Slip locks list to keep them
  separate from the spin locks.
2004-06-03 20:07:44 +00:00
Robert Watson
d97e0534fa Expand the hard-coded WITNESS lock order to include the following
relationships:

Sockets:    filedesc->accept->sellck
Routing:    radix node head->rtentry->ifaddr
UDP:        udp->udpinp
TCP:        tcp->tcpinp
SLIP:       slip_mtx->slip sc_mtx

Drop in a place holder section for UNIX domain sockets.  Various
sections to be expanded over the next few days.
2004-06-02 23:28:06 +00:00
Alfred Perlstein
12e9993f65 Emit a traceback when witness_trace is set and witness_warn() is
called and triggers (typically caused by sleeping with a non-sleepable
lock).

Reviewed by: jhb
2004-03-23 00:32:27 +00:00
John Baldwin
dd75b0a90d Add an implementation of a generic sleep queue abstraction that is used
to queue threads sleeping on a wait channel similar to how turnstiles are
used to queue threads waiting for a lock.  This subsystem will be used as
the backend for sleep/wakeup and condition variables initially.  Eventually
it will also be used to replace the ithread-specific iwait thread
inhibitor.

Sleep queues are also not locked by sched_lock, so this splits sched_lock
up a bit further increasing concurrency within the scheduler.  Sleep queues
also natively support timeouts on sleeps and interruptible sleeps allowing
for the reduction of a lot of duplicated code between the sleep/wakeup and
condition variable implementations.  For more details on the sleep queue
implementation, check the comments in sys/sleepqueue.h and
kern/subr_sleepqueue.c.
2004-02-27 18:33:09 +00:00
John Baldwin
3e9ac3ebf2 Remove a bogus assertion.
Noticed by:	bde
Pointy hat to:	jhb
2004-02-03 15:14:27 +00:00
John Baldwin
9c9c52a3ed - Assert that witness_cold is not true in enroll().
- Only check witness_watch once in enroll().

Reported by:	ru (2)
2004-02-02 22:15:17 +00:00
John Baldwin
8d768e7676 Rework witness_lock() to make it slightly more useful and flexible.
- witness_lock() is split into two pieces: witness_checkorder() and
  witness_lock().  Witness_checkorder() determines if acquiring a specified
  lock at the time it is called would result in a lock order.  It
  optionally adds a new lock order relationship as well.  witness_lock()
  updates witness's data structures to assume that a lock has been acquired
  by stick a new lock instance in the appropriate lock instance list.
- The mutex and sx lock functions now call checkorder() prior to trying to
  acquire a lock and continue to call witness_lock() after the acquire is
  completed.  This will let witness catch a deadlock before it happens
  rather than trying to do so after the threads have deadlocked (i.e. never
  actually report it).
- A new function witness_defineorder() has been added that adds a lock
  order between two locks at runtime without having to acquire the locks.
  If the lock order cannot be added it will return an error.  This function
  is available to programmers via the WITNESS_DEFINEORDER() macro which
  accepts either two mutexes or two sx locks as its arguments.
- A few simple wrapper macros were added to allow developers to call
  witness_checkorder() anywhere as a way of enforcing locking assertions
  in code that might acquire a certain lock in some situations.  The
  macros are: witness_check_{mutex,shared_sx,exclusive_sx} and take an
  appropriate lock as the sole argument.
- The code to remove a lock instance from a lock list in witness_unlock()
  was unnested by using a goto to vastly improve the readability of this
  function.
2004-01-28 20:39:57 +00:00
Ruslan Ermilov
33fe8fd0df Register the uart(4)'s spin lock with witness(4). 2004-01-25 15:04:37 +00:00
Mark Murray
4e3a7a14d9 Fix a major faux pas of mine. I was causing 2 very bad things to
happen in interrupt context; 1) sleep locks, and 2) malloc/free
calls.

1) is fixed by using spin locks instead.

2) is fixed by preallocating a FIFO (implemented with a STAILQ)
   and using elements from this FIFO instead. This turns out
   to be rather fast.

OK'ed by:	re (scottl)
Thanks to:	peter, jhb, rwatson, jake
Apologies to:	*
2003-11-20 15:35:48 +00:00
Peter Wemm
0d2a298904 Initial landing of SMP support for FreeBSD/amd64.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
  (in particular, bootstrap)
- This is still a WIP.  It seems that there are some highly bogus bioses
  on nVidia nForce3-150 boards.  I can't stress how broken these boards
  are.  I have a workaround in mind, but right now the Asus SK8N is broken.
  The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE.  SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
  atpic' in addition, because they somehow managed to forget to connect the
  8254 timer to the apic, even though its in the same silicon!  ARGH!
  This directly violates the ACPI spec.
2003-11-17 08:58:16 +00:00
Bruce Evans
416ab90e6b Localized the cy driver's locking. 2003-11-16 00:55:54 +00:00
John Baldwin
961a7b244d Add an implementation of turnstiles and change the sleep mutex code to use
turnstiles to implement blocking isntead of implementing a thread queue
directly.  These turnstiles are somewhat similar to those used in Solaris 7
as described in Solaris Internals but are also different.

Turnstiles do not come out of a fixed-sized pool.  Rather, each thread is
assigned a turnstile when it is created that it frees when it is destroyed.
When a thread blocks on a lock, it donates its turnstile to that lock to
serve as queue of blocked threads.  The queue associated with a given lock
is found by a lookup in a simple hash table.  The turnstile itself is
protected by a lock associated with its entry in the hash table.  This
means that sched_lock is no longer needed to contest on a mutex.  Instead,
sched_lock is only used when manipulating run queues or thread priorities.
Turnstiles also implement priority propagation inherently.

Currently turnstiles only support mutexes.  Eventually, however, turnstiles
may grow two queue's to support a non-sleepable reader/writer lock
implementation.  For more details, see the comments in sys/turnstile.h and
kern/subr_turnstile.c.

The two primary advantages from the turnstile code include: 1) the size
of struct mutex shrinks by four pointers as it no longer stores the
thread queue linkages directly, and 2) less contention on sched_lock in
SMP systems including the ability for multiple CPUs to contend on different
locks simultaneously (not that this last detail is necessarily that much of
a big win).  Note that 1) means that this commit is a kernel ABI breaker,
so don't mix old modules with a new kernel and vice versa.

Tested on:	i386 SMP, sparc64 SMP, alpha SMP
2003-11-11 22:07:29 +00:00
John Baldwin
b95bb3e62b Update spin lock order list for new i386 interrupt and SMP code. 2003-11-03 22:38:30 +00:00
Mike Silbersack
184dcdc7c8 Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.
2003-10-21 18:28:36 +00:00
Sam Leffler
6c024e8ef6 add fast swi taskqueue spinlock to the order_list so witness doesn't complain
Submitted by:	Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
2003-09-06 21:06:08 +00:00
John Baldwin
b35b737201 Insert cosmetic spaces.
Reported by:	kris
2003-08-04 19:24:25 +00:00
John Baldwin
1beccae67c Add a new function to look for a spinlock's instance when it is held by
another thread.  We use the td_oncpu member of the other field to locate
it's associated CPU and then search the that CPU's list of spin locks
contained in its per-CPU data.  This is not always safe and may in fact
panic or just not work, but it is useful in at least one case.
2003-07-31 18:50:58 +00:00
Peter Wemm
e95babf3a8 unifdef -DLAZY_SWITCH and start to tidy up the associated glue. 2003-07-10 01:02:59 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Poul-Henning Kamp
b1921a6f33 Remove return after panic.
Found by:       FlexeLint
2003-05-31 20:09:42 +00:00
Peter Wemm
d5167abf3c Add __amd64__ to the ifdefs that introduce the "pcicfg" spinlock to
witness.

Approved by:  re (safe amd64 support)
2003-05-31 06:42:37 +00:00
Julian Elischer
060563ec50 Move the _oncpu entry from the KSE to the thread.
The entry in the KSE still exists but it's purpose will change a bit
when we add the ability to lock a KSE to a cpu.
2003-04-10 17:35:44 +00:00
Mike Barcroft
fd7a8150fb o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
  fields (pr_path for reporting and pr_root vnode instance) to store
  the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
  vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
  to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
  struct xprison instances that represent a snapshot of active jails.

Reviewed by:	rwatson, tjr
2003-04-09 02:55:18 +00:00
Peter Wemm
cc66ebe2a9 Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup.  Currently its under
options LAZY_SWITCH.  What this does is avoid %cr3 reloads for short
context switches that do not involve another user process.  ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb.  However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still.  There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option.  I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff.  This was more
meant for the kthread case, while his was for interrupts.  Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place.  This allows us to catch the silly
case of doing a cpu_switch() to the current process.  This happens
uncomfortably often.  This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle).  This has been
implemented on i386 and (thanks to jake) sparc64.  The others will come
soon.  This is actually seperate to the lazy switch stuff.

Glanced at by:  jake, jhb
2003-04-02 23:53:30 +00:00
John Baldwin
959d22329a - Remove witness_dead and just use witness_watch instead. If witness_watch
is set to 0, it now has the same affect as setting witness_dead used to
  have.
- Added a sysctl handler that allows root to change witness_watch from a
  non-zero value to zero to disable witness at runtime.  Note that you
  can't turn witness back on once it is off.  You can only turn it off as
  a one-way switch.
- Added a comment describing the possible values of witness_watch.
2003-03-24 21:03:53 +00:00
John Baldwin
2ca9461a05 Trim an extra blank line that snuck into the last commit. 2003-03-11 22:33:42 +00:00
John Baldwin
427b3a6549 - Change witness_displaydescendants() to accept the indentation level as
a parameter instead of using the level of a given witness.  When
  recursing, pass an indent level of indent + 1.
- Make use of the information witness_levelall() provides in
  witness_display_list() to use an O(n) algorithm instead of an O(n^2)
  algo to decide which witnesses to display hierarchies from.  Basically,
  we only display a hierarchy for witnesses with a level of 0.
- Add a new per-witness flag that is reset at the start of
  witness_display() for all witness's and is set the first time a witness
  is displayed in witness_displaydescendants().  If a witness is
  encountered more than once in the lock order tree (which happens often),
  witness_displaydescendants() marks the later occurrences with the string
  "(already displayed)" and doesn't display the subtree under that
  witness.  This avoids duplicating large amounts of the lock order tree
  in the 'show witness' output in DDB.

All these changes serve to make 'show witness' a lot more readable and
useful than it was previously.
2003-03-11 22:14:21 +00:00
John Baldwin
f82c6950be - Split the itismychild() function into two functions: insertchild()
adds a witness to the child list of a parent witness.  rebalancetree()
  runs through the entire tree removing direct descendants of witnesses
  who already have said child witness as an indirect descendant through
  another direct descendant.  itismychild() now calls insertchild()
  followed by rebalancetree() and no longer needs the evil hack of
  having static recursed variable.
- Add a function reparentchildren() that adds all the direct descendants
  of one witness as direct descendants of another witness.
- Change the return value of itismychild() and similar functions so that
  they return 0 in the case of failure due to lack of resources instead
  of 1.  This makes the return value more intuitive.
- Check the return value of itismychild() when defining the static lock
  order in witness_initialize().
- Don't try to setup a lock instance in witness_lock() if itismychild()
  fails.  Witness is hosed anyways so no need to do any more witness
  related activity at that point.  It also makes the code flow easier to
  understand.
- Add a new depart() function as the opposite of enroll().  When the
  reference count of a witness drops to 0 in witness_destroy(), this
  function is called on that witness.  First, it runs through the
  lock order tree using reparentchildren() to reparent direct descendants
  of the departing witness to each of the witness' parents in the tree.
  Next, it releases it's own child list and other associated resources.
  Finally it calls rebalanacetree() to rebalance the lock order tree.
- Sort function prototypes into something closer to alphabetical order.

As a result of these changes, there should no longer be 'dead' witnesses
in the order tree, and repeatedly loading and unloading a module should no
longer exhaust witness of its internal resources.

Inspired by:	gallatin
2003-03-11 22:07:35 +00:00
John Baldwin
d5b13ee082 Trim useless "../" leading strings from filenames passed into witness. 2003-03-11 21:53:12 +00:00
John Baldwin
28e4d137a2 Adjust style of #ifdef's and #endif's to be more consistent and in line
with recent additions to style(9).
2003-03-11 21:38:49 +00:00
John Baldwin
d278a7f9ba Do the lock order check skip for the LOP_TRYLOCK case after the check for
recursing on a lock instead of before.  This fixes a bug where WITNESS
could get a little confused if you did an sx_tryslock() on a sx lock that
you already had an slock on.  WITNESS would still function correctly but
it could result in weirdness in the output of 'show locks'.  This also
makes it possible for mtx_trylock() to recurse on a lock.
2003-03-11 20:54:37 +00:00
John Baldwin
0e8677f68b Now that we have WITNESS_WARN(), we only call witness_list() from the
ddb 'show locks' command.  Thus, move witness_list() to the #ifdef DDB
section and remove extra checks for calling this function outside of
DDB.  Also, witness_list() now returns void instead of returning an int.

Reported by:	Steve Ames <steve@energistic.com>
Prodded by:	davidxu
2003-03-10 17:03:57 +00:00
John Baldwin
9da590b49b Oops, fix the double faults people were seeing with the recent changes to
witness.  Sleepable locks such as sx locks always come before all mutexes
including Giant.  However, the static lock order list placed Giant before
the proctree and allproc sx locks.  This resulted in witness creating a
cycle in its lock order "tree" (real trees don't have cycles) leading to
infinite recursion and eventually a double fault.  To fix, put Giant after
sx locks in the lock order list.
2003-03-06 17:25:06 +00:00
John Baldwin
c141c242ac Bah, fix a bogon in the last commit: get the sense of a compare test right
so that we allow a sleepable lock to be acquired with Giant held rather
than allowing a sleepable lock to be acquired with anything but Giant held.
2003-03-04 22:34:07 +00:00
John Baldwin
35580ede37 A small overhaul of witness:
- Add a comment about special lock order rules and Giant near the top of
  subr_witness.c.  Specifically, this documents and explains the real lock
  order relationship between Giant and sleepable locks (i.e. lockmgr locks
  and sx locks).  Basically, Giant can be safely acquired either before or
  after sleepable locks and the case of Giant before a sleepable lock is
  exempted as a special case.
- Add a new static function 'witness_list_lock()' that displays a single
  line of information about a struct lock_instance.  This is used to
  make the output of witness messages more consistent and reduce some code
  duplication.
- Fixup a few comments in witness_lock().
- Properly handle the Giant-before-sleepable-lock lock order exception in
  a more general fashion and remove the no longer needed LI_SLEPT flag.
- Break up the last condition before assuming a reversal a bit to try
  and make the logic less confusing in witness_lock().
- Axe WITNESS_SLEEP() now that LI_SLEPT is no longer needed and replace it
  with a more general WITNESS_WARN() macro/function combination.
  WITNESS_WARN() allows you to output a customized message out to the
  console along with a list of held locks.  It will optionally drop into
  the debugger as well.  You can exempt a single lock from the check by
  passing it in as the second argument.  You can also use flags to specify
  if Giant should be exempt from the check, if all sleepable locks should
  be exempt from the check, and if witness should panic if any non-exempt
  locks are found.
- Make the witness_list() function static.  Other areas of the kernel
  should use the new WITNESS_WARN() instead.
2003-03-04 20:56:39 +00:00
Peter Wemm
af3d516f55 Initiate de-orbit burn for USE_PCI_BIOS_FOR_READ_WRITE. This has been
#if'ed out for a while.  Complete the deed and tidy up some other bits.

We need to be able to call this stuff from outer edges of interrupt
handlers for devices that have the ISR bits in pci config space.  Making
the bios code mpsafe was just too hairy.  We had also stubbed it out some
time ago due to there simply being too much brokenness in too many systems.
This adds a leaf lock so that it is safe to use pci_read_config() and
pci_write_config() from interrupt handlers.  We still will use pcibios
to do interrupt routing if there is no acpi.. [yes, I tested this]

Briefly glanced at by:  imp
2003-02-18 03:36:49 +00:00
Jeff Roberson
5215b1872f - Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers.  This greatly simplifies much the
   KSE code.

Submitted by:	davidxu
2003-02-17 05:14:26 +00:00
Peter Wemm
1c425b874c Add a 'debug.witness_trace' sysctl (and tunable) when DDB is present.
This causes LOR and could-sleep messages to come with a stack trace.
2003-02-13 01:35:56 +00:00
Julian Elischer
6f8132a867 Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
David Xu
0dbb100b9b Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
Jake Burkholder
be0800e85e Oops, add zstty to the witness order list.
Noticed by:	benno
2003-01-09 15:45:28 +00:00
Jake Burkholder
c3c2862df4 - Add a spin lock to single thread cache invalidation and tlb flush ipis,
which allows ipis to be sent outside of Giant.
- Remove the ap boot mutex, which is unused.
2002-12-22 20:50:23 +00:00
Kris Kennaway
4ef3d7a27b Enforce correct ordering of the filedesc structure and pipe mutex, because
WITNESS can get the order wrong if it guesses based on first use.

Reviewed by:	jhb, alfred
2002-12-22 16:32:34 +00:00
John Baldwin
d2b28e078a Correct an assertion in the code to traverse the list of locks to find an
earlier acquired lock with the same witness as the lock currently being
acquired.  If we had released several earlier acquired locks after
acquiring enough locks to require another lock_list_entry bucket in the
lock list, then subsequent lock_list_entry buckets could contain only one
lock instance in which case i would be zero.

Reported by:	Joel M. Baldwin <qumqats@outel.org>
2002-11-11 16:36:20 +00:00
Alan Cox
151113a946 Catch up with the removal of the vm page buckets spin mutex. 2002-11-02 22:42:18 +00:00
Poul-Henning Kamp
ab33958276 #unifdef the code for checking blessed lock collisions until we need it.
Spotted by:	DARPA & NAI Labs.
2002-10-20 08:48:39 +00:00
Peter Wemm
d2575b9651 Register the machine check private state spinlock on ia64. 2002-10-12 00:33:36 +00:00
Poul-Henning Kamp
37c841831f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
Jeff Roberson
c76e20451c - Tell witness about ALQ's spin lock. 2002-09-22 07:11:57 +00:00