- Add new spinlocks to support thread_lock() and adjust ordering.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
patch:
- Do the correct test for ldt allocation
- Drop dt_lock just before calling kmem_free() (since it acquires blocking
locks internally); see the sketch below.
- Solve a deadlock with smp_rendezvous() where other CPUs would wait
indefinitely for dt_lock acquisition.
- Add dt_lock in the WITNESS list of spinlocks
While applying these changes, alter the contract of user_ldt_free() so
that it now returns without dt_lock held.
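A minimal sketch of that ordering and the new contract (the wrapper and
variable names are illustrative; kmem_free()'s map/address/size arguments
follow its historical signature):

    static void
    user_ldt_free_sketch(vm_offset_t ldt_base, vm_size_t ldt_len)
    {
        /* kmem_free() may acquire blocking locks internally, so the
         * dt_lock spin mutex is dropped first; this also keeps
         * smp_rendezvous() participants from spinning on dt_lock. */
        mtx_unlock_spin(&dt_lock);
        kmem_free(kernel_map, ldt_base, ldt_len);
        /* new contract: return without dt_lock held */
    }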
Tested by: marcus, tegge
Reviewed by: tegge
Approved by: jeff (mentor)
scheduler lock is not involved. sched_lock still protects the sched_clock
call. Another patch will remedy this.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
Tested by: kris, jeff
Group mutexes used in hwpmc(4) into 3 "types" in the sense of
witness(4):
- leaf spin mutexes---only one of these should be held at a time,
so these mutexes are specified as belonging to a single witness
type "pmc-leaf".
- `struct pmc_owner' descriptors are protected by a spin mutex of
witness type "pmc-owner-proc". Since we call wakeup_one() while
holding these mutexes, the witness type of these mutexes needs
to dominate that of "sleepq chain" mutexes.
- logger threads use a sleep mutex, of type "pmc-sleep".
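A minimal sketch of how these witness types might be assigned (the lock
variables and init function are hypothetical; only the three type strings
come from the text above):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    static struct mtx pmc_buffer_mtx;  /* hypothetical leaf spin mutex */
    static struct mtx pmc_owner_mtx;   /* hypothetical pmc_owner lock */
    static struct mtx pmc_logger_mtx;  /* hypothetical logger sleep mutex */

    static void
    pmc_locks_init(void)
    {
        /* The third mtx_init() argument is the witness(4) type:
         * every "pmc-leaf" mutex shares one order entry, and
         * "pmc-owner-proc" must dominate "sleepq chain" because
         * wakeup_one() is called with such a mutex held. */
        mtx_init(&pmc_buffer_mtx, "pmc-buffer", "pmc-leaf", MTX_SPIN);
        mtx_init(&pmc_owner_mtx, "pmc-po", "pmc-owner-proc", MTX_SPIN);
        mtx_init(&pmc_logger_mtx, "pmc-log", "pmc-sleep", MTX_DEF);
    }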
Submitted by: wkoszek (earlier patch)
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular
acquisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).
- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).
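A minimal sketch of the locking pattern described above (the lock and
function names are illustrative, not the committed code):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/sx.h>

    static struct sx fd_sx;    /* stand-in for the filedesc sxlock;
                                  sx_init(&fd_sx, "filedesc") at setup */

    static void
    lookup_sketch(void)
    {
        sx_slock(&fd_sx);      /* shared: lookups can run concurrently */
        /* ... read the file descriptor table ... */
        sx_sunlock(&fd_sx);
    }

    static void
    modify_sketch(void)
    {
        sx_xlock(&fd_sx);      /* exclusive: the table is being changed */
        /* sx_sleep() atomically releases and reacquires the sxlock,
         * avoiding the fdfree() bug of sleeping without the lock */
        sx_sleep(&fd_sx, &fd_sx, PZERO, "fdhold", 0);
        sx_xunlock(&fd_sx);
    }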
Tested by: kris
Discussed with: jhb, kris, attilio, jeff
sleep lock missed the witness code, and the system will panic
immediately on boot if WITNESS is enabled.
Changed the witness definition to the new type.
wait (time waited to acquire) and hold times for *all* kernel locks. If
the architecture has a system synchronized TSC, the profiling code will
use that, thereby minimizing profiling overhead. Large chunks of profiling
code have been moved out of line; the overhead measured on the T1 when it
is compiled in but not enabled is < 1%.
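A hedged sketch of the timestamping choice (the helper name is
illustrative; rdtsc() is the x86 counter read from machine/cpufunc.h):

    #include <sys/types.h>
    #include <machine/cpufunc.h>

    /* With a system-synchronized TSC, wait and hold times can be
     * stamped with a raw counter read rather than a heavier
     * timecounter call, keeping profiling overhead minimal. */
    static __inline uint64_t
    lock_prof_stamp(void)
    {
        return (rdtsc());
    }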
Approved by: scottl (standing in for mentor rwatson)
Reviewed by: des and jhb
specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
in syscons. This replaces a simple access semaphore that was assumed to be
protected by Giant but often was not. If two threads that were otherwise
SMP-safe called printf at the same time, there was a high likelihood that
the semaphore would get corrupted and result in a permanently frozen video
console. This is similar to what is already done in the serial console
drivers.
placeholder similar to KTR_DEV. Explain the use of KTR_DEV and
KTR_SUBSYS in a comment as well.
- Retire KTR_WITNESS and instead have witness logging default to off but
use KTR_SUBSYS if it is enabled.
lock_obj objects:
- Add new lock_init() and lock_destroy() functions to set up and tear down
lock_object objects, including KTR logging and registering with WITNESS
(see the sketch after this list).
- Move all the handling of LO_INITIALIZED out of witness and the various
lock init functions into lock_init() and lock_destroy().
- Remove the constants for static indices into the lock_classes[] array
and change the code outside of subr_lock.c to use LOCK_CLASS to compare
against a known lock class.
- Move the 'show lock' ddb function and lock_classes[] array out of
kern_mutex.c over to subr_lock.c.
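A hedged sketch of how a lock-type front end might use the new common
routines (signatures follow the description above; treat details as
assumptions):

    static void
    example_lock_setup(struct lock_object *lo, const char *name, int flags)
    {
        /* one call now covers KTR logging, the LO_INITIALIZED
         * handling, and registration with WITNESS */
        lock_init(lo, &lock_class_mtx_sleep, name, NULL, flags);
    }

    static void
    example_lock_check(struct lock_object *lo)
    {
        /* code outside subr_lock.c compares against a known class
         * via LOCK_CLASS() instead of a static array index */
        if (LOCK_CLASS(lo) != &lock_class_mtx_sleep)
            panic("unexpected lock class for %s", lo->lo_name);
    }

    static void
    example_lock_teardown(struct lock_object *lo)
    {
        lock_destroy(lo);      /* undoes lock_init() */
    }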
struct sx). Instead of storing a direct pointer to our lock_class
struct in lock_object, reserve 4 bits in the lo_flags field to serve as an
index into a global lock_classes array that contains pointers to the lock
classes. Only debugging code such as WITNESS or INVARIANTS checks and KTR
logging need to access the lock_class member, so this shouldn't add any
overhead to production kernels. It might add some slight overhead to
kernels using those debug options however.
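An illustrative encoding of the scheme (the shift, mask, and macro
spellings are assumptions, not the committed values):

    /* 4 bits of lo_flags select one of up to 16 lock classes. */
    #define LO_CLASSSHIFT    24
    #define LO_CLASSMASK     (0xfU << LO_CLASSSHIFT)
    #define LO_CLASSINDEX(lock) \
        (((lock)->lo_flags & LO_CLASSMASK) >> LO_CLASSSHIFT)
    #define LOCK_CLASS(lock) (lock_classes[LO_CLASSINDEX(lock)])

    extern struct lock_class *lock_classes[];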
As with the previous set of changes to lock_object, this is going to
completely obliterate the kernel ABI, so be sure to recompile all your
modules.
spin locks that are not in the static order list. It is not safe to call
printf while holding the witness spin mutex since the console drivers that
back printf may need to use their own spin locks, which would in turn talk
to witness when acquired. Given this, it is possible for one
CPU to lock a console driver lock (such as sio) which then tries to lock
the witness lock while another CPU is doing the printf while holding the
witness lock. Fix this by moving the printf outside of the witness lock.
All other printf's in witness are already correct.
MFC after: 3 days
lock object (and thus off of each mutex and sx lock):
- Rename the all_locks list to pending_locks and only put locks initialized
before SI_SUB_WITNESS on the list so that the SI_SUB_WITNESS SYSINIT
can add them to witness once it starts up.
- Now that pending_locks is only used during early startup, change it from
a TAILQ to an STAILQ; an STAILQ_ENTRY holds one pointer rather than two,
shaving a pointer off of struct lock_object.
- Since the pending_locks list is only used during the single-threaded
early boot it no longer needs to be protected by a mutex, so remove
all_mtx.
- Since the lo_list member of struct lock_object is now only used during
early boot before witness is running, collapse lo_list and lo_witness
into a union (sketched after this list). This shaves the second pointer
off of struct lock_object.
- Axe lock_cur_cnt and lock_max_cnt.
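A sketch of the resulting layout (member and union names are
assumptions; only the lo_list/lo_witness overlay is from the text):

    #include <sys/queue.h>

    struct lock_object {
        const char  *lo_name;   /* individual lock name */
        u_int        lo_flags;
        union {                 /* saves the second pointer */
            /* pending_locks linkage, used only during early boot */
            STAILQ_ENTRY(lock_object) lou_list;
            /* valid once witness has taken over */
            struct witness *lou_witness;
        } lo_un;
    };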
With these changes, struct mtx shrinks from 36 to 28 bytes on 32-bit
platforms and from 72 to 56 bytes on 64-bit platforms. Note that this
commit will completely and utterly destroy the kernel ABI, so no MFC.
Tested on: alpha, amd64, i386, sparc64
and increase flexibility to allow various approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards
(see the sketch after this list).
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(4)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)
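A hedged sketch of the shared-dispatch logic referenced above
(structure, field, and function names are illustrative):

    static void
    intr_event_handle_sketch(struct intr_event *ie)
    {
        struct intr_handler *ih;
        int thread = 0;

        /* Pass 1: run every INTR_FAST handler inline. */
        TAILQ_FOREACH(ih, &ie->ie_handlers, ih_next) {
            if (ih->ih_flags & IH_FAST)
                ih->ih_handler(ih->ih_argument);
            else
                thread = 1;    /* at least one threaded handler */
        }

        /* Pass 2: schedule the ithread only if needed, so events
         * with only INTR_FAST handlers never own a thread. */
        if (thread)
            intr_event_schedule_thread(ie);
    }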
Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)
any other non-sleepable lock. In plain English: Giant comes before all
other mutexes.
- Add some extra description to the lock order reversal printf's to indicate
when a reversal is triggered by a hard-coded implicit rule.
Requested by: truckman (2)
MFC after: 1 week
link proctree and allproc to Giant since that order is already implicitly
enforced.
- Use a goto to handle the case where we want to enforce a reversal before
calling isitmydescendant() in witness_checkorder() so that the logic is
easier to follow and so that it is easier to add more forced-reversal
cases in the future.
MFC after: 3 days
if an indirect relationship exists (keep both A->B->C and A->C).
This allows witness_checkorder() to use isitmychild() instead of
the much more expensive isitmydescendant() to check for valid lock
ordering.
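A hedged sketch of that invariant (isitmychild()/itismychild() are the
witness helpers named here; the wrapper is illustrative):

    /* When order parent -> child is observed, record the direct edge
     * even if parent already reaches child transitively, so that
     * witness_checkorder() can test immediate children only. */
    static void
    witness_record_order(struct witness *parent, struct witness *child)
    {
        if (!isitmychild(parent, child))
            itismychild(parent, child);  /* keep A->C beside A->B->C */
    }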
Don't do an expensive tree walk to update the w_level values when
the tree is updated. Only update the w_level values when using the
debugger to display the tree.
Nuke the experimental "witness_watch > 1" mode that only compared
w_level for the two locks. This information is no longer maintained
at run time, and the use of isitmychild() in witness_checkorder()
should bring performance close enough to the acceptable level that
this hack is not needed.
Report witness data structure allocation statistics under the
debug.witness sysctl.
Reviewed by: jhb
MFC after: 30 days
list lock, as there has been a report that an alternative lock order
is getting introduced. This should help ferret it out.
Reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
lists, as well as accessor macros. For now, this is a recursive mutex
due to code sequences where IPv4 multicast calls into IGMP, which calls
into ip_output(), which then tests for a multicast forwarding case.
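A minimal sketch of the initialization and assertion described here
(placement and the assertion macro are assumptions; MTX_RECURSE permits
the IGMP -> ip_output() reentry):

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    struct mtx in_multi_mtx;
    MTX_SYSINIT(in_multi_mtx, &in_multi_mtx, "in_multi_mtx",
        MTX_DEF | MTX_RECURSE);

    /* Hypothetical form of the in_var.h support-macro assertion: */
    #define IN_MULTI_LOCK_ASSERT()  mtx_assert(&in_multi_mtx, MA_OWNED)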
For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.
Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().
Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.
Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.
Eliminate spl's associated with IPv4 multicast addresses in portions of
IGMP that weren't previously expunged by IGMP locking.
Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.
Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 10 days
mutex instead of a MTX_DEF one in order to defer preemption while
reading the date and time registers. If we don't manage to read them
within the time slot where we are guaranteed that no updates occur we
might actually read them during an update in which case the output is
undefined.
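A hedged sketch of the effect (the lock and helper names are
assumptions; rtcin()/RTC_SEC are from the i386 RTC code):

    static struct mtx rtc_mtx;  /* hypothetical; initialized MTX_SPIN */

    static int
    rtc_read_sketch(void)
    {
        int sec;

        /* A spin mutex disables interrupts, and thus preemption, on
         * this CPU, so the reads finish inside the window in which
         * the chip guarantees no update is in progress. */
        mtx_lock_spin(&rtc_mtx);
        sec = rtcin(RTC_SEC);
        mtx_unlock_spin(&rtc_mtx);
        return (sec);
    }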
3ware's 9xxx series controllers. This corresponds to
the 9.2 release (for FreeBSD 5.2.1) on the 3ware website.
Highlights of this release are:
1. The driver has been re-architected to use a "Common Layer"
(all tw_cl* files), which is a consolidation of all OS-independent
parts of the driver. The FreeBSD OS specific portions of the
driver go into an "OS Layer" (all tw_osl* files).
This re-architecture is to achieve better maintainability, consistency
of behavior across OS's, and better portability to new OS's (drivers
for a new OS can be written by just adding an OS Layer specific to
that OS, complying with a "Common Layer Programming Interface" API).
2. The driver takes advantage of multiple processors.
3. The driver has a new firmware image bundled, the new features of which
include Online Capacity Expansion and multi-lun support, among others.
More details about 3ware's 9.2 release can be found here:
http://www.3ware.com/download/Escalade9000Series/9.2/9.2_Release_Notes_Web.pdf
Since the Common Layer is used across OS's, the FreeBSD-specific include
path for header files (/sys/dev/twa) is not part of the #include
pre-processor directive in any of the source files. To allow twa to be
built into the kernel despite this, Makefile.<arch> has been changed to
add the include path to CFLAGS.
Reviewed by: scottl
at some point result in a status event being triggered (it should
be a link down event: the Microsoft driver design guide says you
should generate one when the NIC is initialized). Some drivers
generate the event during MiniportInitialize(), such that by the
time MiniportInitialize() completes, the NIC is ready to go. But
some drivers, in particular the ones for Atheros wireless NICs,
don't generate the event until after a device interrupt occurs
at some point after MiniportInitialize() has completed.
The gotcha is that you have to wait until the link status event
occurs one way or the other before you try to fiddle with any
settings (ssid, channel, etc...). For the drivers that set the
event synchronously this isn't a problem, but for the others
we have to pause after calling ndis_init_nic() and wait for the event
to arrive before continuing. Failing to wait can cause big trouble:
on my SMP system, calling ndis_setstate_80211() after ndis_init_nic()
completes, but _before_ the link event arrives, will lock up or
reset the system.
What we do now is check to see if a link event arrived while
ndis_init_nic() was running, and if it didn't, we msleep() until
it does.
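A hedged sketch of that wait (the softc layout, field names, and
timeout are assumptions):

    struct ndis_softc_sketch {
        struct mtx  mtx;
        int         link_seen;  /* set by the status-event handler,
                                   which also does a wakeup() */
    };

    static void
    ndis_wait_link_sketch(struct ndis_softc_sketch *sc)
    {
        mtx_lock(&sc->mtx);
        /* If no link event arrived during ndis_init_nic(), sleep
         * until the handler posts one (bounded, so a driver that
         * never posts the event cannot hang us forever). */
        if (sc->link_seen == 0)
            msleep(&sc->link_seen, &sc->mtx, PZERO, "ndislnk", 5 * hz);
        mtx_unlock(&sc->mtx);
    }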
Along the way, I discovered a few other problems:
- Deferred procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL.
ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation
wrong.)
- Similarly, the NDIS interrupt handler, which is essentially a
DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask()
has been fixed accordingly.
- MiniportQueryInformation() and MiniportSetInformation() run at
DISPATCH_LEVEL, and each request must complete before another
can be submitted. ndis_get_info() and ndis_set_info() have been
fixed accordingly.
- Turned the sleep lock that guards the NDIS thread job list into
a spin lock. We never do anything with this lock held except manage
the job list (no other locks are held), so it's safe to do this,
and it's possible that ndis_sched() and ndis_unsched() can be
called from DISPATCH_LEVEL, so using a sleep lock here is
semantically incorrect. Also updated subr_witness.c to add the
lock to the order list.
witness_proc_has_locks(), as they are unused, which results in a compiler
error. This problem was introduced with the implementation of "show
alllocks".
Spotted by: Artem Kuchin <matrix at itlegion dot ru>
of lock types in the kernel. This results in an increase of witness
data usage from ~145k to ~280k on i386 for kernels with
'options WITNESS'.
- Remove the unused witness malloc bucket.
Submitted by: Michal Mertl mime at traveller dot cz (1)
and threads currently holding sleep mutexes (and spin mutexes for
curthread). This can be quite useful in looking for a lock condition
summary for a system, as it avoids manually iterating through threads
and processes to find all the interesting locks.
NB: "alllocks" is up there with "lockedvnods" for a bad argument for
show.
MFC after: 2 weeks