freebsd-skq

Author	SHA1	Message	Date
attilio	fde84f320b	Introduce the new kernel thread called "deadlock resolver". While the name is pretentious, a good explanation of its targets is reported in this 17 months old presentation e-mail: http://lists.freebsd.org/pipermail/freebsd-arch/2008-August/008452.html In order to implement it, the sq_type in sleepqueues is mandatory and not only compiled along with INVARIANTS option. Additionally, a new sleepqueue function, sleepq_type() is added, returning the type of the sleepqueue linked to a wchan. Three new sysctls are added in order to configure the thread: debug.deadlkres.slptime_threshold debug.deadlkres.blktime_threshold debug.deadlkres.sleepfreq representing the thresholds for sleep and block time that will lead to a deadlock matching (when exceeded), while the sleepfreq represents the number of seconds between 2 consecutive runs of the thread. In order to enable the deadlock resolver thread recompile your kernel with the option DEADLKRES. Reviewed by: jeff Tested by: pho, Giovanni Trematerra Sponsored by: Nokia Incorporated, Sandvine Incorporated MFC after: 2 weeks	2010-01-09 01:46:38 +00:00
ed	f527d43fa5	Fix indentation.	2009-12-20 22:55:27 +00:00
sam	05a7094fc1	Make ddb command registration dynamic so modules can extend the command set (only so long as the module is present): o add db_command_register and db_command_unregister to add and remove commands, respectively o replace linker sets with SYSINIT's (and SYSUINIT's) that register commands o expose 3 list heads: db_cmd_table, db_show_table, and db_show_all_table for registering top-level commands, show operands, and show all operands, respectively While here also: o sort command lists o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases for existing commands o add "show all trace" as an alias for "show alltrace" o add "show all locks" as an alias for "show alllocks" Submitted by: Guillaume Ballet <gballet@gmail.com> (original version) Reviewed by: jhb MFC after: 1 month	2008-09-15 22:45:14 +00:00
jhb	7851995759	- Reduce scope of #ifdef's in uma_zcreate() call in init_turnstile0(). - Set UMA_ZONE_NOFREE so that the per-turnstile spin locks are type stable to avoid a race where one thread might dereference a lock in a free'd turnstile that was previously used by another thread. Theorized by: tegge (2) MFC after: 1 week	2008-09-08 21:40:15 +00:00
jeff	9d30d1d7a4	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia	2008-04-17 04:20:10 +00:00
jeff	e5687b20d7	- Add THREAD_LOCKPTR_ASSERT() to assert that the thread's lock points at the provided lock or &blocked_lock. The thread may be temporarily assigned to the blocked_lock by the scheduler so a direct comparison can not always be made. - Use THREAD_LOCKPTR_ASSERT() in the primary consumers of the scheduling interfaces. The schedulers themselves still use more explicit asserts. Sponsored by: Nokia	2008-02-07 06:55:38 +00:00
jeff	77ea5a24c7	Adaptive spinning in write path with readers and writer starvation avoidance. - Move recursion checking into rwlock inlines to free a bit for use with adaptive spinners. - Clear the RW_LOCK_WRITE_SPINNERS flag whenever the lock state changes causing write spinners to restart their loop. - Write spinners are limited by a count while readers hold the lock as there is no way to know for certain whether readers are running still. - In the read path block if there are write waiters or spinners to avoid starving writers. Use a new per-thread count, td_rw_rlocks, to skip starvation avoidance if it might cause a deadlock. - Remove or change invalid assertions in turnstiles. Reviewed by: attilio (developed parts of the patch as well) Sponsored by: Nokia	2008-02-06 01:02:13 +00:00
julian	b2732e0c22	Generally we are interested in what thread did something as opposed to what process. Since threads by default have the name of the process unless overwritten with more useful information, just print the thread name instead.	2007-11-14 06:21:24 +00:00
jeff	c5314fa1e0	- Include opt_sched.h for SCHED_STATS.	2007-06-12 23:27:31 +00:00
jeff	15c2dd7a1f	Commit 3/14 of sched_lock decomposition. - Add a per-turnstile spinlock to solve potential priority propagation deadlocks that are possible with thread_lock(). - The turnstile lock order is defined as the exact opposite of the lock order used with the sleep locks they represent. This allows us to walk in reverse order in priority_propagate and this is the only place we wish to multiply acquire turnstile locks. - Use the turnstile_chain lock to protect assigning mutexes to turnstiles. - Change the turnstile interface to pass back turnstile pointers to the consumers. This allows us to reduce some locking and makes it easier to cancel turnstile assignment while the turnstile chain lock is held. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:51:44 +00:00
jeff	beb495eff1	- Convert turnstiles and sleepqueues to use UMA. This provides a modest speedup and will be more useful after each gains a spinlock in the impending thread_lock() commit. - Move initialization and asserts into init/fini routines. fini routines are only needed in the INVARIANTS case for now. Submitted by: Attilio Rao <attilio@FreeBSD.org> Tested by: kris, jeff	2007-05-18 06:32:24 +00:00
jeff	474b917526	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
delphij	2e20bff54b	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.	2007-01-17 14:58:53 +00:00
jhb	496f904eab	Wrap propagate_priority() in a critical section to prevent unwanted preemptions when adjusting the priority of a thread that is on a run queue. This was only observed when FULL_PREEMPTION was enabled. Reported by: kris Diagnosed by: ups MFC after: 1 week	2007-01-11 19:13:27 +00:00
jhb	4e96206d8a	Add a new 'show sleepchain' ddb command similar to 'show lockchain' except that it operates on lockmgr and sx locks. This can be useful for tracking down vnode deadlocks in VFS for example. Note that this command is a bit more fragile than 'show lockchain' as we have to poke around at the wait channel of a thread to see if it points to either a struct lock or a condition variable inside of a struct sx. If td_wchan points to something unmapped, then this command will terminate early due to a fault, but no harm will be done.	2006-08-15 18:29:01 +00:00
jhb	c8c91ce0a9	Rename 'show lockchain' to 'show locktree' and 'show threadchain' to 'show lockchain'. The churn is because I'm about to add a new 'show sleepchain' similar to 'show lockchain' for sleep locks (lockmgr and sx) and 'show threadchain' was a bit ambiguous as both commands show a chain of thread dependencies, 'lockchain' is for non-sleepable locks (mtx and rw) and 'sleepchain' is for sleepable locks.	2006-08-15 16:44:18 +00:00
jhb	df5fe093b1	Honor db_pager_quit in 'show threadchain', 'show allchains', and 'show lockchain'. This is especially helpful for the first 2 as a threadchain could get stuck in an infinite loop during a mutex deadlock.	2006-07-12 21:25:24 +00:00
jhb	0b071af547	Add some new commands to hopefully make it easier to diagnose lock-related problems in ddb: - "show threadchain [thread]" will start with the specified thread (or the current kdb thread by default) and show its state. If it is blocked on a lock, it will find the owner of the lock and show its state, etc. - "show allchains" will find all of the threads that are blocked on a lock (but do not have any threads blocked on a lock they hold) and show the resulting thread chain. - "show lockchain <lock>" takes a pointer to a lock_object (such as a mutex or rwlock). If there is a turnstile for that lock, then it will display all the threads blocked on the lock. In addition, for each thread blocked on the lock, it will display any contested locks they hold, and recurse on those locks to show any threads blocked on those locks, etc.	2006-04-25 20:28:17 +00:00
jhb	8c0b6ba0a3	Print td_name instead of p_comm if td_name is non-empty for 'show turnstile' and 'show sleepq'.	2006-04-21 20:40:43 +00:00
jhb	084bf8cc1a	- Bring back turnstile_empty() which can check to see if an individual queue on a turnstile is empty. - Add a turnstile_disown() function that allows a thread to give up ownership of a turnstile w/o waking up any waiters.	2006-04-18 18:16:54 +00:00
jhb	113c41cffd	Always explicitly panic in propagate_priority() if we try to propagate a lock's priority to a sleeping thread. When we panic, dump a stack trace of the thread that is asleep if DDB is compiled into the kernel just before calling panic(). This is much more informative and useful for debugging than the current behavior of getting a page fault and not having an easy way of determining which thread caused the original problem. MFC after: 1 week	2006-03-29 23:24:55 +00:00
jhb	752dede518	- Add support for having both a shared and exclusive queue of threads in each turnstile. Also, allow for the owner thread pointer of a turnstile to be NULL. This is needed for the upcoming reader/writer lock implementation. - Add a new ddb command 'show turnstile' that will look up the turnstile associated with the given lock argument and display useful information like the list of threads blocked on each queue, etc. If there isn't an active turnstile for a lock at the specified address, then the function will see if there is an active turnstile at the specified address and display info about it if so. - Adjust the mutex code to handle the turnstile API changes. Tested on: i386 (all), alpha, amd64, sparc64 (1 and 3)	2006-01-27 22:42:12 +00:00
jhb	b24626498e	Initialize thread0.td_contested in init_turnstiles() rather than mutex_init() as it is used by the turnstile code and is not mutex-specific.	2006-01-17 16:47:42 +00:00
jhb	a8d64eb19c	Garbage collect turnstile_empty() since it is unused.	2006-01-17 16:40:20 +00:00
jhb	7758b042a5	Trim a couple of unneeded includes.	2005-09-29 19:13:52 +00:00
phk	13100c3699	Make a bunch of malloc types static. Found by: src/tools/tools/kernxref	2005-02-10 12:02:37 +00:00
jhb	3f307e93e3	Rework the interface between priority propagation (lending) and the schedulers a bit to ensure more correct handling of priorities and fewer priority inversions: - Add two functions to the sched(9) API to handle priority lending: sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these functions to ask the scheduler to lend a thread a set priority and to tell the scheduler when it thinks it is ok for a thread to stop borrowing priority. The unlend case is slightly complex in that the turnstile code tells the scheduler what the minimum priority of the thread needs to be to satisfy the requirements of any other threads blocked on locks owned by the thread in question. The scheduler then decides whether the thread can go back to normal mode (if its normal priority is high enough to satisfy the pending lock requests) or if it should continue to use the priority specified to the sched_unlend_prio() call. This involves adding a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag for priority elevation. - Schedulers now refuse to lower the priority of a thread that is currently borrowing another thread's priority. - If a scheduler changes the priority of a thread that is currently sitting on a turnstile, it will call a new function turnstile_adjust() to inform the turnstile code of the change. This function resorts the thread on the priority list of the turnstile if needed, and if the thread ends up at the head of the list (due to having the highest priority) and its priority was raised, then it will propagate that new priority to the owner of the lock it is blocked on. Some additional fixes specific to the 4BSD scheduler include: - Common code for updating the priority of a thread when the user priority of its associated kse group has been consolidated in a new static function resetpriority_thread(). One change to this function is that it will now only adjust the priority of a thread if it already has a time sharing priority, thus preserving any boosts from a tsleep() until the thread returns to userland. Also, resetpriority() no longer calls maybe_resched() on each thread in the group. Instead, the code calling resetpriority() is responsible for calling resetpriority_thread() on any threads that need to be updated. - schedcpu() now uses resetpriority_thread() instead of just calling sched_prio() directly after it updates a kse group's user priority. - sched_clock() now uses resetpriority_thread() rather than writing directly to td_priority. - sched_nice() now updates all the priorities of the threads after the group priority has been adjusted. Discussed with: bde Reviewed by: ups, jeffr Tested on: 4bsd, ule Tested on: i386, alpha, sparc64	2004-12-30 20:52:44 +00:00
jhb	a8c1c80ef5	Refine the turnstile and sleep queue interfaces just a bit: - Add a new _lock() call to each API that locks the associated chain lock for a lock_object pointer or wait channel. The _lookup() functions now require that the chain lock be locked via _lock() when they are called. - Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup the associated queue structure internally via _lookup() rather than accepting a pointer from the caller. For turnstiles, this means that the actual lookup of the turnstile in the hash table is only done when the thread actually blocks rather than being done on each loop iteration in _mtx_lock_sleep(). For sleep queues, this means that sleepq_lookup() is no longer used outside of the sleep queue code except to implement an assertion in cv_destroy(). - Change sleepq_broadcast() and sleepq_signal() to require that the chain lock is already locked. For condition variables, this lets the cv_broadcast() and cv_signal() functions lock the sleep queue chain lock while testing the waiters count. This means that the waiters count internal to condition variables is no longer protected by the interlock mutex and cv_broadcast() and cv_signal() now no longer require that the interlock be held when they are called. This lets consumers of condition variables drop the lock before waking other threads which can result in fewer context switches. MFC after: 1 month	2004-10-12 18:36:20 +00:00
jhb	9536269a6d	Add a critical section in turnstile_unpend() from before dropping the turnstile chain lock until after making all the awakened threads runnable. First, this fixes a priority inversion race. Second, this attempts to finish waking up all of the threads waiting on a turnstile before doing a preemption. Reviewed by: Stephan Uphoff (who found the priority inversion race)	2004-10-05 18:00:30 +00:00
julian	e9d9514975	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
rwatson	ec34d4330f	Revert modification of subr_turnstile.c accidentally included in the last commit; this assertion was provided by jhb for local debugging and not intended for broader consumption.	2004-07-25 23:32:32 +00:00
rwatson	4c9acdbfaf	In uipc_connect(), assert that the passed thread is curthread, and pass td into unp_connect() instead of reading curthread.	2004-07-25 23:30:43 +00:00
jhb	1b16b181d1	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to switch to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.	2004-07-02 19:09:50 +00:00
jhb	9c6cf2340f	Oops, this didn't make it into my submit before I committed: Defer creation of the sysctl tree for the turnstile profiling stats until a SI_SUB_LOCK sysinit. Doing it in init_turnstiles() is too early as it is called before mi_startup().	2004-06-29 03:48:49 +00:00
jhb	6502f84a50	Add two new kernel options to allow rudimentary profiling of the internal hash tables used in the sleep queue and turnstile code. Each option adds a sysctl tree under debug containing the maximum depth of any bucket in the hash table as well as a separate node for each bucket (or chain) containing the current depth and maximum depth for that bucket.	2004-06-29 02:30:12 +00:00
jhb	8ab84688c3	Rename turnstile_wakeup() to turnstile_broadcast() to make the naming more consistent with other APIs. sleepq and cv's use signal/broadcast, and msleep uses wakeup_one/wakeup. Prior to this turnstiles were using a signal/wakeup mixture.	2004-04-06 19:07:21 +00:00
jhb	6103cfbeb5	Fixup a comment.	2004-03-12 19:05:46 +00:00
jhb	d07a9130c6	Add an implementation of a generic sleep queue abstraction that is used to queue threads sleeping on a wait channel similar to how turnstiles are used to queue threads waiting for a lock. This subsystem will be used as the backend for sleep/wakeup and condition variables initially. Eventually it will also be used to replace the ithread-specific iwait thread inhibitor. Sleep queues are also not locked by sched_lock, so this splits sched_lock up a bit further increasing concurrency within the scheduler. Sleep queues also natively support timeouts on sleeps and interruptible sleeps allowing for the reduction of a lot of duplicated code between the sleep/wakeup and condition variable implementations. For more details on the sleep queue implementation, check the comments in sys/sleepqueue.h and kern/subr_sleepqueue.c.	2004-02-27 18:33:09 +00:00
jhb	b23d8371fa	Clarify and tweak some comments.	2004-02-27 16:14:27 +00:00
jeff	c85cdc3d0f	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and properly adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate parameter and remove direct references to the process statistics.	2004-01-25 03:54:52 +00:00
jhb	d8b6cc614a	Adjust an assertion for the TDF_TSNOBLOCK race handling in turnstile_unpend(). A racing thread that does not have TDI_LOCK set may either be running on another CPU or it may be sitting on a run queue if it was preempted during the very small window in turnstile_wait() between unlocking the turnstile chain lock and locking sched_lock.	2003-12-09 21:14:31 +00:00
jhb	f110a9ab64	Assert that we never give a thread a NULL turnstile when waking it up.	2003-12-09 21:09:54 +00:00
jhb	66cc89fadf	Revert the previous race fix and replace it with a more general fix. The case of a turnstile having no threads is just one instance of the more general case where the thread we are examining has been partially awakened already in that it has been removed from the turnstile's blocked list but still has TDI_LOCK set. We detect that case by checking to see if the thread has already had a turnstile reassigned to it.	2003-12-09 21:09:04 +00:00
jhb	989e0408dd	- Close a race where a thread on another CPU could release a contested lock and empty its turnstile while the blocking threads still pointed to the turnstile. If the thread on the first CPU blocked on a lock owned by one of the threads blocked on the turnstile just woken up, then the first CPU could try to manipulate a bogus thread queue in the turnstile during priority propagation. - Update locking notes for ts_owner and always clear ts_owner, not just under INVARIANTS. Tested by: sam (1)	2003-11-12 23:48:42 +00:00
jhb	b996af9fb8	Fix a typo in a comment. Submitted by: das	2003-11-12 14:55:45 +00:00
jhb	6cc1f7e330	Add an implementation of turnstiles and change the sleep mutex code to use turnstiles to implement blocking instead of implementing a thread queue directly. These turnstiles are somewhat similar to those used in Solaris 7 as described in Solaris Internals but are also different. Turnstiles do not come out of a fixed-sized pool. Rather, each thread is assigned a turnstile when it is created that it frees when it is destroyed. When a thread blocks on a lock, it donates its turnstile to that lock to serve as queue of blocked threads. The queue associated with a given lock is found by a lookup in a simple hash table. The turnstile itself is protected by a lock associated with its entry in the hash table. This means that sched_lock is no longer needed to contest on a mutex. Instead, sched_lock is only used when manipulating run queues or thread priorities. Turnstiles also implement priority propagation inherently. Currently turnstiles only support mutexes. Eventually, however, turnstiles may grow two queues to support a non-sleepable reader/writer lock implementation. For more details, see the comments in sys/turnstile.h and kern/subr_turnstile.c. The two primary advantages from the turnstile code include: 1) the size of struct mutex shrinks by four pointers as it no longer stores the thread queue linkages directly, and 2) less contention on sched_lock in SMP systems including the ability for multiple CPUs to contend on different locks simultaneously (not that this last detail is necessarily that much of a big win). Note that 1) means that this commit is a kernel ABI breaker, so don't mix old modules with a new kernel and vice versa. Tested on: i386 SMP, sparc64 SMP, alpha SMP	2003-11-11 22:07:29 +00:00
jhb	937519b3ea	If a spin lock is held for too long and WITNESS is enabled, then call witness_display_spinlock() to see if we can find out where the current owner of the spin lock last acquired the lock.	2003-07-31 18:52:18 +00:00
jhb	97e378fb00	When complaining about a sleeping thread owning a mutex, display the thread's pid to make debugging easier for people who don't want to have to use the intended tool for these panics (witness). Indirectly prodded by: kris	2003-07-30 20:42:15 +00:00
jhb	58598b39f8	- Add comments about the maintenance of the per-thread list of contested locks held by each thread. - Fix a bug in the original BSD/OS code where a contested lock was not properly handed off from the old thread to the new thread when a contested lock with more than one blocked thread was transferred from one thread to another. - Don't use an atomic operation to write the MTX_CONTESTED value to mtx_lock in the aforementioned special case. The memory barriers and exclusion provided by sched_lock are sufficient. Spotted by: alc (2)	2003-07-02 16:14:09 +00:00
obrien	3b8fff9e4c	Use __FBSDID().	2003-06-11 00:56:59 +00:00


177 Commits