freebsd-skq

Author	SHA1	Message	Date
julian	6f175a0e20	Move the _oncpu entry from the KSE to the thread. The entry in the KSE still exists but its purpose will change a bit when we add the ability to lock a KSE to a cpu.	2003-04-10 17:35:44 +00:00
mike	75859ca578	o In struct prison, add an allprison linked list of prisons (protected by allprison_mtx), a unique prison/jail identifier field, two path fields (pr_path for reporting and pr_root vnode instance) to store the chroot() point of each jail. o Add jail_attach(2) to allow a process to bind to an existing jail. o Add change_root() to perform the chroot operation on a specified vnode. o Generalize change_dir() to accept a vnode, and move namei() calls to callers of change_dir(). o Add a new sysctl (security.jail.list) which is a group of struct xprison instances that represent a snapshot of active jails. Reviewed by: rwatson, tjr	2003-04-09 02:55:18 +00:00
peter	46969da5f8	Commit a partial lazy thread switch mechanism for i386. It isn't as lazy as it could be and can do with some more cleanup. Currently it's under options LAZY_SWITCH. What this does is avoid %cr3 reloads for short context switches that do not involve another user process, i.e. we can take an interrupt, switch to a kthread and return to the user without explicitly flushing the tlb. However, this isn't as exciting as it could be: the interrupt overhead is still high and too much still blocks on Giant. There are some debug sysctls, for stats and for an on/off switch. The main problem with doing this has been "what if the process that you're running on exits while we're borrowing its address space?" - in this case we use an IPI to give it a kick when we're about to reclaim the pmap. It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a few more things and get some more feedback before turning it on by default. This is NOT a replacement for Bosko's lazy interrupt stuff. This was more meant for the kthread case, while his was for interrupts. Mine helps a little for interrupts, but his helps a lot more. The stats are enabled with options SWTCH_OPTIM_STATS - this has been a pseudo-option for years, I just added a bunch of stuff to it. One non-trivial change was to select a new thread before calling cpu_switch() in the first place. This allows us to catch the silly case of doing a cpu_switch() to the current process. This happens uncomfortably often. This simplifies a bit of the asm code in cpu_switch (no longer have to call choosethread() in the middle). This has been implemented on i386 and (thanks to jake) sparc64. The others will come soon. This is actually separate from the lazy switch stuff. Glanced at by: jake, jhb	2003-04-02 23:53:30 +00:00
jhb	966c72c345	- Remove witness_dead and just use witness_watch instead. If witness_watch is set to 0, it now has the same effect as setting witness_dead used to have. - Added a sysctl handler that allows root to change witness_watch from a non-zero value to zero to disable witness at runtime. Note that you can't turn witness back on once it is off. You can only turn it off as a one-way switch. - Added a comment describing the possible values of witness_watch.	2003-03-24 21:03:53 +00:00
jhb	0c3ac305c8	Trim an extra blank line that snuck into the last commit.	2003-03-11 22:33:42 +00:00
jhb	b2bb08b487	- Change witness_displaydescendants() to accept the indentation level as a parameter instead of using the level of a given witness. When recursing, pass an indent level of indent + 1. - Make use of the information witness_levelall() provides in witness_display_list() to use an O(n) algorithm instead of an O(n^2) algo to decide which witnesses to display hierarchies from. Basically, we only display a hierarchy for witnesses with a level of 0. - Add a new per-witness flag that is reset at the start of witness_display() for all witnesses and is set the first time a witness is displayed in witness_displaydescendants(). If a witness is encountered more than once in the lock order tree (which happens often), witness_displaydescendants() marks the later occurrences with the string "(already displayed)" and doesn't display the subtree under that witness. This avoids duplicating large amounts of the lock order tree in the 'show witness' output in DDB. All these changes serve to make 'show witness' a lot more readable and useful than it was previously.	2003-03-11 22:14:21 +00:00
jhb	736396fc05	- Split the itismychild() function into two functions: insertchild() adds a witness to the child list of a parent witness. rebalancetree() runs through the entire tree removing direct descendants of witnesses who already have said child witness as an indirect descendant through another direct descendant. itismychild() now calls insertchild() followed by rebalancetree() and no longer needs the evil hack of having a static recursed variable. - Add a function reparentchildren() that adds all the direct descendants of one witness as direct descendants of another witness. - Change the return value of itismychild() and similar functions so that they return 0 in the case of failure due to lack of resources instead of 1. This makes the return value more intuitive. - Check the return value of itismychild() when defining the static lock order in witness_initialize(). - Don't try to set up a lock instance in witness_lock() if itismychild() fails. Witness is hosed anyways so no need to do any more witness related activity at that point. It also makes the code flow easier to understand. - Add a new depart() function as the opposite of enroll(). When the reference count of a witness drops to 0 in witness_destroy(), this function is called on that witness. First, it runs through the lock order tree using reparentchildren() to reparent direct descendants of the departing witness to each of the witness' parents in the tree. Next, it releases its own child list and other associated resources. Finally it calls rebalancetree() to rebalance the lock order tree. - Sort function prototypes into something closer to alphabetical order. As a result of these changes, there should no longer be 'dead' witnesses in the order tree, and repeatedly loading and unloading a module should no longer exhaust witness of its internal resources. Inspired by: gallatin	2003-03-11 22:07:35 +00:00
jhb	5aeddebb51	Trim useless "../" leading strings from filenames passed into witness.	2003-03-11 21:53:12 +00:00
jhb	42e39caaab	Adjust style of #ifdef's and #endif's to be more consistent and in line with recent additions to style(9).	2003-03-11 21:38:49 +00:00
jhb	f7f4727b44	Do the lock order check skip for the LOP_TRYLOCK case after the check for recursing on a lock instead of before. This fixes a bug where WITNESS could get a little confused if you did an sx_tryslock() on a sx lock that you already had an slock on. WITNESS would still function correctly but it could result in weirdness in the output of 'show locks'. This also makes it possible for mtx_trylock() to recurse on a lock.	2003-03-11 20:54:37 +00:00
jhb	cb710fa095	Now that we have WITNESS_WARN(), we only call witness_list() from the ddb 'show locks' command. Thus, move witness_list() to the #ifdef DDB section and remove extra checks for calling this function outside of DDB. Also, witness_list() now returns void instead of returning an int. Reported by: Steve Ames <steve@energistic.com> Prodded by: davidxu	2003-03-10 17:03:57 +00:00
jhb	7e6cfbee4e	Oops, fix the double faults people were seeing with the recent changes to witness. Sleepable locks such as sx locks always come before all mutexes including Giant. However, the static lock order list placed Giant before the proctree and allproc sx locks. This resulted in witness creating a cycle in its lock order "tree" (real trees don't have cycles) leading to infinite recursion and eventually a double fault. To fix, put Giant after sx locks in the lock order list.	2003-03-06 17:25:06 +00:00
jhb	45fcac94f4	Bah, fix a bogon in the last commit: get the sense of a compare test right so that we allow a sleepable lock to be acquired with Giant held rather than allowing a sleepable lock to be acquired with anything but Giant held.	2003-03-04 22:34:07 +00:00
jhb	c5a53306ce	A small overhaul of witness: - Add a comment about special lock order rules and Giant near the top of subr_witness.c. Specifically, this documents and explains the real lock order relationship between Giant and sleepable locks (i.e. lockmgr locks and sx locks). Basically, Giant can be safely acquired either before or after sleepable locks and the case of Giant before a sleepable lock is exempted as a special case. - Add a new static function 'witness_list_lock()' that displays a single line of information about a struct lock_instance. This is used to make the output of witness messages more consistent and reduce some code duplication. - Fixup a few comments in witness_lock(). - Properly handle the Giant-before-sleepable-lock lock order exception in a more general fashion and remove the no longer needed LI_SLEPT flag. - Break up the last condition before assuming a reversal a bit to try and make the logic less confusing in witness_lock(). - Axe WITNESS_SLEEP() now that LI_SLEPT is no longer needed and replace it with a more general WITNESS_WARN() macro/function combination. WITNESS_WARN() allows you to output a customized message out to the console along with a list of held locks. It will optionally drop into the debugger as well. You can exempt a single lock from the check by passing it in as the second argument. You can also use flags to specify if Giant should be exempt from the check, if all sleepable locks should be exempt from the check, and if witness should panic if any non-exempt locks are found. - Make the witness_list() function static. Other areas of the kernel should use the new WITNESS_WARN() instead.	2003-03-04 20:56:39 +00:00
peter	c8ccde8063	Initiate de-orbit burn for USE_PCI_BIOS_FOR_READ_WRITE. This has been #if'ed out for a while. Complete the deed and tidy up some other bits. We need to be able to call this stuff from outer edges of interrupt handlers for devices that have the ISR bits in pci config space. Making the bios code mpsafe was just too hairy. We had also stubbed it out some time ago due to there simply being too much brokenness in too many systems. This adds a leaf lock so that it is safe to use pci_read_config() and pci_write_config() from interrupt handlers. We still will use pcibios to do interrupt routing if there is no acpi.. [yes, I tested this] Briefly glanced at by: imp	2003-02-18 03:36:49 +00:00
jeff	590a39e29b	- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much of the KSE code. Submitted by: davidxu	2003-02-17 05:14:26 +00:00
peter	30c571736e	Add a 'debug.witness_trace' sysctl (and tunable) when DDB is present. This causes LOR and could-sleep messages to come with a stack trace.	2003-02-13 01:35:56 +00:00
julian	e8efa7328e	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.	2003-02-01 12:17:09 +00:00
davidxu	4b9b549ca2	Move UPCALL-related data structures out of kse and introduce a new data structure called kse_upcall to manage UPCALLs. All KSE binding and loaning code is gone. A thread that owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and take those contexts back to userland. Any thread without an upcall structure has to export its context and exit at the user boundary. Any thread running in user mode owns an upcall structure. When it enters the kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread blocks in the kernel, a new UPCALL thread is created and the upcall structure is transferred to the new UPCALL thread. If the kse mailbox's current thread pointer is NULL, then when a thread blocks in the kernel, no UPCALL thread is created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit; when all upcalls in a ksegrp are removed, the group is automatically shut down. An upcall owner thread also exits when the process is exiting. When an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity: it represents a virtual cpu. When a thread is running, it always has a KSE associated with it. The scheduler is free to assign a KSE to a thread according to thread priority; if thread priority changes, a KSE can be moved from one thread to another. When a ksegrp is created, N KSEs are always created in the group, where N is the number of physical cpus in the current system. This makes it possible for threads in the kernel to execute on different cpus in parallel even if the userland UTS is only single-CPU safe. Userland calls kse_create to add more upcall structures to a ksegrp to increase concurrency in userland itself; the kernel is not restricted by the number of upcalls userland provides. The code hasn't been tested under SMP by the author due to lack of hardware. Reviewed by: julian	2003-01-26 11:41:35 +00:00
jake	0aecfa195b	Oops, add zstty to the witness order list. Noticed by: benno	2003-01-09 15:45:28 +00:00
jake	02d8249471	- Add a spin lock to single thread cache invalidation and tlb flush ipis, which allows ipis to be sent outside of Giant. - Remove the ap boot mutex, which is unused.	2002-12-22 20:50:23 +00:00
kris	a658b7f4f9	Enforce correct ordering of the filedesc structure and pipe mutex, because WITNESS can get the order wrong if it guesses based on first use. Reviewed by: jhb, alfred	2002-12-22 16:32:34 +00:00
jhb	2dfe09331e	Correct an assertion in the code to traverse the list of locks to find an earlier acquired lock with the same witness as the lock currently being acquired. If we had released several earlier acquired locks after acquiring enough locks to require another lock_list_entry bucket in the lock list, then subsequent lock_list_entry buckets could contain only one lock instance in which case i would be zero. Reported by: Joel M. Baldwin <qumqats@outel.org>	2002-11-11 16:36:20 +00:00
alc	b3cfb0145e	Catch up with the removal of the vm page buckets spin mutex.	2002-11-02 22:42:18 +00:00
phk	5a491baa51	#unifdef the code for checking blessed lock collisions until we need it. Spotted by: DARPA & NAI Labs.	2002-10-20 08:48:39 +00:00
peter	439214de2e	Register the machine check private state spinlock on ia64.	2002-10-12 00:33:36 +00:00
phk	1dfc2c167f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
jeff	4e382c3af5	- Tell witness about ALQ's spin lock.	2002-09-22 07:11:57 +00:00
jake	c964ec41e1	Make this driver work a whole lot better. - Get the initial mode from the prom settings and don't clobber the mode on open. - Copy output into an internal ring buffer instead of accessing the tty outq directly in the interrupt handler. This fixes a problem where garbage would show up in the output stream. - Reset the console port completely and reprogram all the parameters before enabling it. This fixes seemingly random hangs on startup when using a fast interrupt handler. - Add minimal locking in place of spls. - Remove dead code and minor cleanups.	2002-09-08 04:45:16 +00:00
iedowse	3643fd9d4d	Add WITNESS_FILE() and WITNESS_LINE(), which allow users of witness to print out the file and line from the lock object. These will be used shortly by CTR() calls in the mutex code. Reviewed by: jhb, jake	2002-08-26 18:31:26 +00:00
mp	0c587ffca0	Silence compiler warnings when DDB is not defined. PR: 36002 Submitted by: Yoshikazu GOTO <goto@snowy.to>	2002-07-15 02:03:17 +00:00
peter	4d88d6566a	Revive backed out pmap related changes from Feb 2002. The highlights are: - It actually works this time, honest! - Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive, so try and optimize things where possible. - Introduce ranged shootdowns that can be done as a single IPI. - PG_G support for i386 - Specific-cpu targeted shootdowns. For example, there is no sense in globally purging the TLB cache for where we are stealing a page from the local unshared process on the local cpu. Use pm_active to track this. - Add some instrumentation for the tlb shootdown code. - Rip out SMP code from <machine/cpufunc.h> - Try and fix some very bogus PG_G and PG_PS interactions that were bad enough to cause vm86 bios calls to break. vm86 depended on our existing bugs and this was the cause of the VESA panics last time. - Fix the silly one-line error that caused the 'panic: bad pte' last time. - Fix a couple of other silly one-line errors that should have caused more pain than they did. Some more work is needed: - pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we have a hook in cpu_switch. - The IPI handlers need some cleanup. I have a bogus %ds load that can be avoided. - APTD handling is rather bogus and appears to be a large source of global TLB IPI shootdowns for no really good reason. I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop. I expect to see a bigger difference when there is significant pageout activity or the system otherwise has memory shortages. I have backed out a few optimizations that I had been using over the last few days in order to be a little more conservative. I'll revisit these again over the next few days as the dust settles. New option: DISABLE_PG_G - In case I missed something.	2002-07-12 07:56:11 +00:00
alc	29ede8a9d4	o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free queue lock (revision 1.33 of vm/vm_page.c removed them). o Make the free queue lock a spin lock because it's sometimes acquired inside of a critical section.	2002-07-04 22:07:37 +00:00
julian	aa2dc0a5d9	Part 1 of KSE-III: The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. To come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. Expect slight instability in signals..	2002-06-29 17:26:22 +00:00
jhb	165d918ce2	Change the all locks list from a STAILQ to a TAILQ. This bloats struct lock_object by another pointer (though all of lock_object should be conditional on LOCK_DEBUG anyways) in exchange for an O(1) TAILQ_REMOVE() in witness_destroy() (called for every mtx_destroy() and sx_destroy()) instead of an O(n) STAILQ_REMOVE. Since WITNESS is so dog slow as it is, the speed-up is worth the space cost. Suggested by: iedowse	2002-06-06 20:51:04 +00:00
jhb	de3e290d8f	Handle "dead" witnesses better in the situation of several short-term locks being created and destroyed without a single long-term one around to ensure the witness associated with that group of locks stays alive. The pipe mutexes are an example of this group. For a dead witness we no longer clear the witness name. Instead, when looking up the witness for a lock, if a dead witness' (a witness with a refcount of 0) w_name pointer is identical to the witness name of the lock then we revive that witness instead of using a new witness for the lock. This results in far fewer dead witness objects and also better preserves locking orders over the long term resulting in more correct lock order checking. Note that we can't ever dereference w_name of a dead witness since we don't know if the string it is pointing to has been free()'d or kldunload()'d out from under us.	2002-06-06 19:04:38 +00:00
jhb	a4a680304c	In witness_unlock(), when updating a lock list entry bucket, decrement the count of lock list entries after we fixup the bucket of lock list entries. In theory we can remove the intr_disable/intr_restore() calls now.	2002-05-20 19:16:22 +00:00
jhb	4423d1f90a	- Allow witness_sleep() to be called when witness hasn't been initialized yet. We just return without performing any checks. - Don't explicitly enter and exit critical sections when walking lock lists. We don't need a critical section to walk the list of sleep locks for a thread. We check to see if a spin lock list is empty before we walk it. If the list is empty we don't need to walk it. If it isn't then we already hold at least one spin lock and are already in a critical section and thus don't need our own explicit critical section.	2002-05-20 17:49:46 +00:00
alfred	d1e340364b	Make funsetown() take a 'struct sigio **' so that the locking can be done internally. Ensure that no one can fsetown() to a dying process/pgrp. We need to check the process for P_WEXIT to see if it's exiting. Process groups are already safe because there is no such thing as a pgrp zombie, therefore the proctree lock completely protects the pgrp from having sigio structures associated with it after it runs funsetownlst. Add sigio lock to witness list under proctree and allproc, but over proc and pgrp. Seigo Tanimura helped with this.	2002-05-06 19:31:28 +00:00
alc	0e84366ae7	o Convert the vm_page buckets mutex to a spin lock. (This resolves an issue on the Alpha platform found by jeff@.) o Simplify vm_page_lookup(). Reviewed by: jhb	2002-04-30 21:24:47 +00:00
jhb	366bb5db9c	Whitespace bogon.	2002-04-27 04:48:36 +00:00
marcel	37e2e2ecca	Insert a semi-colon between label 'skip:' and the closing brace of the FOREACH loop to silence GCC 3.	2002-04-27 02:58:18 +00:00
des	b3648bf706	Add the mutex profiling lock to the witness list. This hopefully unbreaks the MUTEX_PROFILING + WITNESS + !WITNESS_SKIPSPIN case. Submitted by: Hiten Pandya <hiten@uk.FreeBSD.org>	2002-04-25 22:48:40 +00:00
jhb	7202da4491	- Merge the pgrpsess_lock and proctree_lock sx locks into one proctree_lock sx lock. Trying to get the lock order between these locks was getting too complicated as the locking in wait1() was being fixed. - leavepgrp() now requires an exclusive lock of proctree_lock to be held when it is called. - fixjobc() no longer gets a shared lock of proctree_lock now that it requires an xlock be held by the caller. - Locking notes in sys/proc.h are adjusted to note that everything that used to be protected by the pgrpsess_lock is now protected by the proctree_lock.	2002-04-16 17:03:05 +00:00
jhb	78e19df6f6	Display the recursion count in the lock_instance in the show locks output. Indirectly requested by: peter	2002-04-10 01:25:11 +00:00
jhb	ad726578d6	Cosmetic fixup in output of lock types in show locks output.	2002-04-10 01:19:53 +00:00
jhb	8143d2b80e	Add a new char * pointer lo_type to struct lock_object that is used to point to a more generic name for a lock that is more suitable for use by witness when grouping locks. For example, although network driver locks use the interface name for the name of each lock, they should all use the same witness and be treated the same as witness. Another example is that all UMA zone locks should be treated the same. The witness code has also been updated to print out the lock type in addition to the lock name in a few places where it is relevant.	2002-04-04 20:45:21 +00:00
jhb	2c4739409a	Enforce an implicit lock order of sleepable locks before non-sleepable locks.	2002-04-02 19:27:21 +00:00
jhb	77dc513737	Explicitly document how we implicitly enforce the lock order of sleep locks before spin locks.	2002-04-02 16:51:20 +00:00
jeff	dff418f166	Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares. Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone. Approved by: jhb	2002-03-27 09:23:41 +00:00
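Several entries above (7e6cfbee4e, c5a53306ce, 2c4739409a) turn on one idea: witness records "A is acquired before B" relations, and a static order list that contradicts an implicit rule (sleepable sx locks before Giant) puts a cycle into what should be a tree. The C sketch below is a minimal, hypothetical illustration of that idea. It is not witness's real code: the enum, `order[][]`, `reaches()` and `add_order()` are invented for this example. It keeps a small order graph and refuses any edge that would close a cycle. Its depth-first search uses a visited set, so it terminates even on a malformed graph.

```c
#include <assert.h>

/* Hypothetical lock classes, named after locks mentioned in the log. */
enum { GIANT, PROCTREE, ALLPROC, NLOCKS };

/* order[a][b] != 0 means "a must be acquired before b". */
static int order[NLOCKS][NLOCKS];

/* Depth-first search: can we get from 'from' to 'to' along order edges?
 * The visited set guarantees termination even if a cycle slipped in. */
static int reaches(int from, int to, int *visited)
{
    assert(from >= 0 && from < NLOCKS);
    if (from == to)
        return 1;
    visited[from] = 1;
    for (int n = 0; n < NLOCKS; n++)
        if (order[from][n] && !visited[n] && reaches(n, to, visited))
            return 1;
    return 0;
}

/* Record "first before second". Returns 1 on success, or 0 if 'second'
 * already (transitively) precedes 'first', since adding the edge would
 * then create a cycle, i.e. a lock order reversal. */
static int add_order(int first, int second)
{
    int visited[NLOCKS] = {0};

    if (reaches(second, first, visited))
        return 0;
    order[first][second] = 1;
    return 1;
}
```

For example, after `add_order(PROCTREE, GIANT)` and `add_order(ALLPROC, GIANT)` (sx locks before Giant), a later `add_order(GIANT, PROCTREE)` returns 0 and the graph stays acyclic. Rejecting the edge at insertion time is a simplification chosen for this sketch. The log describes the actual fix differently: the static list itself was reordered so that Giant comes after the sx locks.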


151 Commits