freebsd-skq

Author	SHA1	Message	Date
fabient	5edfb77dd3	Add software PMC support. New kernel events can be added at various location for sampling or counting. This will for example allow easy system profiling whatever the processor is with known tools like pmcstat(8). Simultaneous usage of software PMC and hardware PMC is possible, for example looking at the lock acquire failure, page fault while sampling on instructions. Sponsored by: NETASQ MFC after: 1 month	2012-03-28 20:58:30 +00:00
avg	7d367dba13	put sys/systm.h at its proper place or add it if missing Reported by: lstewart, tinderbox Pointyhat to: avg, attilio MFC after: 1 week MFC with: r228430	2011-12-12 10:05:13 +00:00
avg	75ddaeae80	panic: add a switch and infrastructure for stopping other CPUs in SMP case Historical behavior of letting other CPUs merily go on is a default for time being. The new behavior can be switched on via kern.stop_scheduler_on_panic tunable and sysctl. Stopping of the CPUs has (at least) the following benefits: - more of the system state at panic time is preserved intact - threads and interrupts do not interfere with dumping of the system state Only one thread runs uninterrupted after panic if stop_scheduler_on_panic is set. That thread might call code that is also used in normal context and that code might use locks to prevent concurrent execution of certain parts. Those locks might be held by the stopped threads and would never be released. To work around this issue, it was decided that instead of explicit checks for panic context, we would rather put those checks inside the locking primitives. This change has substantial portions written and re-written by attilio and kib at various times. Other changes are heavily based on the ideas and patches submitted by jhb and mdf. bde has provided many insights into the details and history of the current code. The new behavior may cause problems for systems that use a USB keyboard for interfacing with system console. This is because of some unusual locking patterns in the ukbd code which have to be used because on one hand ukbd is below syscons, but on the other hand it has to interface with other usb code that uses regular mutexes/Giant for its concurrency protection. Dumping to USB-connected disks may also be affected. PR: amd64/139614 (at least) In cooperation with: attilio, jhb, kib, mdf Discussed with: arch@, bde Tested by: Eugene Grosbein <eugen@grosbein.net>, gnn, Steven Hartland <killing@multiplay.co.uk>, glebius, Andrew Boyer <aboyer@averesystems.com> (various versions of the patch) MFC after: 3 months (or never)	2011-12-11 21:02:01 +00:00
attilio	b95134ea01	Introduce the same mutex-wise fix in r227758 for sx locks. The functions that offer file and line specifications are: - sx_assert_ - sx_downgrade_ - sx_slock_ - sx_slock_sig_ - sx_sunlock_ - sx_try_slock_ - sx_try_xlock_ - sx_try_upgrade_ - sx_unlock_ - sx_xlock_ - sx_xlock_sig_ - sx_xunlock_ Now vm_map locking is fully converted and can avoid to know specifics about locking procedures. Reviewed by: kib MFC after: 1 month	2011-11-21 12:59:52 +00:00
pjd	a3e664d830	Constify arguments for locking KPIs where possible. This enables locking consumers to pass their own structures around as const and be able to assert locks embedded into those structures. Reviewed by: ed, kib, jhb	2011-11-16 21:51:17 +00:00
ed	0c56cf839d	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
jeff	2d7d8c05e7	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
mdf	f6a71a40b2	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the kernel changes.	2011-01-12 19:54:19 +00:00
jhb	c17f46e472	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde	2011-01-11 13:59:06 +00:00
jhb	72cdd6ef99	Fix a sign bug that caused adaptive spinning in sx_xlock() to not work properly. Among other things it did not drop Giant while spinning leading to livelocks. Reviewed by: rookie, kib, jmallett MFC after: 3 days	2010-06-08 16:17:47 +00:00
attilio	b1c6888d87	In current code, threads performing an interruptible sleep (on both sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will leave the waiters flag on forcing the owner to do a wakeup even when if the waiter queue is empty. That operation may lead to a deadlock in the case of doing a fake wakeup on the "preferred" (based on the wakeup algorithm) queue while the other queue has real waiters on it, because nobody is going to wakeup the 2nd queue waiters and they will sleep indefinitively. A similar bug, is present, for lockmgr in the case the waiters are sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue is not empty, the waiters won't progress after being awake but they will just fail, still not taking care of the 2nd queue waiters (as instead the lock owned doing the wakeup would expect). In order to fix this bug in a cheap way (without adding too much locking and complicating too much the semantic) add a sleepqueue interface which does report the actual number of waiters on a specified queue of a waitchannel (sleepq_sleepcnt()) and use it in order to determine if the exclusive waiters (or shared waiters) are actually present on the lockmgr (or sx) before to give them precedence in the wakeup algorithm. This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters a lockmgr has and if all the waiters on the exclusive waiters queue are LK_SLEEPFAIL just wake both queues. The sleepq_sleepcnt() introduction and ABI breakage require __FreeBSD_version bumping. Reported by: avg, kib, pho Reviewed by: kib Tested by: pho	2009-12-12 21:31:07 +00:00
attilio	e9f2530ebf	When releasing a read/shared lock we need to use a write memory barrier in order to avoid, on architectures which doesn't have strong ordered writes, CPU instructions reordering. Diagnosed by: fabio Reviewed by: jhb Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-09-30 13:26:31 +00:00
attilio	ce3b4baf3c	Fix some bugs related to adaptive spinning: In the lockmgr support: - GIANT_RESTORE() is just called when the sleep finishes, so the current code can ends up into a giant unlock problem. Fix it by appropriately call GIANT_RESTORE() when needed. Note that this is not exactly ideal because for any interation of the adaptive spinning we drop and restore Giant, but the overhead should be not a factor. - In the lock held in exclusive mode case, after the adaptive spinning is brought to completition, we should just retry to acquire the lock instead to fallthrough. Fix that. - Fix a style nit In the sx support: - Call GIANT_SAVE() before than looping. This saves some overhead because in the current code GIANT_SAVE() is called several times. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-09-02 17:33:51 +00:00
attilio	e75d30c87f	* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to a pointer-fetching specific operation check. Consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking directly alignment on the word boundry, for all the given specific architectures. That's a bit too strict for some common case, but it assures safety. * Add a comment explaining the scope of the macro * Add a new stub in the lockmgr specific implementation Tested by: marcel (initial version), marius Reviewed by: rwatson, jhb (comment specific review) Approved by: re (kib)	2009-08-17 16:17:21 +00:00
bz	bf6acf7985	Add a new macro to test that a variable could be loaded atomically. Check that the given variable is at most uintptr_t in size and that it is aligned. Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate alignment -- however, the function of ALIGN() is to guarantee alignment, and therefore may lead to stronger alignment enforcement than necessary for types that are smaller than sizeof(uintptr_t). Add checks to mtx, rw and sx locks init functions to detect possible breakage. This was used during debugging of the problem fixed with r196118 where a pointer was on an un-aligned address in the dpcpu area. In collaboration with: rwatson Reviewed by: rwatson Approved by: re (kib)	2009-08-14 21:46:54 +00:00
attilio	44c490ae17	Handle lock recursion differenty by always checking against LO_RECURSABLE instead the lock own flag itself. Tested by: pho	2009-06-02 13:03:35 +00:00
attilio	4ae494ac2c	The patch for r193011 was partially rejected when applied, complete it.	2009-05-29 08:01:48 +00:00
attilio	e05714ba70	Reverse the logic for ADAPTIVE_SX option and enable it by default. Introduce for this operation the reverse NO_ADAPTIVE_SX option. The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed and the new flag, offering the reversed logic, SX_NOADAPTIVE is added. Additively implements adaptive spininning for sx held in shared mode. The spinning limit can be handled through sysctls in order to be tuned while the code doesn't reach the release, after which time they should be dropped probabilly. This change has made been necessary by recent benchmarks where it does improve concurrency of workloads in presence of high contention (ie. ZFS). KPI breakage is documented by __FreeBSD_version bumping, manpage and UPDATING updates. Requested by: jeff, kmacy Reviewed by: jeff Tested by: pho	2009-05-29 01:49:27 +00:00
sson	c0d5996eb6	Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)	2009-05-26 20:28:22 +00:00
jeff	9fedeedb8d	- Wrap lock profiling state variables in #ifdef LOCK_PROFILING blocks.	2009-03-15 08:03:54 +00:00
jhb	af0471aaec	Teach WITNESS about the interlocks used with lockmgr. This removes a bunch of spurious witness warnings since lockmgr grew witness support. Before this, every time you passed an interlock to a lockmgr lock WITNESS treated it as a LOR. Reviewed by: attilio	2008-09-10 19:13:30 +00:00
jhb	8af56fb687	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks	2008-08-05 20:02:31 +00:00
attilio	f7f31164f1	- Embed the recursion counter for any locking primitive directly in the lock_object, using an unified field called lo_data. - Replace lo_type usage with the w_name usage and at init time pass the lock "type" directly to witness_init() from the parent lock init function. Handle delayed initialization before than witness_initialize() is called through the witness_pendhelp structure. - Axe out LO_ENROLLPEND as it is not really needed. The case where the mutex init delayed wants to be destroyed can't happen because witness_destroy() checks for witness_cold and panic in case. - In enroll(), if we cannot allocate a new object from the freelist, notify that to userspace through a printf(). - Modify the depart function in order to return nothing as in the current CVS version it always returns true and adjust callers accordingly. - Fix the witness_addgraph() argument name prototype. - Remove unuseful code from itismychild(). This commit leads to a shrinked struct lock_object and so smaller locks, in particular on amd64 where 2 uintptr_t (16 bytes per-primitive) are gained. Reviewed by: jhb	2008-05-15 20:10:06 +00:00
jeff	3b1acbdce2	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter	2008-03-12 06:31:06 +00:00
jeff	12adc443d6	- Re-implement lock profiling in such a way that it no longer breaks the ABI when enabled. There is no longer an embedded lock_profile_object in each lock. Instead a list of lock_profile_objects is kept per-thread for each lock it may own. The cnt_hold statistic is now always 0 to facilitate this. - Support shared locking by tracking individual lock instances and statistics in the per-thread per-instance lock_profile_object. - Make the lock profiling hash table a per-cpu singly linked list with a per-cpu static lock_prof allocator. This removes the need for an array of spinlocks and reduces cache contention between cores. - Use a seperate hash for spinlocks and other locks so that only a critical_enter() is required and not a spinlock_enter() to modify the per-cpu tables. - Count time spent spinning in the lock statistics. - Remove the LOCK_PROFILE_SHARED option as it is always supported now. - Specifically drop and release the scheduler locks in both schedulers since we track owners now. In collaboration with: Kip Macy Sponsored by: Nokia	2007-12-15 23:13:31 +00:00
attilio	bfc761fdba	Expand lock class with the "virtual" function lc_assert which will offer an unified way for all the lock primitives to express lock assertions. Currenty, lockmgrs and rmlocks don't have assertions, so just panic in that case. This will be a base for more callout improvements. Ok'ed by: jhb, jeff	2007-11-18 14:43:53 +00:00
julian	b2732e0c22	generally we are interested in what thread did something as opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.	2007-11-14 06:21:24 +00:00
pjd	9281633138	Fix sx_try_slock(), so it only fails when there is an exclusive owner. Before that fix, it was possible for the function to fail if number of sharers changes between 'x = sx->sx_lock' step and atomic_cmpset_acq_ptr() call. This fixes ZFS problem when ZFS returns strange EIO errors under load. In ZFS there is a code that depends on the fact that sx_try_slock() can only fail if there is an exclusive owner. Discussed with: attilio Reviewed by: jhb Approved by: re (kensmith)	2007-10-02 14:48:48 +00:00
attilio	7f10997905	Fix some problems with lock_profiling in sx locks: - Adjust lock_profiling stubs semantic in the hard functions in order to be more accurate and trustable - Disable shared paths for lock_profiling. Actually, lock_profiling has a subtle race which makes results caming from shared paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for re-enabling this paths, but is currently intended for developing use only. - Use homogeneous names for automatic variables in hard functions regarding lock_profiling - Style fixes - Add a CTASSERT for some flags building Discussed with: kmacy, kris Approved by: jeff (mentor) Approved by: re	2007-07-06 13:20:44 +00:00
attilio	d5fdf88def	Add functions sx_xlock_sig() and sx_slock_sig(). These functions are intended to do the same actions of sx_xlock() and sx_slock() but with the difference to perform an interruptible sleep, so that sleep can be interrupted by external events. In order to support these new featueres, some code renstruction is needed, but external API won't be affected at all. Note: use "void" cast for "int" returning functions in order to avoid tools like Coverity prevents to whine. Requested by: rwatson Tested by: rwatson Reviewed by: jhb Approved by: jeff (mentor)	2007-05-31 09:14:48 +00:00
attilio	3fb9f2f242	style(9) fixes for sx locks. Approved by: jeff (mentor)	2007-05-29 19:46:37 +00:00
attilio	8ab4fe4253	Add a small fix for lock profiling in sx locks. "0" cannot be a correct value since when the function is entered at least one shared holder must be present and since we want the last one "1" is the correct value. Note that lock_profiling for sx locks is far from being perfect. Expect further fixes for that. Approved by: jeff (mentor)	2007-05-29 19:34:32 +00:00
jhb	124d541ab4	Rename the macros for assertion flags passed to sx_assert() from SX_* to SA_* to match mutexes and rwlocks. The old flags still exist for backwards compatiblity. Requested by: attilio	2007-05-19 21:26:05 +00:00
jhb	3fb1eb1627	Expose sx_xholder() as a public macro. It returns a pointer to the thread that holds the current exclusive lock, or NULL if no thread holds an exclusive lock. Requested by: pjd	2007-05-19 20:18:12 +00:00
jhb	b98c11a0e2	Oops, didn't include SX_ADAPTIVESPIN in the list of valid flags for the assert in sx_init_flags(). Submitted by: attilio	2007-05-19 18:34:24 +00:00
jhb	410a2fc7eb	Add a new SX_RECURSE flag to make support for recursive exclusive locks conditional. By default, sx(9) locks are back to not supporting recursive exclusive locks. Submitted by: attilio	2007-05-19 16:35:27 +00:00
jhb	b12787f5fb	Fix a comment.	2007-05-18 15:05:41 +00:00
jhb	f78df6dd24	Move lock_profile_object_{init,destroy}() into lock_{init,destroy}().	2007-05-18 15:04:59 +00:00
jhb	7b6b99abf6	Add destroyed cookie values for sx locks and rwlocks as well as extra KASSERTs so that any lock operations on a destroyed lock will panic or hang.	2007-05-08 21:51:37 +00:00
kmacy	16fabe1e36	fix typo	2007-04-04 00:11:22 +00:00
kmacy	f66541917b	style fixes and make sure that the lock is treated as released in the sharers == 0 case not that this is somewhat racy because a new sharer can come in while we're updating stats	2007-04-04 00:01:05 +00:00
kmacy	836059f8b9	Fixes to sx for newsx - fix recursed case and move out of inline Submitted by: Attilio Rao <attilio@freebsd.org>	2007-04-03 22:58:21 +00:00
jhb	b0b93a3c55	Optimize sx locks to use simple atomic operations for the common cases of obtaining and releasing shared and exclusive locks. The algorithms for manipulating the lock cookie are very similar to that rwlocks. This patch also adds support for exclusive locks using the same algorithm as mutexes. A new sx_init_flags() function has been added so that optional flags can be specified to alter a given locks behavior. The flags include SX_DUPOK, SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature to the similar flags for mutexes. Adaptive spinning on select locks may be enabled by enabling the ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN flag via sx_init_flags() will adaptively spin. The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock() are now performed inline in non-debug kernels. As a result, <sys/sx.h> now requires <sys/lock.h> to be included prior to <sys/sx.h>. The new kernel option SX_NOINLINE can be used to disable the aforementioned inlining in non-debug kernels. The size of struct sx has changed, so the kernel ABI is probably greatly disturbed. MFC after: 1 month Submitted by: attilio Tested by: kris, pjd	2007-03-31 23:23:42 +00:00
jhb	a84f74bb36	Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'.	2007-03-21 21:20:51 +00:00
jhb	60ad130f46	Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes. These functions are intended to be used to drop a lock and then reacquire it when doing an sleep such as msleep(9). Both functions accept a 'struct lock_object *' as their first parameter. The 'lc_unlock' function returns an integer that is then passed as the second paramter to the subsequent 'lc_lock' function. This can be used to communicate state. For example, sx locks and rwlocks use this to indicate if the lock was share/read locked vs exclusive/write locked. Currently, spin mutexes and lockmgr locks do not provide working lc_lock and lc_unlock functions.	2007-03-09 16:27:11 +00:00
jhb	8e9f4f8a39	Use C99-style struct member initialization for lock classes.	2007-03-09 16:04:44 +00:00
kmacy	e1b1eae592	lock stats updates need to be protected by the lock	2007-03-02 07:21:20 +00:00
kmacy	6993a10996	Evidently I've overestimated gcc's ability to peak inside inline functions and optimize away unused stack values. The 48 bytes that the lock_profile_object adds to the stack evidently has a measurable performance impact on certain workloads.	2007-03-01 09:35:48 +00:00
kmacy	b7672bad26	Further improvements to LOCK_PROFILING: - Fix missing initialization in kern_rwlock.c causing bogus times to be collected - Move updates to the lock hash to after the lock is released for spin mutexes, sleep mutexes, and sx locks - Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling statistics when an acquisition is contended. This reduces the overhead of LOCK_PROFILING to increasing system time by 20%-25% which on "make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms of wall-clock time. Contrast this to enabling lock profiling without LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.	2007-02-27 06:42:05 +00:00
kmacy	6508c4f27b	general LOCK_PROFILING cleanup - only collect timestamps when a lock is contested - this reduces the overhead of collecting profiles from 20x to 5x - remove unused function from subr_lock.c - generalize cnt_hold and cnt_lock statistics to be kept for all locks - NOTE: rwlock profiling generates invalid statistics (and most likely always has) someone familiar with that should review	2007-02-26 08:26:44 +00:00

1 2

82 Commits