freebsd-dev

Author	SHA1	Message	Date
Attilio Rao	1283e9cd60	Reintroduce the fix already discussed in r216805 (please check its history for a detailed explanation of the problems). The only difference with the previous fix is in Solution2: CPUBLOCK is no longer set when exiting from callout_reset_*() functions, which avoids the deadlock (leading to r217161). There is no need to CPUBLOCK there because the running-and-migrating assumption is strong enough to avoid problems there. Furthermore add better !SMP compliance (leading to shrunk code and structures) and facility macros/functions. Tested by: gianni, pho, dim MFC after: 3 weeks	2011-04-08 18:48:57 +00:00
Attilio Rao	08e4ac8ad6	Revert r216805. That revision introduces a bug which is more visible than the problems it is trying to fix. Since my time is very limited in this period, I am going to commit this patch back only once it is fully fixed. Reported by: dim, Nicholas Esborn	2011-01-08 18:51:15 +00:00
Attilio Rao	3d7acbbabf	Fix several callout migration races: - Problem1: Hypothesis: thread1 is doing a callout_reset_on(), within its callout handler, willing to implicitly or explicitly migrate the callout. thread2 is draining the callout. Thesis: * thread1 calls callout_lock() and locks the old callout cpu * thread1 performs the checks in the first path of the callout_reset_on() * thread1 hits this codepiece: /* * If the lock must migrate we have to check the state again as * we can't hold both the new and old locks simultaneously. */ if (c->c_cpu != cpu) { c->c_cpu = cpu; CC_UNLOCK(cc); goto retry; } which means it will drop the lock and 'retry' * thread2 will callout_lock() and lock the new callout cpu. thread1 spins on the new lock and will not keep going for the moment. * thread2 checks that the callout is not pending (as callout is currently running) and that it is not on cc->cc_curr (because cc now refers to the new callout and the callout is running on the old callout cpu) thus it thinks it is done and returns. * thread1 will now acquire the lock and then add the callout to the new callout cpu queue That seems an obvious race as callout_stop() falsely reports the callout stopped or worse, callout_drain() falsely returns while the callout is still in use. - Solution1: Fixing this problem would require, in general, locking both callout cpus at once while switching the c_cpu field and avoiding cyclic deadlocks between callout cpu locks. The concept of CPUBLOCK is then introduced (working more or less like the blocked_lock for the thread_lock() function) meaning: "in callout_lock(), spin for as long as c->c_cpu is equal to CPUBLOCK". That way the "original" callout cpu, referred to in the above mentioned code snippet, will remain blocked until the lock handover is over, so the critical path remains covered. 
- Problem2: Having the callout currently executing on a specific callout cpu and simultaneously pending on another callout cpu (as can happen with the current code) breaks, at least, the assumption that callout_drain() returns only once the callout cannot be referenced anymore. - Solution2: Callout migration is deferred if the current callout is already under execution. The best place to do that is in softclock(), and new members are added to the callout cpu structure in order to specify that a pending migration is requested. That is necessary because the callout cannot be trusted (not freed) 100% of the time after the execution of the callout handler. CPUBLOCK will prevent, in the "deferred migration" case, the callout from being freed, stopping any possible callout_stop() and callout_drain() activity until the migration is actually performed. - Problem3: There is a further race in callout_drain(). In order to avoid a race between the sleepqueue lock and the callout cpu spinlock, in _callout_stop_safe(), the callout cpu lock is dropped, the sleepqueue lock is acquired and a new callout cpu lookup is performed. Note that the channel used for locking the sleepqueue is obtained from the "current" callout cpu (&cc->cc_waiting). If the callout migrated in the meanwhile, callout_drain() will end up using the wrong wchan for the sleepqueue (the locked one will be the older, while the new one will not really be locked) leading to a lock leak and a racy access to the sleepqueue. - Solution3: It is enough to check if a migration happened between the operation of acquiring the sleepqueue lock and the new callout cpu lock and, if so, unwind all those and try again. These problems can lead to deadly races on moderate (4-way) SMP environments, leading to easy panics or deadlocks. The 24-way machine of the reporter could easily panic, with a completely normal workload, almost daily. 
gianni@ kindly wrote the following proof-of-concept which can panic a FreeBSD machine in less than one hour, on smaller SMP: http://www.freebsd.org/~attilio/callout/test.c Reported by: Nicholas Esborn <nick at desert dot net>, DesertNet In collaboration with: gianni, pho, Nicholas Esborn Reviewed by: jhb MFC after: 1 week (*) (*) Usually, I would aim for a larger MFC timeout, but I really want this in before 8.2-RELEASE, thus re@ accepted a shorter timeout as a special case for this patch	2010-12-29 18:17:36 +00:00
John Baldwin	3350df4899	Remove 'softclock_ih' as it is no longer used.	2010-11-03 15:38:52 +00:00
Alexander Motin	189795fe68	Fix callout_tickstofirst() behavior after signed integer ticks overflow. This should fix callout precision drop to 1/4s after 25 days of uptime with HZ = 1000. Submitted by: Taku YAMAMOTO <taku@tackymt.homeip.net>	2010-10-31 11:44:41 +00:00
Alexander Motin	9aff0c8ff7	Fix panic on NULL dereference possible after r212541.	2010-09-14 10:26:49 +00:00
Alexander Motin	0e18987383	Make kern_tc.c provide the minimum frequency of tc_ticktock() calls required to handle current timecounter wraps. Make kern_clocksource.c honor that requirement, scheduling sleeps on the first CPU for no more than the specified period. Allow other CPUs to sleep up to 1/4 second (for any case).	2010-09-14 08:48:06 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When the CPU is busy, interrupts are generated at the full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when the CPU is idle, only the minimum set of interrupts (down to 8 interrupts per second per CPU now) needed to handle scheduled callouts is executed. This allows a significant increase in idle CPU sleep time, increasing the effect of static power-saving technologies. It should also reduce host CPU load on virtualized systems when the guest system is idle. There is a set of tunables, also available as writable sysctls, that control the event timer subsystem behavior: kern.eventtimer.timer - chooses the event timer hardware to use. On x86 there are up to 4 different kinds of timers. Depending on whether the chosen timer is per-CPU, the behavior of other options differs slightly. kern.eventtimer.periodic - chooses between periodic and one-shot operation mode. In periodic mode, the current timer hardware is taken as the only source of time for time events. This mode is quite similar to the previous kernel behavior. One-shot mode instead uses the currently selected time counter hardware to schedule all needed events one by one and programs the timer to generate an interrupt exactly at the specified time. The default value depends on the chosen timer's capabilities, but one-shot mode is preferred unless the other is forced by the user or hardware. kern.eventtimer.singlemul - in periodic mode, specifies how many times higher the timer frequency should be, so as not to strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU receive every timer interrupt regardless of whether it is busy or not. By default this option is disabled. 
If the chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generated. Since this patch modifies cpu_idle() on some platforms, I have also refactored the one on x86. Now it makes use of MONITOR/MWAIT instructions (if supported) under high sleep/wakeup rates, as a fast alternative to other methods. It allows the SMP scheduler to wake up sleeping CPUs much faster without using IPIs, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Rui Paulo	79856499bd	Add an extra comment to the SDT probes definition. This allows us to use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwatson [1]	2010-08-22 11:18:57 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Luigi Rizzo	20c510f826	Properly fix callout handling by putting all the per-cpu info in struct callout_cpu. From the comment in the file: + * There is one struct callout_cpu per cpu, holding all relevant + * state for the callout processing thread on the individual CPU. + * In particular: + * cc_ticks is incremented once per tick in callout_cpu(). + * It tracks the global 'ticks' but in a way that the individual + * threads should not worry about races in the order in which + * hardclock() and hardclock_cpu() run on the various CPUs. + * cc_softclock is advanced in callout_cpu() to point to the + * first entry in cc_callwheel that may need handling. In turn, + * a softclock() is scheduled so it can serve the various entries i + * such that cc_softclock <= i <= cc_ticks . Together with a smaller patch committed in September, this fixes a bug that affects 8.0 with apps that rely on callouts to fire exactly in the number of ticks specified (qemu among them). Right now, callouts in 8.0 fire one tick late. This was discussed in September with JeffR and jhb MFC after: 3 days	2009-12-14 12:23:46 +00:00
Luigi Rizzo	446e861708	Make sure callouts are not processed one tick late. The problem was introduced in SVN 180608/ rev 1.114 and affects all users of callout_reset() (including select, usleep, setitimer). A better fix probably involves replicating 'ticks' in the struct callout_cpu; this commit is just a temporary thing so that we can MFC it after a suitable test time and RE approval. MFC after: 3 days	2009-09-12 21:44:34 +00:00
Robert Watson	91dd9aae1a	Add explicit static DTrace tracing to the callout mechanism, capturing pointers to the callout handler just before and just after the callout is invoked. I attempted to do this in a manner congruent to tracing in Solaris's callout mechanism, but couldn't quite use the same names due to convention and syntax differences. Example DTrace script to generate a distribution graph of callout execution times: callout_execute:::callout_start { self->cstart = timestamp; } callout_execute:::callout_end { @length = quantize(timestamp - self->cstart); } Reviewed by: jb MFC after: 3 days	2009-01-24 10:22:49 +00:00
John Baldwin	b7f1c1d210	Add a new KTR tracepoint in the KTR_CALLOUT class to note when a callout routine finishes executing. MFC after: 1 week	2009-01-13 15:56:53 +00:00
Peter Wemm	1d387fe73b	After a machine has been up for a bit more than 20 days with HZ=1000, "ticks" goes negative. This breaks the signed comparison in softclock. This causes sleep() to never wake up, tcp to stop, etc etc. This is bad(TM). Use the SEQ_LT() method from tcp's sequence number comparisons.	2008-10-28 03:26:25 +00:00
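The SEQ_LT()-style comparison mentioned in 1d387fe73b can be illustrated with a small userland sketch. This is not the kernel code: the helper name ticks_before() is hypothetical, and the sketch assumes a 32-bit int and the usual two's-complement conversion. The idea is to test the sign of the difference of the two counters instead of comparing the counters directly, so the answer stays correct when the counter wraps from positive to negative.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns nonzero if tick value 'a' comes before
 * tick value 'b' on a free-running counter that is allowed to wrap.
 * The subtraction is done in unsigned arithmetic (well defined on
 * overflow) and the sign of the result is then inspected.
 * Assumption: converting the unsigned difference to int32_t wraps in
 * the usual two's-complement way, as it does on common compilers.
 */
static int
ticks_before(int a, int b)
{
	return ((int32_t)((uint32_t)a - (uint32_t)b) < 0);
}
```

With this helper, ticks_before(INT_MAX, INT_MIN) is true, because the counter has advanced by one tick across the wrap, whereas the plain signed comparison INT_MAX < INT_MIN is false.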
Sam Leffler	6e0186d5ee	add callout_schedule; besides being useful it also improves compatibility with other systems Reviewed by: ed, battlez	2008-08-02 17:42:38 +00:00
Jeff Roberson	9fc51b0bf4	Fix a race which could result in some timeout buckets being skipped. - When a tick occurs on a cpu, iterate from cs_softticks until ticks. The per-cpu tick processing happens asynchronously with the actual adjustment of the 'ticks' variable. Sometimes the results may be visible before the local call and sometimes after. Previously this could cause a one tick window where we didn't evaluate the bucket. - In softclock fetch curticks before incrementing cc_softticks so we don't skip insertions which were made for the current time. Sponsored by: Nokia	2008-07-19 05:18:29 +00:00
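The bucket scan described in 9fc51b0bf4 can be sketched in userland C as follows. This is a toy model under stated assumptions, not the kernel implementation: every toy_* name is hypothetical, the wheel has only 8 buckets, there is no locking, and tick wraparound is ignored. It shows why the soft clock walks every tick from its own cursor up to the current tick count, so that a late-running scan does not skip buckets.

```c
#include <stddef.h>

#define TOY_WHEEL_SIZE	8			/* must be a power of two */
#define TOY_WHEEL_MASK	(TOY_WHEEL_SIZE - 1)

struct toy_callout {
	int c_time;				/* absolute tick at which to fire */
	int fired;				/* set to 1 once processed */
	struct toy_callout *next;
};

struct toy_cc {
	int cc_ticks;				/* advanced once per tick */
	int cc_softticks;			/* next tick the scan will examine */
	struct toy_callout *wheel[TOY_WHEEL_SIZE];
};

/* Schedule 'c' to fire 'to_ticks' ticks from now. */
static void
toy_insert(struct toy_cc *cc, struct toy_callout *c, int to_ticks)
{
	c->c_time = cc->cc_ticks + to_ticks;
	c->fired = 0;
	c->next = cc->wheel[c->c_time & TOY_WHEEL_MASK];
	cc->wheel[c->c_time & TOY_WHEEL_MASK] = c;
}

/* Per-tick hook: only advances the counter. */
static void
toy_tick(struct toy_cc *cc)
{
	cc->cc_ticks++;
}

/*
 * Scan every bucket for ticks cc_softticks..cc_ticks inclusive and fire
 * the entries whose time matches. Returns the number of entries fired.
 */
static int
toy_softclock(struct toy_cc *cc)
{
	int nfired = 0;

	while (cc->cc_softticks <= cc->cc_ticks) {
		int cur = cc->cc_softticks++;
		struct toy_callout **pp = &cc->wheel[cur & TOY_WHEEL_MASK];

		while (*pp != NULL) {
			struct toy_callout *c = *pp;

			if (c->c_time == cur) {
				*pp = c->next;	/* unlink and fire */
				c->fired = 1;
				nfired++;
			} else
				pp = &c->next;	/* belongs to a later lap */
		}
	}
	return (nfired);
}
```

If toy_tick() runs several times before toy_softclock() gets a chance to run, the loop still visits each intermediate tick, which is the property the commit message is concerned with.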
Jeff Roberson	ce62b59c88	- Correct a major error introduced in the per-cpu timeout commit. Sleep and wakeup require the same wait channel to function properly. Found by: kris Pointy hat: me	2008-04-06 11:08:49 +00:00
Jeff Roberson	8d809d5061	Implement per-cpu callout threads, wheels, and locks. - Move callout thread creation from kern_intr.c to kern_timeout.c - Call callout_tick() on every processor via hardclock_cpu() rather than inspecting callout internal details in kern_clock.c. - Remove callout implementation details from callout.h - Package up all of the global variables into a per-cpu callout structure. - Start one thread per-cpu. Threads are not strictly bound. They prefer to execute on the native cpu but may migrate temporarily if interrupts are starving callout processing. - Run all callouts by default in the thread for cpu0 to maintain current ordering and concurrency guarantees. Many consumers may not properly handle concurrent execution. - The new callout_reset_on() api allows specifying a particular cpu to execute the callout on. This may migrate a callout to a new cpu. callout_reset() schedules on the last assigned cpu while callout_reset_curcpu() schedules on the current cpu. Reviewed by: phk Sponsored by: Nokia	2008-04-02 11:20:30 +00:00
Alfred Perlstein	435cdf88ea	Fix a race where timeout/untimeout could cause crashes for Giant locked code. The bug: There exists a race condition for timeout/untimeout(9) due to the way that the softclock thread dequeues timeouts. The softclock thread sets the c_func and c_arg of the callout to NULL while holding the callout lock but not Giant. It then drops the callout lock and acquires Giant. It is at this point where untimeout(9) on another cpu/thread could be called. Since c_arg and c_func are cleared, untimeout(9) does not touch the callout and returns as if the callout is canceled. The softclock then tries to acquire Giant and likely blocks due to the other cpu/thread holding it. The other cpu/thread then likely deallocates the backing store that c_arg points to and finishes working and hence drops Giant. Softclock resumes and acquires Giant and calls the function with the now freed c_arg and we have corruption/crash. The fix: We need to track curr_callout even for timeout(9) (LOCAL_ALLOC) callouts. We need to free the callout after the softclock processes it to deal with the race here. Obtained from: Juniper Networks, iedowse Reviewed by: jhb, iedowse MFC After: 2 weeks.	2008-03-22 07:29:45 +00:00
Jeff Roberson	c5aa6b581d	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter	2008-03-12 06:31:06 +00:00
Attilio Rao	13ddf72de7	Really, no explicit checks against lock_class_* objects should be done in consumers' code: using lock properties is much more appropriate. Fix current code doing these bogus checks. Note: Really, callouts are not usable by all !(LC_SPINLOCK | LC_SLEEPABLE) primitives, as rmlocks, for example, don't implement the generic lock layer functions, but they can be equipped for this, so the check is still valid. Tested by: matteo, kris (earlier version) Reviewed by: jhb	2008-02-06 00:04:09 +00:00
Attilio Rao	557f5e51e9	Cache the value of c_lock as it can change, in the struct, while the global callout spinlock is not held, and can lead to PF#. Reported by: dougb, Mark Atkinson <atkin901 at yahoo dot com> Tested by: dougb Diagnosed by: jhb	2007-11-22 12:15:54 +00:00
Attilio Rao	64b9ee201a	Add the function callout_init_rw() to the callout facility in order to use rwlocks in conjunction with callouts. The function does basically what callout_init_mtx() already does, with the difference of using a rwlock as the extra argument. The CALLOUT_SHAREDLOCK flag can now be used in order to acquire the lock only in read mode when running the callout handler. It has no effect when used in conjunction with mtx. In order to implement this, the underlying callout functions have been made completely lock type-unaware, so accordingly the sysctl debug.to_avg_mtxcalls is now changed to the generic debug.to_avg_lockcalls. Note: currently the allowed lock classes are mutexes and rwlocks because callout handlers run in softclock swi, so they cannot sleep and they cannot acquire sleepable locks like sx or lockmgr. Requested by: kmacy, pjd, rwatson Reviewed by: jhb	2007-11-20 00:37:45 +00:00
Robert Watson	dce5df0dfc	Remove the definition and implementation of 'CALLOUT_NETGIANT', a now- (and possibly always-) unused define. Reported by: kmacy Approved by: re (kensmith)	2007-09-15 12:33:24 +00:00
John Baldwin	67b158d888	Close a race that snuck in with the recent changes to fix a LOR between the callout_lock spin lock and the sleepqueue spin locks. In the fix, callout_drain() has to drop the callout_lock so it can acquire the sleepqueue lock. The state of the callout can change while the callout_lock is held however (for example, it can be rescheduled via callout_reset()). The previous code assumed that the only state change that could happen is that the callout could finish executing. This change alters callout_drain() to effectively restart and recheck everything after it acquires the sleepqueue lock thus handling all the possible states that the callout could be in after any changes while callout_lock was dropped. Approved by: re (kensmith) Tested by: kris	2007-08-31 19:01:30 +00:00
Attilio Rao	6a0ce57d10	Fix a long-standing LOR between callout_lock and the sleepqueue chain (which could lead to a deadlock). - sleepq_set_timeout acquires callout_lock (via callout_reset()) only with the sleepq chain lock held - msleep_spin in _callout_stop_safe locks the sleepqueue chain with callout_lock held In order to solve this, don't use msleep_spin in _callout_stop_safe() but use sleepqueues directly as inline msleep_spin code. Rearrange the wakeup path in order to have it consistent too. Reported by: kris (via stress2 test suite) Tested by: Timothy Redaelli <drizzt@gufi.org> Reviewed by: jhb Approved by: jeff (mentor) Approved by: re	2007-06-26 21:42:01 +00:00
Andre Oppermann	0489b64c5e	Make the TCP timer callout obtain Giant if the network stack is marked as non-mpsafe. This change is to be removed when all protocols are mp-safe.	2007-05-11 20:52:47 +00:00
Gleb Smirnoff	68a57ebfad	Improve ktr(4) logging for callout(9) subsystem. Log all inserts and removals, including failures, into the callwheel. XXX: Most of the CTR() macros are called with callout_lock spin mutex held, thus won't be logged into file, if KTR_ALQ is used. Moving the CTR() macros out from the spinlocked code would require copying of all arguments. I'm too lazy to do this.	2006-10-11 14:57:03 +00:00
John Baldwin	b36f458861	Use the recently added msleep_spin() function to simplify the callout_drain() logic. We no longer need a separate non-spin mutex to do sleep/wakeup with, instead we can now just use the one spin mutex to manage all the callout functionality.	2006-02-23 19:13:12 +00:00
John Baldwin	21f9e816cd	Oops, missed adding the required include. Pointy hat to: jhb	2005-09-15 20:20:36 +00:00
John Baldwin	53c0e1ff7d	Replace the dont_sleep_in_callout mutex hack (similar to g_x{up,down}) with the disallow sleeping facility.	2005-09-15 20:09:08 +00:00
Gleb Smirnoff	d04304d155	Make callout_reset() return a non-zero value if a pending callout was rescheduled. If there was no pending callout, then return 0. Reviewed by: iedowse, cperciva	2005-09-08 14:20:39 +00:00
Ian Dowse	57c037be1c	When processing a timeout() callout and returning it to the free list, set `curr_callout' to NULL. This ensures that we won't attempt to cancel the current callout if the original callout structure gets recycled while we wait to acquire Giant. This is reported to fix an intermittent syscons problem that was introduced by revision 1.96.	2005-02-11 00:14:00 +00:00
Ian Dowse	98c926b20f	Add a mechanism for associating a mutex with a callout when the callout is first initialised, using a new function callout_init_mtx(). The callout system will acquire this mutex before calling the callout function and release it on return. In addition, the callout system uses the mutex to avoid most of the complications and race conditions inherent in asynchronous timer facilities, so mutex-protected callouts have much simpler semantics. As long as the mutex is held when invoking callout_stop() or callout_reset(), then these functions will guarantee that the callout will be stopped, even if softclock() had already begun to process the callout. Existing Giant-locked callouts will automatically pick up the new race-free semantics. This should close a number of race conditions in the USB code and probably other areas of the kernel too. There should be no change in behaviour for "MP-safe" callouts; these still need to use the techniques mentioned in timeout(9) to avoid race conditions.	2005-02-07 02:47:33 +00:00
Colin Percival	7834081c88	Make "c->c_func = NULL" conditional on CALLOUT_LOCAL_ALLOC in both places where it occurs, not just one. :-) Pointed out by: glebius Pointy hat to: cperciva	2005-01-19 21:15:58 +00:00
Colin Percival	0ceba3d69c	Make "c->c_func = NULL" conditional on the CALLOUT_LOCAL_ALLOC flag, i.e., only clear c->c_func if the callout c is being used via the old timeout(9) interface. Requested by: glebius	2005-01-19 20:34:46 +00:00
Colin Percival	86fd19de7b	Clarify the description of the callout_active() macro: It is cleared by callout_stop, callout_drain, and callout_deactivate, but is not automatically cleared when a callout returns.	2005-01-19 19:46:35 +00:00
Colin Percival	e9dec2c41b	Adjust two of my comments to the new world order: Indent protection in the first column is performed using /*, not /-.	2005-01-07 03:25:45 +00:00
Robert Watson	ff7ec58af8	Cut a KTR record whenever a callout is invoked. Mark whether it runs with Giant or not, and include the function point so it can be looked up against the kernel symbol table during trace analysis.	2004-08-06 21:49:00 +00:00
Colin Percival	0413bacd09	When resetting a pending callout, perform the deregistration in callout_reset rather than calling callout_stop. This results in a few lines of code duplication, but it provides a significant performance improvement because it avoids recursing on callout_lock. Requested by: rwatson	2004-08-06 02:44:58 +00:00
Hiten Pandya	024035e822	The paper "Hashed Timers and Hierarchical Wheels: Data Structures for the Efficient Implementation of a Timer Facility" was co-authored by T. Lauk, not A. Lauk. Adjust nearby whitespace.	2004-04-25 04:10:17 +00:00
Colin Percival	05641e82d7	1. Remove callout_stop binary compatibility. 2. Document that this means that kernel modules must be rebuilt. 3. While I'm here, fix my sorting error in callout.h Requested by: many [1], scottl [2], bde [3]	2004-04-20 15:49:31 +00:00
Colin Percival	49a74476a6	Add whitespace before comment blocks. (reported by njl) Remove spurious whitespace, add indent protection, fix punctuation, remove initialization of static variables to zero, put wakeup_ctr and wakeup_needed in the correct order. (reported by bde) This doesn't fix all the style bugs I introduced, but the remaining style bugs make it easier for me to understand what I did here.	2004-04-08 02:03:49 +00:00
Colin Percival	2c1bb20746	Introduce a callout_drain() function. This acts in the same manner as callout_stop(), except that if the callout being stopped is currently in progress, it blocks attempts to reset the callout and waits until the callout is completed before it returns. This makes it possible to clean up callout-using code safely, e.g., without potentially freeing memory which is still being used by a callout. Reviewed by: mux, gallatin, rwatson, jhb	2004-04-06 23:08:49 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
Poul-Henning Kamp	377e7be416	Make the DIAGNOSTIC code which complains about long {call|time}out(9) functions less noisy: We printf if a new function took longer than the previous record holder, or if the previous record holder took more than twice as long as the current record.	2003-12-07 20:03:28 +00:00
Poul-Henning Kamp	d87526cf43	Rename the debugging mutex "callout_no_sleep" to "dont_sleep_in_callout".	2003-11-15 18:33:54 +00:00
Kirk McKusick	48b0f4b67d	At the request of several developers, restore the DIAGNOSTIC code deleted in 1.81. Increase the initial timeout limit to 2ms to eliminate spurious messages of excessive timeouts in the NFS client code. Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk> Requested by: Mike Silbersack <silby@silby.com> Requested by: Sam Leffler <sam@errno.com>	2003-11-12 22:28:27 +00:00
Kirk McKusick	b932dd9b28	Get rid of DIAGNOSTIC that gives false positives on slow CPUs.	2003-11-04 08:03:11 +00:00

130 Commits