freebsd-nq

Author	SHA1	Message	Date
Hans Petter Selasky	9acc0eafd7	Implement callout_drain_async(), inspired by the projects/hps_head branch. This function is used to drain a callout via a callback instead of blocking the caller until the drain is complete. Refer to the callout_drain_async() manual page for a detailed description. Limitation: If a lock is used with the callout, the callout can only be drained asynchronously one time unless the callout_init_mtx() function is called again. This limitation is not present in projects/hps_head and will require more invasive changes to the timeout code, which was not in the scope of this patch. Differential Revision: https://reviews.freebsd.org/D3521 Reviewed by: wblock MFC after: 1 month	2015-09-14 10:52:26 +00:00
Andriy Gapon	378d5c6c89	callout_reset: fix a reversed check for cc_exec_cancel The typo was introduced in r278469 / `344ecf88af`. As a result of the bug there was a timing window where callout_reset() would fail to cancel a concurrent execution of a callout that is about to start and would schedule the callout again. The callout would fire more times than it is scheduled. That would happen even if the callout is initialized with a lock. For example, the bug triggered the "Stray timeout" assertion in taskqueue_timeout_func(). MFC after: 5 days	2015-09-01 09:27:14 +00:00
Julien Charbon	2ea3089cb1	Revert r286880: If at first this change made sense, it turns out it helps only the TCP timers callout(9) usage. As the benefit for others callout(9) usages did not reach a consensus the historical usage should prevail. Differential Revision: https://reviews.freebsd.org/D3078	2015-08-30 13:44:46 +00:00
Julien Charbon	cd252ea74d	Silence a compilation warning on callout_stop()	2015-08-27 10:43:35 +00:00
Julien Charbon	682d0e15b5	In callout_stop(), do not forget to initialize not_running variable. Thanks to hselasky for noticing that. Differential Revision: https://reviews.freebsd.org/D3078 (Updated) Submitted by: hselasky Pointy hat to: jch	2015-08-27 08:58:03 +00:00
Julien Charbon	0cfae4b4bc	In callout_stop(), if a callout is both pending and currently being serviced return 0 (fail) but it is applicable only mpsafe callouts. Thanks to hselasky for finding this. Differential Revision: https://reviews.freebsd.org/D3078 (Updated) Submitted by: hselasky Reviewed by: jch	2015-08-27 08:15:32 +00:00
Julien Charbon	a1e6f8ff27	callout_stop() should return 0 (fail) when the callout is currently being serviced and indeed unstoppable. A scenario to reproduce this case is: - the callout is being serviced and at same time, - callout_reset() is called on this callout that sets the CALLOUT_PENDING flag and at same time, - callout_stop() is called on this callout and returns 1 (success) even if the callout is indeed currently running and unstoppable. This issue was caught up while making r284245 (D2763) workaround, and was discussed at BSDCan 2015. Once applied the r284245 workaround is not needed anymore and will be reverted. Differential Revision: https://reviews.freebsd.org/D3078 Reviewed by: jhb Sponsored by: Verisign, Inc.	2015-08-18 10:15:09 +00:00
Randall Stewart	b132edb56f	Fix my stupid restoral of old code.. must be c_iflags now. Thanks jhb for catching my stupidity... MFC after: 3 days	2015-04-14 00:02:39 +00:00
Randall Stewart	07a2df5d83	Restore the two lines accidentally deleted that allow CALLOUT_DIRECT to be specified in the flags. Thanks Mark Johnston for noticing this ;-o MFC after: 3 days	2015-04-13 23:06:13 +00:00
Randall Stewart	403df7a672	Adopt jhb's suggested changes, updated comments and callout_migration() moving to kern/kern_timeout.c This does not address his -1 -> NOCPU comment. Sponsored by: Netflix Inc.	2015-03-31 00:18:00 +00:00
Bjoern A. Zeeb	a04d412295	Try to unbreak !SMP kernels broken in r280785 by using the proper macros to access cc_cpu.	2015-03-28 15:07:19 +00:00
Randall Stewart	15b1eb142c	Change the callout to supply -1 to indicate we are not changing CPU, also add protection against invalid CPUs as well as split c_flags and c_iflags so that if a user plays with the active flag (the one expected to be played with by callers in MPSAFE) without a lock, it won't adversely affect the callout system by causing a corrupt list. This also means that all callers need to use the macros and not play with the flags directly (like netgraph used to). Differential Revision: https://reviews.freebsd.org/D1894 Reviewed by: .. timed out but looked at by jhb, imp, adrian hselasky tested by hiren and netflix. Sponsored by: Netflix Inc.	2015-03-28 12:50:24 +00:00
Randall Stewart	66525b2d16	This fixes a bug I inadvertently inserted when I updated the callout code in my last commit. The cc_exec_next is used to track the next when a direct call is being made from callout. It is never used in the in-direct method. When macro-izing I made it so that it would separate out direct/vs/non-direct. This is incorrect and can cause panics as Peter Holm has found for me (Thanks so much Peter for all your help in this). What this change does is restore that behavior but also get rid of the cc_next from the array and instead make it be part of the base callout structure. This way no one else will get confused since we will never use it for non-direct. Reviewed by: Peter Holm and more importantly tested by him ;-) MFC after: 3 days. Sponsored by: Netflix Inc.	2015-02-12 13:31:08 +00:00
Randall Stewart	d2854fa488	This fixes two conditions that can occur when migration is being done in the callout code and harmonizes the macro use: 1) The callout_active() will lie. Basically if a migration is occurring and the callout is about to expire and the migration has been deferred, the callout_active will no longer return true until after the migration. This confuses and breaks callers that are doing callout_init(&c, 1); such as TCP. 2) The migration code had a bug in it where when migrating, if two calls to callout_reset came in and they both collided with the callout on the wheel about to run, then the second call to callout_reset would corrupt the list the callout wheel uses putting the callout thread into an endless loop. 3) Per imp, I have fixed all the macro occurrences in the code that were for the most part being ignored. Phabricator D1711 and looked at by lstewart and jhb and sbruno. Reviewed by: kostikbel, imp, adrian, hselasky MFC after: 3 days Sponsored by: Netflix Inc.	2015-02-09 19:19:44 +00:00
Adrian Chadd	9500dd9f0b	Call WITNESS_WARN() in callout_drain() to check whether any locks are being held before sleeping. This has bitten me (in ath(4)) once before and I'd like to see this not bite anyone else. Differential Revision: D1638 Reviewed by: jhb, hselasky MFC after: 1 week	2015-01-26 04:04:57 +00:00
Hans Petter Selasky	a115fb62ed	Revert for r277213: FreeBSD developers need more time to review patches in the surrounding areas like the TCP stack which are using MPSAFE callouts to restore distribution of callouts on multiple CPUs. Bump the __FreeBSD_version instead of reverting it. Suggested by: kmacy, adrian, glebius and kib Differential Revision: https://reviews.freebsd.org/D1438	2015-01-22 11:12:42 +00:00
Hans Petter Selasky	1a26c3c047	Major callout subsystem cleanup and rewrite: - Close a migration race where callout_reset() failed to set the CALLOUT_ACTIVE flag. - Callout callback functions are now allowed to be protected by spinlocks. - Switching the callout CPU number cannot always be done on a per-callout basis. See the updated timeout(9) manual page for more information. - The timeout(9) manual page has been updated to reflect how all the functions inside the callout API are working. The manual page has been made function oriented to make it easier to deduce how each of the functions making up the callout API are working without having to first read the whole manual page. Group all functions into a handful of sections which should give a quick top-level overview when the different functions should be used. - The CALLOUT_SHAREDLOCK flag and its functionality has been removed to reduce the complexity in the callout code and to avoid problems about atomically stopping callouts via callout_stop(). If someone needs it, it can be re-added. From my quick grep there are no CALLOUT_SHAREDLOCK clients in the kernel. - A new callout API function named "callout_drain_async()" has been added. See the updated timeout(9) manual page for a complete description. - Update the callout clients in the "kern/" folder to use the callout API properly, like cv_timedwait(). Previously there was some custom sleepqueue code in the callout subsystem, which has been removed, because we now allow callouts to be protected by spinlocks. This allows us to tear down the callout like done with regular mutexes, and a "td_slpmutex" has been added to "struct thread" to atomically teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and "SWT_SLEEPQTIMO" states can now be completely removed. Currently they are marked as available and will be cleaned up in a follow up commit. - Bump the __FreeBSD_version to indicate kernel modules need recompilation. - There have been several reports that this patch "seems to squash a serious bug leading to a callout timeout and panic". Kernel build testing: all architectures were built MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D1438 Sponsored by: Mellanox Technologies Reviewed by: jhb, adrian, sbruno and emaste	2015-01-15 15:32:30 +00:00
John Baldwin	232e8b52b0	Add schedgraph traces for callout handlers. Specifically, a callwheel logs a running event each time it executes a callout function. The event includes the function pointer, argument, and whether or not it was run from hardware interrupt context. The callwheel is marked idle when each handler completes. This effectively logs the duration of each callout routine in the graph.	2014-10-08 16:22:59 +00:00
Adrian Chadd	c445c3c7f6	If we're doing RSS then ensure that the callwheel swi's are CPU pinned.	2014-06-30 04:25:51 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Davide Italiano	e392e44c27	Convert functions to the new-style format. Submitted by: Vijay Singh <vijju.singh@gmail.com> via -hackers	2014-06-05 03:46:46 +00:00
Adrian Chadd	ac75ee9fa3	Add in support to optionally pin the swi threads. Under enough load, the swi's can actually be preempted and migrated to other currently free cores. When doing RSS experiments, this led to the per-CPU TCP timers not lining up any more with the RX CPU said flows were ending up on, leading to increased lock contention. Since there was a little pushback on flipping them on by default, I've left the default at "don't pin." The other less obvious problem here is that the default swi is also the same as the destination swi for CPU #0. So if one pins the swi on CPU #0, there's no default floating swi. A nice future project would be to create a separate swi for the "default" floating swi, as well as per-CPU swis that are (optionally) pinned. Tested: * parallel TCP tests (2 x 1g unfortunately for now); CPU: Intel(R) Xeon(R) CPU E5-2650 Note: This is based on some initial investigation into RSS/TCP stack lock contention on FreeBSD-HEAD whilst at Netflix in January 2014.	2014-05-10 00:53:36 +00:00
Davide Italiano	4bc38a5ab0	Hide internal details of sbintime_t implementation wrapping INT64_MAX into SBT_MAX, to make it more robust in case internal type representation will change in the future. All the consumers were migrated to SBT_MAX and every new consumer (if any) should from now use this interface. Requested by: bapt, jmg, Ryan Lortie (implicitly) Reviewed by: mav, bde	2014-04-12 23:29:29 +00:00
Adrian Chadd	f44e2a4c0f	Include the CPU id in the per-CPU timer swi thread descriptions. Original patch by: jhb	2014-02-14 23:19:51 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Davide Italiano	1b0c144fc2	Make the callout arithmetic more robust adding checks for overflow. Without these, if the timeout value passed is "large enough", the value of the sum of it and other factors (e.g. current time as returned by sbinuptime() or 'precision' argument) might result in a negative number. This negative number is then passed to eventtimers(4), which causes et_start() routine to load et_min_period into eventtimer, making the CPU where the thread is stuck forever in timer interrupt handler routine. This is now avoided rounding to INT64_MAX the timeout period in case of overflow. Reported by: kib, pho Discussed with: kib, mav Tested by: pho (stress2 suite, kevent7.sh scenario) Approved by: re (kib)	2013-09-26 10:06:50 +00:00
Davide Italiano	1f96759fb1	Fix callout_init_rm() in the shared case, allocating storage for 'struct rm_priotracker' directly in the softclock thread. Now consumers can pass CALLOUT_SHAREDLOCK flag to callout initialization routine safely. The choice of the already existing flags instead of special casing shared rmlocks is done to prevent consumer footshooting. Suggested by: jhb Reviewed by: jhb Approved by: re (delphij)	2013-09-20 23:16:15 +00:00
Mark Johnston	7b77e1fe0f	Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks	2013-08-15 04:08:55 +00:00
Davide Italiano	3f321a4eac	Cache the callout precision argument as part of the information required for migrating callouts to new CPU. This value is passed to callout_cc_add() in order to update properly precision field in case of rescheduling/migration. Reviewed by: mav	2013-03-25 09:43:50 +00:00
Andre Oppermann	a7aea132cf	Bring back the comment on the sizing of the callout array that got lost in r248031. Requested by: alc, alfred	2013-03-10 22:55:35 +00:00
Davide Italiano	c5904471dc	Fixup r248032: Change size requested to malloc(9) now that callwheel buckets are callout_list and not callout_tailq anymore. This change was already there but it seems it got lost after code churn in r248032. Reported by: alc, kib	2013-03-09 20:03:10 +00:00
Andre Oppermann	15ae0c9af9	Move the callout subsystem initialization to its own SYSINIT() from being indirectly called via cpu_startup()+vm_ksubmap_init(). The boot order position remains the same at SI_SUB_CPU. Allocation of the callout array is changed to standard kernel malloc from a slightly obscure direct kernel_map allocation. kern_timeout_callwheel_alloc() is renamed to callout_callwheel_init() to better describe its purpose. kern_timeout_callwheel_init() is removed simplifying the per-cpu initialization. Reviewed by: davide	2013-03-08 10:37:17 +00:00
Andre Oppermann	f8ccf82a4c	Move the auto-sizing of the callout array from init_param2() to kern_timeout_callwheel_alloc() where it is actually used. This is a mechanical move and no tuning parameters are changed. The pre-allocated callout array is only used for legacy timeout(9) calls and is only allocated and active on cpu0. Eventually all remaining users of timeout(9) should switch to the callout_* API. Reviewed by: davide	2013-03-08 10:14:58 +00:00
Davide Italiano	ac42a1726a	Complete r247813: Use true/false instead of TRUE/FALSE. Reported by: attilio Requested by: jhb	2013-03-04 21:52:12 +00:00
Davide Italiano	a4a3ce9919	Use C99 'bool' rather than Machish 'boolean_t'. Requested by: jhb	2013-03-04 21:09:22 +00:00
Davide Italiano	037637812d	Fix build with DIAGNOSTIC/CALLOUT_PROFILING options turned on. Reported by: kib, David Wolfskill <david at catwhisker dot org> Pointy-hat to: davide	2013-03-04 15:03:52 +00:00
Davide Italiano	5b999a6be0	- Make callout(9) tickless, relying on eventtimers(4) as backend for precise time event generation. This greatly improves granularity of callouts which are not anymore constrained to wait next tick to be scheduled. - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify precision tolerance they allow, so that callout can coalesce events and reduce number of interrupts as well as potentially avoid scheduling a SWI thread. - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, as long as interrupt context has some limitations (e.g. no sleeping locks can be held). - Enhance mechanisms to gather informations about callwheel, introducing a new sysctl to obtain stats. This change breaks the KBI. struct callout fields has been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision. Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil	2013-03-04 11:09:56 +00:00
Davide Italiano	3f555c45eb	callwheelmask and callwheelsize are always greater than zero. Switch their type to u_int.	2013-03-03 15:01:33 +00:00
Davide Italiano	0fb285b716	Remove a couple of unused include.	2013-03-03 14:47:02 +00:00
Alexander Motin	4514d6fa18	MFcalloutng: Some whitespace fixes.	2013-03-03 09:11:24 +00:00
Davide Italiano	e234a588cb	MFcalloutng: Style fixes.	2013-02-28 16:22:49 +00:00
Attilio Rao	bdf9120c16	Fixup r243901: - As the comment report, CALLOUT_LOCAL_ALLOC cannot be checked directly from the callout flags but might be checked by a cached value. Hence, do so before to actually remove the callout, when needed, in softclock_call_cc(). - In softclock_call_cc() also add a comment in the waiting and deferred migration case explaining that the dereference should be safe because of the migration dereference invariants. Additively: - In softclock_call_cc(), for the deferred migration case, move all the accesses to callout structure after the comment stating the callout must not be destroyed. - For consistency with this last tweak, use cached c_flags for the KASSERT() in the deferred migration case. It is not strictly necessary but this way all the callout accesses happen after the above mentioned comment, improving consistency. Pointy hat to: me Sponsored by: Isilon Systems / EMC Corporation Reviewed by: kib MFC after: 2 weeks X-MFC: 243901	2012-12-05 22:32:12 +00:00
Konstantin Belousov	eb8a718686	The softclock_call_cc() is executing with the callout already removed from the callwheel. Calculate the cc->cc_next before removing the callout, otherwise the code followed the invalid tailq links. After this, make softclock_call_cc() return void, since it always return cc->cc_next, which is immediately available to the softclock() anyway. This also allows to eliminate a label under #ifdef SMP. Remove the assignment of cc->cc_next from callout_cc_del(), since the function is called with the callout already removed from callwheel. If cancelling the migration, also clear the CALLOUT_DFRMIGRATION flag. Postpone the free of the timeout(9) allocated callouts after the migration checks are done. Add some more strict asserts about the state of the callout in callout_call_cc(). Reviewed by: attilio Reported and tested by: pho (previous version) MFC after: 2 weeks	2012-12-05 19:02:22 +00:00
Alfred Perlstein	922314f018	replace bit shifting loop with 1<<fls(n), improve comments. Reviewed by: davide	2012-12-04 05:28:20 +00:00
Attilio Rao	4ceaf45de5	Rework the known mutexes to benefit about staying on their own cache line in order to avoid manual frobbing but using struct mtx_padalign. The sole exception being nvme and sfxge drivers, where the author redefined CACHE_LINE_SIZE manually, so they need to be analyzed and dealt with separately. Reviewed by: jimharris, alc	2012-10-31 18:07:18 +00:00
Jim Harris	84e7a2ebb7	Pad and align the callout_cpu mtx to its own cacheline to reduce false sharing especially on the default CPU 0 callout_cpu structure. This will be followed up by attilio@ with a conversion to the new struct mtx_padalign but doing this manual conversion first gives an easy MFC candidate since mtx_padalign is a more extensive system change. Sponsored by: Intel Reviewed by: jeff, attilio MFC after: 1 week	2012-10-31 17:12:12 +00:00
Konstantin Belousov	6098e7acff	Move the code to call the callout callback into the helper function softclock_call_cc(). While there, move some common code to callout_cc_del(). Requested by: avg, jhb Reviewed by: jhb MFC after: 1 week	2012-05-03 20:00:30 +00:00


183 Commits