freebsd-skq

Author	SHA1	Message	Date
nwhitehorn	3cd3799a91	Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs are numbered densely from there to n_cpus. MFC after: 1 month	2017-11-25 23:41:05 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
emaste	1901c3e1f2	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193	2017-05-17 00:34:34 +00:00
markj	460a69c6a0	When draining a callout, don't clear CALLOUT_ACTIVE while it is running. The callout may reschedule itself and execute again before callout_drain() returns, but we should not clear CALLOUT_ACTIVE until the callout is stopped. Tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2017-03-15 00:29:27 +00:00
jhb	f91c1f1791	Permit timed sleeps for threads other than thread0 before timers are working. The callout subsystem already handles early callouts and schedules the first clock interrupt appropriately based on the currently pending callouts. The one nit to fix was that callouts scheduled via C_HARDCLOCK during early boot could fire too early once timers were enabled as the per-CPU base time is always zero until timers are initialized. The change in callout_when() handles this case by using the current uptime as the base time of the callout during bootup if the per-CPU base time is zero. Reviewed by: kib MFC after: 2 weeks Sponsored by: Netflix	2016-11-25 18:02:43 +00:00
emaste	00b67b15b9	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
glebius	0d6b2c40a8	Fix a stupid typo (or copy/paste buffer malfunction).	2016-08-16 23:00:22 +00:00
glebius	0495c9a2f3	We should not be allowing a timeout to reset when a drain is in progress on it (either async or sync drain). At this moment the only user of drain is TCP, but TCP wouldn't reschedule a callout after it has drained it, since it drains only when a tcpcb is closed. This for now the problem isn't observed. Submitted by: rrs	2016-08-16 21:55:34 +00:00
kib	9e755a47c1	Fix indentation. Reported by: hselasky MFC after: 17 days	2016-08-10 14:41:53 +00:00
kib	2b85baaf40	Extract the calculation of the callout fire time into the new function callout_when(9). See the man page update for the description of the intended use. Tested by: pho Reviewed by: jhb, bjk (man page updates) Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7137	2016-07-28 08:57:01 +00:00
glebius	ec1d5b2486	Redo the r302894: the very new value for a non-scheduled callout is -1. This was recently added in r290664. Noticed by: hselasky Tested by: Larry Rosenman <ler lerctr.org> PR: 210884	2016-07-20 16:48:25 +00:00
glebius	40325c7987	Revert r303037. It re-introduces the panic with TCP timers. Agreed by: rrs, re (gjb)	2016-07-20 16:44:22 +00:00
rrs	992908d82b	This reverts out Gleb's changes and adds three small fixes that I think closes up the races Gleb was looking for. This is running quite nicely in Netflix and now no longer causes TCP-tcb leaks. Differential Revision: 7135	2016-07-19 18:31:19 +00:00
glebius	26eb470bad	Revert the last commit. It must get more review and testing first.	2016-07-18 09:29:08 +00:00
glebius	65e3443cc5	Redo the r302894: the very new value for a non-scheduled callout is -1. This was recently added in r290664. Noticed by: hselasky PR: 210884	2016-07-18 09:26:06 +00:00
glebius	ba9382e34a	Fix regression introduced by r302350. The change of return value for a callout that wasn't scheduled at all was unintentional and yielded in several panics. PR: 210884	2016-07-15 09:28:32 +00:00
glebius	2b2f782761	The paradigm of a callout is that it has three consequent states: not scheduled -> scheduled -> running -> not scheduled. The API and the manual page assume that, some comments in the code assume that, and looks like some contributors to the code also did. The problem is that this paradigm isn't true. A callout can be scheduled and running at the same time, which makes API description ambigouous. In such case callout_stop() family of functions/macros should return 1 and 0 at the same time, since it successfully unscheduled future callout but the current one is running. Before this change we returned 1 in such a case, with an exception that if running callout was migrating we returned 0, unless CS_MIGRBLOCK was specified. With this change, we now return 0 in case if future callout was unscheduled, but another one is still in action, indicating to API users that resources are not yet safe to be freed. However, the sleepqueue code relies on getting 1 return code in that case, and there already was CS_MIGRBLOCK flag, that covered one of the edge cases. In the new return path we will also use this flag, to keep sleepqueue safe. Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to migration edge case, rename it to CS_EXECUTING. This change fixes panics on a high loaded TCP server. Reviewed by: jch, hselasky, rrs, kib Approved by: re (gjb) Differential Revision: https://reviews.freebsd.org/D7042	2016-07-05 18:47:17 +00:00
bz	2919ef7927	Implement a `show panic` command to DDB which will helpfully print the panic string again if set, in case it scrolled out of the active window. This avoids having to remember the symbol name. Also add a show callout <addr> command to DDB in order to inspect some struct callout fields in case of panics in the callout code. This may help to see if there was memory corruption or to further ease debugging problems. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Reviewed by: jhb (comment only on the show panic initally) Differential Revision: https://reviews.freebsd.org/D4527	2016-06-06 20:57:24 +00:00
pfg	28823d0656	sys/kern: spelling fixes in comments. No functional change.	2016-04-29 22:15:33 +00:00
pfg	fc01419148	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
kib	486320aac4	If callout_stop_safe() noted that the callout is currently executing, but next invocation is cancelled while migrating, sleepq_check_timeout() needs to be informed that the callout is stopped. Otherwise the thread switches off CPU and never become runnable, since running callout could have already raced with us, while the migrating and cancelled callout could be one which is expected to set TDP_TIMOFAIL flag for us. This contradicts with the expected behaviour of callout_stop() for other callers, which e.g. decrement references from the callout callbacks. Add a new flag CS_MIGRBLOCK requesting report of the situation as 'successfully stopped'. Reviewed by: jhb (previous version) Tested by: cognet, pho PR: 200992 Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D5221	2016-03-02 18:46:17 +00:00
markj	fa1b8e9a4f	Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week	2015-12-16 23:39:27 +00:00
rrs	24a4335f25	Add new async_drain to the callout system. This is so-far not used but should be used by TCP for sure in its cleanup of the IN-PCB (will be coming shortly). Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D4076	2015-11-10 14:49:32 +00:00
avg	425c0bb088	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days	2015-09-28 12:14:16 +00:00
hselasky	d198450d08	Revert r287780 until more developers have their say. Differential Revision: https://reviews.freebsd.org/D3521 Requested by: gnn	2015-09-22 06:51:55 +00:00
hselasky	1a99c0928d	Implement callout_drain_async(), inspired by the projects/hps_head branch. This function is used to drain a callout via a callback instead of blocking the caller until the drain is complete. Refer to the callout_drain_async() manual page for a detailed description. Limitation: If a lock is used with the callout, the callout can only be drained asynchronously one time unless the callout_init_mtx() function is called again. This limitation is not present in projects/hps_head and will require more invasive changes to the timeout code, which was not in the scope of this patch. Differential Revision: https://reviews.freebsd.org/D3521 Reviewed by: wblock MFC after: 1 month	2015-09-14 10:52:26 +00:00
avg	a047794b25	callout_reset: fix a reversed check for cc_exec_cancel The typo was introduced in r278469 / `344ecf88af`. As a result of the bug there was a timing window where callout_reset() would fail to cancel a concurrent execution of a callout that is about to start and would schedule the callout again. The callout would fire more times than it is scheduled. That would happen even if the callout is initialized with a lock. For example, the bug triggered the "Stray timeout" assertion in taskqueue_timeout_func(). MFC after: 5 days	2015-09-01 09:27:14 +00:00
jch	f011b0c89e	Revert r286880: If at first this change made sense, it turns out it helps only the TCP timers callout(9) usage. As the benefit for others callout(9) usages did not reach a consensus the historical usage should prevail. Differential Revision: https://reviews.freebsd.org/D3078	2015-08-30 13:44:46 +00:00
jch	761d6ba633	Silent a compilation warning on callout_stop()	2015-08-27 10:43:35 +00:00
jch	4015a832a3	In callout_stop(), do not forget to initialize not_running variable. Thanks to hselasky for noticing that. Differential Revision: https://reviews.freebsd.org/D3078 (Updated) Submitted by: hselasky Pointy hat to: jch	2015-08-27 08:58:03 +00:00
jch	433b16759b	In callout_stop(), if a callout is both pending and currently being serviced return 0 (fail) but it is applicable only mpsafe callouts. Thanks to hselasky for finding this. Differential Revision: https://reviews.freebsd.org/D3078 (Updated) Submitted by: hselasky Reviewed by: jch	2015-08-27 08:15:32 +00:00
jch	7242654eab	callout_stop() should return 0 (fail) when the callout is currently being serviced and indeed unstoppable. A scenario to reproduce this case is: - the callout is being serviced and at same time, - callout_reset() is called on this callout that sets the CALLOUT_PENDING flag and at same time, - callout_stop() is called on this callout and returns 1 (success) even if the callout is indeed currently running and unstoppable. This issue was caught up while making r284245 (D2763) workaround, and was discussed at BSDCan 2015. Once applied the r284245 workaround is not needed anymore and will be reverted. Differential Revision: https://reviews.freebsd.org/D3078 Reviewed by: jhb Sponsored by: Verisign, Inc.	2015-08-18 10:15:09 +00:00
rrs	9d7646111e	Fix my stupid restoral of old code.. must be c_iflags now. Thanks jhb for catching my stupidity... MFC after: 3 days	2015-04-14 00:02:39 +00:00
rrs	89f83321ce	Restore the two lines accidentally deleted that allow CALLOUT_DIRECT to be specifed in the flags. Thanks Mark Johnston for noticing this ;-o MFC after: 3 days	2015-04-13 23:06:13 +00:00
rrs	59d85f7f18	Adopt jhb's suggested changes, updated comments and callout_migration() moving to kern/kern_timeout.c This does not address his -1 -> NOCPU comment. Sponsored by: Netflix Inc.	2015-03-31 00:18:00 +00:00
bz	edcfd40470	Try to unbreak !SMP kernels broken in r280785 by using the proper macros to access cc_cpu.	2015-03-28 15:07:19 +00:00
rrs	8e2d411633	Change the callout to supply -1 to indicate we are not changing CPU, also add protection against invalid CPU's as well as split c_flags and c_iflags so that if a user plays with the active flag (the one expected to be played with by callers in MPSAFE) without a lock, it won't adversely affect the callout system by causing a corrupt list. This also means that all callers need to use the macros and not play with the falgs directly (like netgraph used to). Differential Revision: htts://reviews.freebsd.org/D1894 Reviewed by: .. timed out but looked at by jhb, imp, adrian hselasky tested by hiren and netflix. Sponsored by: Netflix Inc.	2015-03-28 12:50:24 +00:00
rrs	e878a76f46	This fixes a bug I in-advertantly inserted when I updated the callout code in my last commit. The cc_exec_next is used to track the next when a direct call is being made from callout. It is never used in the in-direct method. When macro-izing I made it so that it would separate out direct/vs/non-direct. This is incorrect and can cause panics as Peter Holm has found for me (Thanks so much Peter for all your help in this). What this change does is restore that behavior but also get rid of the cc_next from the array and instead make it be part of the base callout structure. This way no one else will get confused since we will never use it for non-direct. Reviewed by: Peter Holm and more importantly tested by him ;-) MFC after: 3 days. Sponsored by: Netflix Inc.	2015-02-12 13:31:08 +00:00
rrs	344ecf88af	This fixes two conditions that can incur when migration is being done in the callout code and harmonizes the macro use.: 1) The callout_active() will lie. Basically if a migration is occuring and the callout is about to expire and the migration has been deferred, the callout_active will no longer return true until after the migration. This confuses and breaks callers that are doing callout_init(&c, 1); such as TCP. 2) The migration code had a bug in it where when migrating, if a two calls to callout_reset came in and they both collided with the callout on the wheel about to run, then the second call to callout_reset would corrupt the list the callout wheel uses putting the callout thread into a endless loop. 3) Per imp, I have fixed all the macro occurance in the code that were for the most part being ignored. Phabricator D1711 and looked at by lstewart and jhb and sbruno. Reviewed by: kostikbel, imp, adrian, hselasky MFC after: 3 days Sponsored by: Netflix Inc.	2015-02-09 19:19:44 +00:00
adrian	70dc4fad7a	Call WITNESS_WARN() in callout_drain() to check whether any locks are being held before sleeping. This has bitten me (in ath(4)) once before and I'd like to see this not bite anyone else. Differential Revision: D1638 Reviewed by: jhb, hselasky MFC after: 1 week	2015-01-26 04:04:57 +00:00
hselasky	c0aba3b50d	Revert for r277213: FreeBSD developers need more time to review patches in the surrounding areas like the TCP stack which are using MPSAFE callouts to restore distribution of callouts on multiple CPUs. Bump the __FreeBSD_version instead of reverting it. Suggested by: kmacy, adrian, glebius and kib Differential Revision: https://reviews.freebsd.org/D1438	2015-01-22 11:12:42 +00:00
hselasky	a9eed96a6b	Major callout subsystem cleanup and rewrite: - Close a migration race where callout_reset() failed to set the CALLOUT_ACTIVE flag. - Callout callback functions are now allowed to be protected by spinlocks. - Switching the callout CPU number cannot always be done on a per-callout basis. See the updated timeout(9) manual page for more information. - The timeout(9) manual page has been updated to reflect how all the functions inside the callout API are working. The manual page has been made function oriented to make it easier to deduce how each of the functions making up the callout API are working without having to first read the whole manual page. Group all functions into a handful of sections which should give a quick top-level overview when the different functions should be used. - The CALLOUT_SHAREDLOCK flag and its functionality has been removed to reduce the complexity in the callout code and to avoid problems about atomically stopping callouts via callout_stop(). If someone needs it, it can be re-added. From my quick grep there are no CALLOUT_SHAREDLOCK clients in the kernel. - A new callout API function named "callout_drain_async()" has been added. See the updated timeout(9) manual page for a complete description. - Update the callout clients in the "kern/" folder to use the callout API properly, like cv_timedwait(). Previously there was some custom sleepqueue code in the callout subsystem, which has been removed, because we now allow callouts to be protected by spinlocks. This allows us to tear down the callout like done with regular mutexes, and a "td_slpmutex" has been added to "struct thread" to atomically teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and "SWT_SLEEPQTIMO" states can now be completely removed. Currently they are marked as available and will be cleaned up in a follow up commit. - Bump the __FreeBSD_version to indicate kernel modules need recompilation. - There has been several reports that this patch "seems to squash a serious bug leading to a callout timeout and panic". Kernel build testing: all architectures were built MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D1438 Sponsored by: Mellanox Technologies Reviewed by: jhb, adrian, sbruno and emaste	2015-01-15 15:32:30 +00:00
jhb	877e018654	Add schedgraph traces for callout handlers. Specifically, a callwheel logs a running event each time it executes a callout function. The event includes the function pointer, argument, and whether or not it was run from hardware interrupt context. The callwheel is marked idle when each handler completes. This effectively logs the duration of each callout routine in the graph.	2014-10-08 16:22:59 +00:00
adrian	b39e0d405b	If we're doing RSS then ensure that the callwheel swi's are CPU pinned.	2014-06-30 04:25:51 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
davide	e9bd39eb77	Convert functions to the new-style format. Submitted by: Vijay Singh <vijju.singh@gmail.com> via -hackers	2014-06-05 03:46:46 +00:00
adrian	65fc060c58	Add in support to optionally pin the swi threads. Under enough load, the swi's can actually be preempted and migrated to other currently free cores. When doing RSS experiments, this lead to the per-CPU TCP timers not lining up any more with the RX CPU said flows were ending up on, leading to increased lock contention. Since there was a little pushback on flipping them on by default, I've left the default at "don't pin." The other less obvious problem here is that the default swi is also the same as the destination swi for CPU #0. So if one pins the swi on CPU #0, there's no default floating swi. A nice future project would be to create a separate swi for the "default" floating swi, as well as per-CPU swis that are (optionally) pinned. Tested: * parallel TCP tests (2 x 1g unfortunately for now); CPU: Intel(R) Xeon(R) CPU E5-2650 Note: This is based on some initial investigation into RSS/TCP stack lock contention on FreeBSD-HEAD whilst at Netflix in January 2014.	2014-05-10 00:53:36 +00:00
davide	aeb4d19339	Hide internal details of sbintime_t implementation wrapping INT64_MAX into SBT_MAX, to make it more robust in case internal type representation will change in the future. All the consumers were migrated to SBT_MAX and every new consumer (if any) should from now use this interface. Requested by: bapt, jmg, Ryan Lortie (implictly) Reviewed by: mav, bde	2014-04-12 23:29:29 +00:00

1 2 3 4 5

208 Commits