freebsd-dev

Author	SHA1	Message	Date
Hans Petter Selasky	826c079373	Add full support for dynamic allocation and freeing of epochs. Make sure to reclaim epoch structures when they are freed to support dynamic allocation and freeing of epoch structures. While at it, move the 64 supported epoch control structures to the static memory domain. This overall simplifies the management and debugging of system epochs. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D25960 MFC after: 1 week Sponsored by: Mellanox Technologies	2020-08-07 15:32:42 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Mateusz Guzik	48baf00f54	epoch: convert zpcpu_get_cpua(.., curcpu) to zpcpu_get	2020-02-12 11:10:10 +00:00
Gleb Smirnoff	66c6c556b6	Change argument order of epoch_call() to more natural, first function, then its argument. Reviewed by: imp, cem, jhb	2020-01-17 06:10:24 +00:00
Konstantin Belousov	fedab1b499	Code must not unlock a mutex while owning the thread lock. Reviewed by: hselasky, markj Sponsored by: Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23150	2020-01-13 14:30:19 +00:00
Hans Petter Selasky	cc79ea3a26	Restore important comment in RCU/EPOCH support in FreeBSD after r355784. Sponsored by: Mellanox Technologies	2019-12-18 09:30:32 +00:00
Jeff Roberson	686bcb5c14	schedlock 4/4 Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778	2019-12-15 21:26:50 +00:00
Bjoern A. Zeeb	173c062a56	Improve EPOCH_TRACE Two changes to EPOCH_TRACE: (1) add a sysctl to suppress the backtrace from epoch_trace_report(). Sometimes the log line for the recursion is enough and the backtrace massively spams the console. (2) In order to be able to go without the backtrace do not only print where the previous occurrence happened, but also where the current one happens. That way we have file:line information for both and can look at them without the need for getting line numbers from backtrace and a debugging tool. Reviewed by: glebius Sponsored by: Netflix (originally) Differential Revision: https://reviews.freebsd.org/D22641	2019-12-06 16:34:04 +00:00
Conrad Meyer	7993a104a1	Add explicit SI_SUB_EPOCH Add explicit SI_SUB_EPOCH, after SI_SUB_TASKQ and before SI_SUB_SMP (EARLY_AP_STARTUP). Rename existing "SI_SUB_TASKQ + 1" to SI_SUB_EPOCH. epoch(9) consumers cannot epoch_alloc() before SI_SUB_EPOCH:SI_ORDER_SECOND, but likely should allocate before SI_SUB_SMP. Prior to this change, consumers (well, epoch itself, and net/if.c) just open-coded the SI_SUB_TASKQ + 1 order to match epoch.c, but this was fragile. Reviewed by: mmacy Differential Revision: https://reviews.freebsd.org/D22503	2019-11-22 23:23:40 +00:00
Gleb Smirnoff	5757b59f3e	Merge td_epochnest with td_no_sleeping. Epoch itself doesn't rely on the counter and it is provided merely for sleeping subsystems to check it. - In functions that sleep use THREAD_CAN_SLEEP() to assert correctness. With EPOCH_TRACE compiled print epoch info. - _sleep() was a wrong place to put the assertion for epoch, right place is sleepq_add(), as there are ways to call the latter bypassing _sleep(). - Do not increase td_no_sleeping in non-preemptible epochs. The critical section would trigger all possible safeguards, no sleeping counter is extraneous. Reviewed by: kib	2019-10-29 17:28:25 +00:00
Gleb Smirnoff	080e9496b8	Allow epoch tracker to use the very last byte of the stack. Not sure this will help to avoid panic in this function, since it will also use some stack, but makes code more strict. Submitted by: hselasky	2019-10-22 18:05:15 +00:00
Gleb Smirnoff	77d70e515f	Assert that any epoch tracker belongs to the thread stack. Reviewed by: kib	2019-10-21 23:12:14 +00:00
Gleb Smirnoff	279b9aabe3	Remove epoch tracker from struct thread. It was an ugly crutch to emulate locking semantics for if_addr_rlock() and if_maddr_rlock().	2019-10-21 18:19:32 +00:00
Gleb Smirnoff	bac060388f	When assertion for a thread not being in an epoch fails also print all entered epochs. Works with EPOCH_TRACE only. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D22017	2019-10-15 21:24:25 +00:00
Gleb Smirnoff	f6eccf96a0	Since EPOCH_TRACE had been moved to opt_global.h, we don't need to waste extra space in struct thread.	2019-10-14 04:17:56 +00:00
Gleb Smirnoff	dd902d015a	Add debugging facility EPOCH_TRACE that checks that epochs entered are properly nested and warns about recursive entrances. Unlike with locks, there is nothing fundamentally wrong with such use, the intent of tracer is to help to review complex epoch-protected code paths, and we mean the network stack here. Reviewed by: hselasky Sponsored by: Netflix Pull Request: https://reviews.freebsd.org/D21610	2019-09-25 18:26:31 +00:00
Mark Johnston	2fb62b1a46	Fix the turnstile_lock() KPI. turnstile_{lock,unlock}() were added for use in epoch. turnstile_lock() returned NULL to indicate that the calling thread had lost a race and the turnstile was no longer associated with the given lock, or the lock owner. However, reader-writer locks may not have a designated owner, in which case turnstile_lock() would return NULL and epoch_block_handler_preempt() would leak spinlocks as a result. Apply a minimal fix: return the lock owner as a separate return value. Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21048	2019-07-24 23:04:59 +00:00
Hans Petter Selasky	131b2b7658	Implement API for draining EPOCH(9) callbacks. The epoch_drain_callbacks() function is used to drain all pending callbacks which have been invoked by prior epoch_call() function calls on the same epoch. This function is useful when there are shared memory structure(s) referred to by the epoch callback(s) which are not refcounted and are rarely freed. The typical place for calling this function is right before freeing or invalidating the shared resource(s) used by the epoch callback(s). This function can sleep and is not optimized for performance. Differential Revision: https://reviews.freebsd.org/D20109 MFC after: 1 week Sponsored by: Mellanox Technologies	2019-06-28 10:38:56 +00:00
Marius Strobl	f855ec814d	Make taskqgroup_attach{,_cpu}(9) work across architectures So far, intr_{g,s}etaffinity(9) take a single int for identifying a device interrupt. This approach doesn't work on all architectures supported, as a single int isn't sufficient to globally specify a device interrupt. In particular, with multiple interrupt controllers in one system as found on e. g. arm and arm64 machines, an interrupt number as returned by rman_get_start(9) may be only unique relative to the bus and, thus, interrupt controller, a certain device hangs off from. In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}() not work across architectures. Yet in turn, iflib(4) as gtaskqueue consumer so far doesn't fit architectures where interrupt numbers aren't globally unique. However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as employed by the gtaskqueue implementation to bind an interrupt to a particular CPU, using bus_bind_intr(9) instead is equivalent from a functional point of view, with bus_bind_intr(9) taking the device and interrupt resource arguments required for uniquely specifying a device interrupt. Thus, change the gtaskqueue implementation to employ bus_bind_intr(9) instead and intr_{g,s}etaffinity(9) to take the device and interrupt resource arguments required respectively. This change also moves struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL as userland likes to include <sys/_task.h> or indirectly drags it in - for better or worse also with _KERNEL defined -, which with device_t and struct resource dependencies otherwise is no longer as easily possible now. 
The userland inclusion problem probably can be improved a bit by introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the existing _WANT_PRISON etc., which is orthogonal to this change, though, and likely needs an exp-run. While at it: - Change the gt_cpu member in the grouptask structure to be of type int as used elsewhere for specifying CPUs (an int16_t may be too narrow sooner or later), - move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to the gtaskqueue implementation as it's only used and needed there, - change the GTASK_INIT macro to use "gtask" rather than "task" as argument given that it actually operates on a struct gtask rather than a struct task, and - let subr_gtaskqueue.c consistently use __func__ to print function names. Reported by: mmel Reviewed by: mmel Differential Revision: https://reviews.freebsd.org/D19139	2019-02-12 21:23:59 +00:00
Matt Macy	91cf497515	epoch(9) revert r340097 - no longer a need for multiple sections per cpu I spoke with Samy Bahra and recent changes to CK to make ck_epoch_call and ck_epoch_poll not modify the record have eliminated the need for this.	2018-11-14 00:12:04 +00:00
Gleb Smirnoff	635c18840a	style(9), mostly adjusting overly long lines.	2018-11-13 23:57:34 +00:00
Gleb Smirnoff	a760c50c9e	With epoch not inlined, there is no point in using _lite KPI. While here, remove some unnecessary casts.	2018-11-13 23:45:38 +00:00
Gleb Smirnoff	9f360eecf9	The dualism between epoch_tracker and epoch_thread is fragile and unnecessary. So, expose CK types to kernel and use a single normal structure for epoch_tracker. Reviewed by: jtl, gallatin	2018-11-13 23:20:55 +00:00
Gleb Smirnoff	b79aa45e0e	For compatibility KPI functions like if_addr_rlock() that used to have mutexes but now are converted to epoch(9) use thread-private epoch_tracker. Embedding tracker into ifnet(9) or ifnet derived structures creates a non reentrable function, that will fail miserably if called simultaneously from two different contexts. A thread private tracker will provide a single tracker that would allow to call these functions safely. It doesn't allow nested call, but this is not expected from compatibility KPIs. Reviewed by: markj	2018-11-13 22:58:38 +00:00
Gleb Smirnoff	a82296c2df	Uninline epoch(9) entrance and exit. There is no proof that modern processors would benefit from avoiding a function call, but bloating code. In fact, clang created an uninlined real function for many object files in the network stack. - Move epoch_private.h into subr_epoch.c. Code copied exactly, avoiding any changes, including style(9). - Remove private copies of critical_enter/exit. Reviewed by: kib, jtl Differential Revision: https://reviews.freebsd.org/D17879	2018-11-13 19:02:11 +00:00
Matt Macy	10f42d244b	Convert epoch to read / write records per cpu In discussing D17503 "Run epoch calls sooner and more reliably" with sbahra@ we came to the conclusion that epoch is currently misusing the ck_epoch API. It isn't safe to do a "write side" operation (ck_epoch_call or ck_epoch_poll) in the middle of a "read side" section. Since, by definition, it's possible to be preempted during the middle of an EPOCH_PREEMPT epoch the GC task might call ck_epoch_poll or another thread might call ck_epoch_call on the same section. The right solution is ultimately to change the way that ck_epoch works for this use case. However, as a stopgap for 12 we agreed to simply have separate records for each use case. Tested by: pho@ MFC after: 3 days	2018-11-03 03:43:32 +00:00
Matt Macy	9fec45d8e5	epoch_block_wait: don't check TD_RUNNING struct epoch_thread is not type safe (stack allocated) and thus cannot be dereferenced from another CPU Reported by: novel@	2018-08-09 05:18:27 +00:00
Matt Macy	822e50e3f6	epoch(9): simplify initialization replace manual NUMA aware allocation with a pcpu zone	2018-07-06 06:20:03 +00:00
Matt Macy	10b8cd7f55	epoch(9): make nesting assert in epoch_wait_preempt more specific Reported by: markj	2018-07-04 21:34:08 +00:00
Matt Macy	6573d7580b	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066	2018-07-04 02:47:16 +00:00
Matt Macy	74333b3dee	fix assert and conditionally allow mutexes to be held across epoch_wait_preempt	2018-06-24 18:57:06 +00:00
Matt Macy	0bcfb47363	epoch(9): Don't trigger taskq enqueue before the grouptaskqs are setup If EARLY_AP_STARTUP is not defined it is possible for an epoch to be allocated prior to it being possible to call epoch_call without issue. Based on patch by andrew@ PR: 229014 Reported by: andrew	2018-06-23 07:14:08 +00:00
Matt Macy	ae25f40b72	epoch(9): make non-preemptible variant work early boot	2018-06-22 00:47:18 +00:00
Matt Macy	e445381f13	epoch(9): make epoch closer to style(9)	2018-05-30 03:39:57 +00:00
Mark Johnston	13679ebac9	Don't pass a section cookie to CK for non-preemptible epoch sections. They're only useful when multiple threads may share an epoch record, and that can't happen with non-preemptible sections. Reviewed by: mmacy Differential Revision: https://reviews.freebsd.org/D15507	2018-05-21 16:03:51 +00:00
Matt Macy	e339e43685	subr_epoch.c fix unused variable warnings	2018-05-19 03:47:37 +00:00
Matt Macy	20ba6811e6	epoch(9): assert that epoch is allocated post-configure	2018-05-18 18:27:17 +00:00
Matt Macy	70398c2f86	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj	2018-05-18 17:29:43 +00:00
Matt Macy	60b7b90d65	epoch: actually allocate the counters we've assigned sysctls to Approved by: sbruno	2018-05-18 02:57:39 +00:00
Matt Macy	5e68a3dfe3	epoch: add non-preemptible "critical" variant adds: - epoch_enter_critical() - can be called inside a different epoch, starts a section that will not acquire any MTX_DEF mutexes or do anything that might sleep. - epoch_exit_critical() - corresponding exit call - epoch_wait_critical() - wait variant that is guaranteed that any threads in a section are running. - epoch_global_critical - an epoch_wait_critical safe epoch instance Requested by: markj Approved by: sbruno	2018-05-18 01:52:51 +00:00
Matt Macy	a5f1042498	epoch: skip poll function call in hardclock unless there are callbacks pending Reported by: mjg Approved by: sbruno	2018-05-17 21:39:15 +00:00
Matt Macy	c4d901e9bd	epoch(9): schedule pcpu callback task in hardclock if there are callbacks pending Approved by: sbruno	2018-05-17 19:57:07 +00:00
Matt Macy	2a45e8282a	epoch(9): eliminate the need to wait when polling for callbacks to run by using ck's own callback handling mechanism we can simply check which callbacks have had a grace period elapse Approved by: sbruno	2018-05-17 19:50:55 +00:00
Matt Macy	d1bcb409f6	epoch(9): fix potential deadlock Don't acquire a waiting thread's lock while holding our own Approved by: sbruno	2018-05-17 19:41:58 +00:00
Matt Macy	766d225326	epoch(9): restore thread priority on exit if it was changed by a waiter Reported by: markj Approved by: sbruno	2018-05-17 19:08:28 +00:00
Matt Macy	fdf71aeb54	epoch(9): make recursion lighter weight There isn't any real work to do except bump td_epochnest when recursing. Skip the additional work in this case. Approved by: sbruno	2018-05-17 01:13:40 +00:00
Matt Macy	b8205686b4	epoch(9): Guarantee forward progress on busy sections Add epoch section to struct thread. We can use this to enable epoch counter to advance even if a section is perpetually occupied by a thread. Approved by: sbruno	2018-05-17 00:45:35 +00:00
Matt Macy	0c58f85b8d	epoch(9): allow sx locks to be held across epoch_wait() The INVARIANTS checks in epoch_wait() were intended to prevent the block handler from returning with locks held. What it in fact did was preventing anything except Giant from being held across it. Check that the number of locks held has not changed instead. Approved by: sbruno@	2018-05-14 00:14:00 +00:00
Matt Macy	1f4beb6312	epoch(9): cleanups, additional debug checks, and add global_epoch - GC the _nopreempt routines - to really benefit we'd need a separate routine - they're not currently in use - they complicate the API for no benefit at this time - check that we're actually in an epoch section at exit - handle epoch_call() early in boot - Fix copyright declaration language Approved by: sbruno@	2018-05-13 23:24:48 +00:00
Matt Macy	f1401123c5	hwpmc/epoch - don't reference domain if NUMA is not set It appears that domain information is set correctly independent of whether or not NUMA is defined. However, there is no memory backing secondary domains leading to allocation failure. Reported by: pho@, np@ Approved by: sbruno@	2018-05-12 20:00:29 +00:00

54 Commits