freebsd-skq

Author	SHA1	Message	Date
Mark Johnston	304dcfb0d8	Handle PCATCH in blockcount_sleep() so it can be interrupted. blockcount_wait() still unconditionally waits for the count to reach zero before returning. Tested by: pho (a larger patch) Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24513	2020-04-21 17:13:06 +00:00
Pawel Biernacki	b05ca4290c	sys/: Document few more sysctls. Submitted by: Antranig Vartanian <antranigv@freebsd.am> Reviewed by: kaktus Commented by: jhb Approved by: kib (mentor) Sponsored by: illuria security Differential Revision: https://reviews.freebsd.org/D23759	2020-03-02 15:30:52 +00:00
Mark Johnston	c99d0c5801	Add a blocking counter KPI. refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy. Reviewed by: jeff, kib, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23723	2020-02-28 16:05:18 +00:00
Mateusz Guzik	d8a84f08e8	refcount: update comments about fencing when releasing counts after r357989 Requested by: kib Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23719	2020-02-16 18:20:09 +00:00
Jeff Roberson	811d05fcb7	Provide an API for interlocked refcount sleeps. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22908	2020-01-19 18:18:17 +00:00
Mateusz Guzik	879e0604ee	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
Conrad Meyer	fea73412a0	sleep(9), sleepqueue(9): const'ify wchan pointers _sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the channel pointers provided in any way; they are merely used as intptrs into a dictionary structure to match waiters with wakers. Correctly annotate this such that _sleep() and wakeup() may be used on const pointers without invoking ugly patterns like __DECONST(). Plumb const through all of the underlying sleepqueue bits. No functional change. Reviewed by: rlibby Discussed with: kib, markj Differential Revision: https://reviews.freebsd.org/D22914	2019-12-24 16:19:33 +00:00
Conrad Meyer	e30f025ff9	kern_synch: Fix some UB It is UB to evaluate pointer comparisons when pointers do not point within the same object. Instead, convert the pointers to numbers and compare the numbers. Reported by: kib Discussed with: rlibby	2019-12-24 06:08:29 +00:00
Jeff Roberson	686bcb5c14	schedlock 4/4 Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778	2019-12-15 21:26:50 +00:00
Jeff Roberson	61a74c5ccd	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626	2019-12-15 21:11:15 +00:00
Gleb Smirnoff	5757b59f3e	Merge td_epochnest with td_no_sleeping. Epoch itself doesn't rely on the counter and it is provided merely for sleeping subsystems to check it. - In functions that sleep use THREAD_CAN_SLEEP() to assert correctness. With EPOCH_TRACE compiled print epoch info. - _sleep() was a wrong place to put the assertion for epoch, right place is sleepq_add(), as there ways to call the latter bypassing _sleep(). - Do not increase td_no_sleeping in non-preemptible epochs. The critical section would trigger all possible safeguards, no sleeping counter is extraneous. Reviewed by: kib	2019-10-29 17:28:25 +00:00
Gleb Smirnoff	4b25d1f2e3	Missing from r353596.	2019-10-15 21:32:38 +00:00
Gleb Smirnoff	bac060388f	When assertion for a thread not being in an epoch fails also print all entered epochs. Works with EPOCH_TRACE only. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D22017	2019-10-15 21:24:25 +00:00
Jeff Roberson	33205c60e7	Add a blocking wait bit to refcount. This allows refs to be used as a simple barrier. Reviewed by: markj, kib Discussed with: jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21254	2019-08-18 11:43:58 +00:00
Alexander Motin	f91aa773be	Add wakeup_any(), cheaper wakeup_one() for taskqueue(9). wakeup_one() and underlying sleepq_signal() spend additional time trying to be fair, waking thread with highest priority, sleeping longest time. But in case of taskqueue there are many absolutely identical threads, and any fairness between them is quite pointless. It makes even worse, since round-robin wakeups not only make previous CPU affinity in scheduler quite useless, but also hide from user chance to see CPU bottlenecks, when sequential workload with one request at a time looks evenly distributed between multiple threads. This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup thread that went to sleep last, but no longer in context switch (to avoid immediate spinning on the thread lock). On top of that new wakeup_any() function is added, equivalent to wakeup_one(), but setting the flag. On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its threads. As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs with 16KB block size spend 34% less time in wakeup_any() and descendants then it was spending in wakeup_one(), and total write throughput increased by ~10% with the same as before CPU usage. Reviewed by: markj, mmacy MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D20669	2019-06-20 01:15:33 +00:00
Mateusz Guzik	13a45e4b14	Provide SDT_PROBES_ENABLED macro. Sponsored by: The FreeBSD Foundation	2018-12-08 06:30:41 +00:00
Mateusz Guzik	99ece3a9cd	Reduce sdt-related branch-fest in mi_switch. The code was evaluating flags before resorting to checking if dtrace is enabled. This was inducing forward jumps in the common case.	2018-05-22 08:27:33 +00:00
Ed Maste	891cf3ed44	Use NULL for SYSINIT's last arg, which is a pointer type Sponsored by: The FreeBSD Foundation	2018-05-18 17:58:09 +00:00
Matt Macy	06bf2a6aef	Add simple preempt safe epoch API Read locking is over used in the kernel to guarantee liveness. This API makes it easy to provide livenes guarantees without atomics. Includes epoch_test kernel module to stress test the API. Documentation will follow initial use case. Test case and improvements to preemption handling in response to discussion with mjg@ Reviewed by: imp@, shurd@ Approved by: sbruno@	2018-05-10 17:55:24 +00:00
Hans Petter Selasky	2077229b56	Allow pause_sbt() to catch signals during sleep by passing C_CATCH flag. Define pause_sig() function macro helper similarly to other kernel functions which catch signals. Update outdated function description. Discussed with: kib@ MFC after: 1 week Sponsored by: Mellanox Technologies	2018-03-03 18:36:38 +00:00
Hans Petter Selasky	54fc03834a	Correct the return code from pause() during cold startup from zero to EWOULDBLOCK. This also matches the description in pause(9). Discussed with: kib@ MFC after: 1 week Sponsored by: Mellanox Technologies	2018-03-03 18:12:21 +00:00
Alexander Kabaev	151ba7933a	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Gleb Smirnoff	83c9dea1ba	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
Andriy Gapon	20c69e76b4	dtrace sched:::preempt should fire only when there is preemption The probe fire on any thread switch before. Reviewed by: markj MFC after: 1 week Sponsored by: Panzura	2017-03-25 19:08:51 +00:00
Andriy Gapon	afa0a46cfd	move thread switch tracing from mi_switch to sched_switch This is done so that the thread state changes during the switch are not confused with the thread state changes reported when the thread spins on a lock. Here is an example, three consecutive entries for the same thread (from top to bottom): KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"sleep", attributes: prio:84, wmesg:"-", lockname:"(null)" KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"spinning", attributes: lockname:"sched lock 1" KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"running", attributes: none The above trace could leave an impression that the final state of the thread was "running". After this change the sleep state will be reported after the "spinning" and "running" states reported for the sched lock. Reviewed by: jhb, markj MFC after: 1 week Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D9961	2017-03-23 08:57:04 +00:00
Mateusz Guzik	91fa47076d	Introduce SCHEDULER_STOPPED_TD for use when the thread pointer was already read Sprinkle in few places.	2017-02-17 06:45:04 +00:00
Ed Maste	bf9ebe74e2	disambiguate msleep KASSERT diagnostics Previously "panic: msleep" could happen for a few different reasons. Break the KASSERTs out into individual cases to identify the failing condition. Found during the investigation that resulted in r308288. Reviewed by: kib, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D8604	2017-01-16 20:34:42 +00:00
John Baldwin	99bc7e4123	Don't spin in pause() during early boot for kthreads other than thread0. pause() uses a spin loop to simulate a sleep during early boot. However, we only need this for thread0 to get far enough in the boot process to enable timers (at which point pause() can sleep). For other kthreads, sleeping in pause() is ok as the callout will be scheduled and will eventually fire once thread0 initializes timers. Tested by: Steven Kargl Sleuthing by: markj MFC after: 1 week Sponsored by: Netflix	2016-12-20 19:44:44 +00:00
Konstantin Belousov	75409ce1dd	Remove remnants of the recursive sleep support. Instead assert that we never try to sleep while the thread is on a sleepqueue. Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8422	2016-11-02 20:57:20 +00:00
Ed Maste	69a2875821	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
Konstantin Belousov	93ccd6bf87	Get rid of struct proc p_sched and struct thread td_sched pointers. p_sched is unused. The struct td_sched is always co-allocated with the struct thread, except for the thread0. Avoid useless indirection, instead calculate td_sched location using simple pointer arithmetic in td_get_sched(9). For thread0, which is statically allocated, create a structure to emulate layout of the dynamic allocation. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D6711	2016-06-05 17:04:03 +00:00
Hans Petter Selasky	bc64e52679	Use DELAY() instead of _sleep() when SCHEDULER_STOPPED() is set inside pause_sbt(). This allows pause() to continue working during a panic() which is not invoking KDB. This is useful when debugging graphics drivers using the LinuxKPI. Obtained from: kmacy @ MFC after: 1 week	2016-05-23 10:31:54 +00:00
Pedro F. Giffuni	55e0987aea	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
John Baldwin	b4f1d267b7	Rework handling of thread sleeps before timers are working. Previously, calls to sleep() and cv_wait*() immediately returned during early boot. Instead, permit threads that request a sleep without a timeout to sleep as wakeup() works during early boot. Sleeps with timeouts are harder to emulate without working timers, so just punt and panic explicitly if any thread tries to use those before timers are working. Any threads that depend on timeouts should either wait until SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers are available. Until APs are started earlier this should be a no-op as other kthreads shouldn't get a chance to start running until after timers are working regardless of when they were created. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5724	2016-03-31 18:10:29 +00:00
Conrad Meyer	c975a5d359	Add td_swinvoltick to track last involuntary context switch Expose in DDB via "show thread." Reviewed by: markj Sponsored by: EMC / Isilon Storage Division	2016-03-25 19:35:29 +00:00
Konstantin Belousov	69baeadc31	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 13:24:17 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
John Baldwin	ed95805e90	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes	2015-04-30 15:48:48 +00:00
Mark Johnston	38563f7c92	Remove unimplemented sched provider probes. They were added for compatibility with the sched provider in Solaris and illumos, but our sched provider is already incompatible since it uses native types, so there isn't much point in keeping them around. Differential Revision: https://reviews.freebsd.org/D2167 Reviewed by: rpaulo	2015-04-18 20:36:58 +00:00
Hans Petter Selasky	a115fb62ed	Revert for r277213: FreeBSD developers need more time to review patches in the surrounding areas like the TCP stack which are using MPSAFE callouts to restore distribution of callouts on multiple CPUs. Bump the __FreeBSD_version instead of reverting it. Suggested by: kmacy, adrian, glebius and kib Differential Revision: https://reviews.freebsd.org/D1438	2015-01-22 11:12:42 +00:00
Hans Petter Selasky	1a26c3c047	Major callout subsystem cleanup and rewrite: - Close a migration race where callout_reset() failed to set the CALLOUT_ACTIVE flag. - Callout callback functions are now allowed to be protected by spinlocks. - Switching the callout CPU number cannot always be done on a per-callout basis. See the updated timeout(9) manual page for more information. - The timeout(9) manual page has been updated to reflect how all the functions inside the callout API are working. The manual page has been made function oriented to make it easier to deduce how each of the functions making up the callout API are working without having to first read the whole manual page. Group all functions into a handful of sections which should give a quick top-level overview when the different functions should be used. - The CALLOUT_SHAREDLOCK flag and its functionality has been removed to reduce the complexity in the callout code and to avoid problems about atomically stopping callouts via callout_stop(). If someone needs it, it can be re-added. From my quick grep there are no CALLOUT_SHAREDLOCK clients in the kernel. - A new callout API function named "callout_drain_async()" has been added. See the updated timeout(9) manual page for a complete description. - Update the callout clients in the "kern/" folder to use the callout API properly, like cv_timedwait(). Previously there was some custom sleepqueue code in the callout subsystem, which has been removed, because we now allow callouts to be protected by spinlocks. This allows us to tear down the callout like done with regular mutexes, and a "td_slpmutex" has been added to "struct thread" to atomically teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and "SWT_SLEEPQTIMO" states can now be completely removed. Currently they are marked as available and will be cleaned up in a follow up commit. - Bump the __FreeBSD_version to indicate kernel modules need recompilation. - There has been several reports that this patch "seems to squash a serious bug leading to a callout timeout and panic". Kernel build testing: all architectures were built MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D1438 Sponsored by: Mellanox Technologies Reviewed by: jhb, adrian, sbruno and emaste	2015-01-15 15:32:30 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Jean-Sébastien Pédron	ffea80b445	pause_sbt(): Take the cold path (ie. use DELAY()) if KDB is active This fixes a panic in the i915 driver when one uses debug.kdb.enter=1 under vt(4). PR: 193269 Reported by: emaste@ Submitted by: avg@ MFC after: 3 days	2014-09-08 08:44:50 +00:00
Andriy Gapon	55050ab560	use saner calculations in should_yield This is based on feedback from bde. MFC after: 6 days	2013-11-26 14:00:50 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Alexander Motin	ea4af9c09a	Make load average sampling asynchronous to hardclock ticks. This improves measurement of load caused by time-related events still using hardclock. For example, without this change dummynet, scheduling events each hardclock tick, was always miscounted as load of 1. There is still aliasing with events delayed by the new precision mechanism, but it probably can't be avoided without moving this sampling from using callout to some lower-level code or handling it in some other special way. Reviewed by: davide Approved by: re (marius)	2013-09-24 07:03:16 +00:00
Davide Italiano	7faf4d90e8	Fix lc_lock/lc_unlock() support for rmlocks held in shared mode. With current lock classes KPI it was really difficult because there was no way to pass an rmtracker object to the lock/unlock routines. In order to accomplish the task, modify the aforementioned functions so that they can return (or pass as argument) an uinptr_t, which is in the rm case used to hold a pointer to struct rm_priotracker for current thread. As an added bonus, this fixes rm_sleep() in the rm shared case, which right now can communicate priotracker structure between lc_unlock()/lc_lock(). Suggested by: jhb Reviewed by: jhb Approved by: re (delphij)	2013-09-20 23:06:21 +00:00
Hans Petter Selasky	5a280dbdd6	Simplify pause_sbt() logic. Don't call DELAY() if remainder is less than or equal to zero.	2013-08-30 10:39:56 +00:00

1 2 3 4 5 ...

389 Commits