freebsd-dev

Author	SHA1	Message	Date
Mitchell Horne	1029dab634	mi_switch(): clean up switch types and their usage Overall, this is a non-functional change, except for kernels built with SCHED_STATS. However, the switch types are useful for communicating the intent of the caller. 1. Ensure that every caller provides a type. In most cases, we upgrade the basic yield to sched_relinquish() aka SWT_RELINQUISH. 2. The case of sched_bind() is distinct, so add a new switch type SWT_BIND. 3. Remove the two unused types, SWT_PREEMPT and SWT_SLEEPQTIMO. 4. Remove SWT_NONE altogether and assert that callers always provide a type flag. 5. Reference the mi_switch(9) man page in the comments, as these flags will be documented there. Reviewed by: kib, markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38184	2023-02-09 12:01:32 -04:00
Mitchell Horne	d570418bd8	Boolify should_yield() Do this ahead of adding a man page that describes the function. No functional change. Reviewed by: kib, markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38181	2023-02-09 11:58:06 -04:00
Konstantin Belousov	c6d31b8306	AST: rework Make most AST handlers dynamically registered. This allows to have subsystem-specific handler source located in the subsystem files, instead of making subr_trap.c aware of it. For instance, signal delivery code on return to userspace is now moved to kern_sig.c. Also, it allows to have some handlers designated as the cleanup (kclear) type, which are called both at AST and on thread/process exit. For instance, ast(), exit1(), and NFS server no longer need to be aware about UFS softdep processing. The dynamic registration also allows third-party modules to register AST handlers if needed. There is one caveat with loadable modules: the code does not make any effort to ensure that the module is not unloaded before all threads processed through AST handler in it. In fact, this is already present behavior for hwpmc.ko and ufs.ko. I do not think it is worth the efforts and the runtime overhead to try to fix it. Reviewed by: markj Tested by: emaste (arm64), pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888	2022-08-02 21:11:09 +03:00
Alan Somers	1d2421ad8b	Correctly measure system load averages > 1024 The old fixed-point arithmetic used for calculating load averages had an overflow at 1024. So on systems with extremely high load, the observed load average would actually fall back to 0 and shoot up again, creating a kind of sawtooth graph. Fix this by using 64-bit math internally, while still reporting the load average to userspace as a 32-bit number. Sponsored by: Axcient Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D35134	2022-05-06 17:25:43 -06:00
Konstantin Belousov	77b2c2f814	Add sched_getcpu() for compatibility with Linux. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32901	2021-11-10 21:18:54 +02:00
Colin Percival	bd11e253a9	Add _sleep to TSLOG Most of the nvme initialization time in my tests is being spent here (via pause_sbt).	2021-09-05 12:50:15 -07:00
Alexander Motin	4730a8972b	callout(9): Allow spin locks use with callout_init_mtx(). Implement lock_spin()/unlock_spin() lock class methods, moving the assertion to _sleep() instead. Change assertions in callout(9) to allow spin locks for both regular and C_DIRECT_EXEC cases. In case of C_DIRECT_EXEC callouts spin locks are the only locks allowed actually. As the first use case allow taskqueue_enqueue_timeout() use on fast task queues. It actually becomes more efficient due to avoided extra context switches in callout(9) thanks to C_DIRECT_EXEC. MFC after: 2 weeks Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D31778	2021-09-02 21:16:46 -04:00
Andrew Gallatin	1b97a054f3	tsleep: Add a PNOLOCK flag Add a PNOLOCK flag so that, in the race circumstance where wakeup races are externally mitigated, tsleep() can be called with a sleep time of 0 without triggering an an assertion. Reviewed by: jhb Sponsored by: Netflix	2021-08-05 17:16:30 -04:00
Alexander Motin	6df35af4d8	Allow sleepq_signal() to drop the lock. Introduce SLEEPQ_DROP sleepq_signal() flag, allowing one to drop the sleep queue chain lock before returning. Reduced lock scope allows significantly reduce lock contention inside taskqueue_enqueue() for ZFS worker threads doing ~350K disk reads/s on 40-thread system. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2021-06-25 14:12:21 -04:00
Alex Richardson	fa2528ac64	Use atomic loads/stores when updating td->td_state KCSAN complains about racy accesses in the locking code. Those races are fine since they are inside a TD_SET_RUNNING() loop that expects the value to be changed by another CPU. Use relaxed atomic stores/loads to indicate that this variable can be written/read by multiple CPUs at the same time. This will also prevent the compiler from doing unexpected re-ordering. Reported by: GENERIC-KCSAN Test Plan: KCSAN no longer complains, kernel still runs fine. Reviewed By: markj, mjg (earlier version) Differential Revision: https://reviews.freebsd.org/D28569	2021-02-18 14:02:48 +00:00
Mark Johnston	304dcfb0d8	Handle PCATCH in blockcount_sleep() so it can be interrupted. blockcount_wait() still unconditionally waits for the count to reach zero before returning. Tested by: pho (a larger patch) Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24513	2020-04-21 17:13:06 +00:00
Pawel Biernacki	b05ca4290c	sys/: Document few more sysctls. Submitted by: Antranig Vartanian <antranigv@freebsd.am> Reviewed by: kaktus Commented by: jhb Approved by: kib (mentor) Sponsored by: illuria security Differential Revision: https://reviews.freebsd.org/D23759	2020-03-02 15:30:52 +00:00
Mark Johnston	c99d0c5801	Add a blocking counter KPI. refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy. Reviewed by: jeff, kib, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23723	2020-02-28 16:05:18 +00:00
Mateusz Guzik	d8a84f08e8	refcount: update comments about fencing when releasing counts after r357989 Requested by: kib Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23719	2020-02-16 18:20:09 +00:00
Jeff Roberson	811d05fcb7	Provide an API for interlocked refcount sleeps. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22908	2020-01-19 18:18:17 +00:00
Mateusz Guzik	879e0604ee	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
Conrad Meyer	fea73412a0	sleep(9), sleepqueue(9): const'ify wchan pointers _sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the channel pointers provided in any way; they are merely used as intptrs into a dictionary structure to match waiters with wakers. Correctly annotate this such that _sleep() and wakeup() may be used on const pointers without invoking ugly patterns like __DECONST(). Plumb const through all of the underlying sleepqueue bits. No functional change. Reviewed by: rlibby Discussed with: kib, markj Differential Revision: https://reviews.freebsd.org/D22914	2019-12-24 16:19:33 +00:00
Conrad Meyer	e30f025ff9	kern_synch: Fix some UB It is UB to evaluate pointer comparisons when pointers do not point within the same object. Instead, convert the pointers to numbers and compare the numbers. Reported by: kib Discussed with: rlibby	2019-12-24 06:08:29 +00:00
Jeff Roberson	686bcb5c14	schedlock 4/4 Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778	2019-12-15 21:26:50 +00:00
Jeff Roberson	61a74c5ccd	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626	2019-12-15 21:11:15 +00:00
Gleb Smirnoff	5757b59f3e	Merge td_epochnest with td_no_sleeping. Epoch itself doesn't rely on the counter and it is provided merely for sleeping subsystems to check it. - In functions that sleep use THREAD_CAN_SLEEP() to assert correctness. With EPOCH_TRACE compiled print epoch info. - _sleep() was a wrong place to put the assertion for epoch, right place is sleepq_add(), as there ways to call the latter bypassing _sleep(). - Do not increase td_no_sleeping in non-preemptible epochs. The critical section would trigger all possible safeguards, no sleeping counter is extraneous. Reviewed by: kib	2019-10-29 17:28:25 +00:00
Gleb Smirnoff	4b25d1f2e3	Missing from r353596.	2019-10-15 21:32:38 +00:00
Gleb Smirnoff	bac060388f	When assertion for a thread not being in an epoch fails also print all entered epochs. Works with EPOCH_TRACE only. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D22017	2019-10-15 21:24:25 +00:00
Jeff Roberson	33205c60e7	Add a blocking wait bit to refcount. This allows refs to be used as a simple barrier. Reviewed by: markj, kib Discussed with: jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21254	2019-08-18 11:43:58 +00:00
Alexander Motin	f91aa773be	Add wakeup_any(), cheaper wakeup_one() for taskqueue(9). wakeup_one() and underlying sleepq_signal() spend additional time trying to be fair, waking thread with highest priority, sleeping longest time. But in case of taskqueue there are many absolutely identical threads, and any fairness between them is quite pointless. It makes even worse, since round-robin wakeups not only make previous CPU affinity in scheduler quite useless, but also hide from user chance to see CPU bottlenecks, when sequential workload with one request at a time looks evenly distributed between multiple threads. This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup thread that went to sleep last, but no longer in context switch (to avoid immediate spinning on the thread lock). On top of that new wakeup_any() function is added, equivalent to wakeup_one(), but setting the flag. On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its threads. As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs with 16KB block size spend 34% less time in wakeup_any() and descendants then it was spending in wakeup_one(), and total write throughput increased by ~10% with the same as before CPU usage. Reviewed by: markj, mmacy MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D20669	2019-06-20 01:15:33 +00:00
Mateusz Guzik	13a45e4b14	Provide SDT_PROBES_ENABLED macro. Sponsored by: The FreeBSD Foundation	2018-12-08 06:30:41 +00:00
Mateusz Guzik	99ece3a9cd	Reduce sdt-related branch-fest in mi_switch. The code was evaluating flags before resorting to checking if dtrace is enabled. This was inducing forward jumps in the common case.	2018-05-22 08:27:33 +00:00
Ed Maste	891cf3ed44	Use NULL for SYSINIT's last arg, which is a pointer type Sponsored by: The FreeBSD Foundation	2018-05-18 17:58:09 +00:00
Matt Macy	06bf2a6aef	Add simple preempt safe epoch API Read locking is over used in the kernel to guarantee liveness. This API makes it easy to provide livenes guarantees without atomics. Includes epoch_test kernel module to stress test the API. Documentation will follow initial use case. Test case and improvements to preemption handling in response to discussion with mjg@ Reviewed by: imp@, shurd@ Approved by: sbruno@	2018-05-10 17:55:24 +00:00
Hans Petter Selasky	2077229b56	Allow pause_sbt() to catch signals during sleep by passing C_CATCH flag. Define pause_sig() function macro helper similarly to other kernel functions which catch signals. Update outdated function description. Discussed with: kib@ MFC after: 1 week Sponsored by: Mellanox Technologies	2018-03-03 18:36:38 +00:00
Hans Petter Selasky	54fc03834a	Correct the return code from pause() during cold startup from zero to EWOULDBLOCK. This also matches the description in pause(9). Discussed with: kib@ MFC after: 1 week Sponsored by: Mellanox Technologies	2018-03-03 18:12:21 +00:00
Alexander Kabaev	151ba7933a	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Gleb Smirnoff	83c9dea1ba	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
Andriy Gapon	20c69e76b4	dtrace sched:::preempt should fire only when there is preemption The probe fire on any thread switch before. Reviewed by: markj MFC after: 1 week Sponsored by: Panzura	2017-03-25 19:08:51 +00:00
Andriy Gapon	afa0a46cfd	move thread switch tracing from mi_switch to sched_switch This is done so that the thread state changes during the switch are not confused with the thread state changes reported when the thread spins on a lock. Here is an example, three consecutive entries for the same thread (from top to bottom): KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"sleep", attributes: prio:84, wmesg:"-", lockname:"(null)" KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"spinning", attributes: lockname:"sched lock 1" KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"running", attributes: none The above trace could leave an impression that the final state of the thread was "running". After this change the sleep state will be reported after the "spinning" and "running" states reported for the sched lock. Reviewed by: jhb, markj MFC after: 1 week Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D9961	2017-03-23 08:57:04 +00:00
Mateusz Guzik	91fa47076d	Introduce SCHEDULER_STOPPED_TD for use when the thread pointer was already read Sprinkle in few places.	2017-02-17 06:45:04 +00:00
Ed Maste	bf9ebe74e2	disambiguate msleep KASSERT diagnostics Previously "panic: msleep" could happen for a few different reasons. Break the KASSERTs out into individual cases to identify the failing condition. Found during the investigation that resulted in r308288. Reviewed by: kib, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D8604	2017-01-16 20:34:42 +00:00
John Baldwin	99bc7e4123	Don't spin in pause() during early boot for kthreads other than thread0. pause() uses a spin loop to simulate a sleep during early boot. However, we only need this for thread0 to get far enough in the boot process to enable timers (at which point pause() can sleep). For other kthreads, sleeping in pause() is ok as the callout will be scheduled and will eventually fire once thread0 initializes timers. Tested by: Steven Kargl Sleuthing by: markj MFC after: 1 week Sponsored by: Netflix	2016-12-20 19:44:44 +00:00
Konstantin Belousov	75409ce1dd	Remove remnants of the recursive sleep support. Instead assert that we never try to sleep while the thread is on a sleepqueue. Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8422	2016-11-02 20:57:20 +00:00
Ed Maste	69a2875821	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
Konstantin Belousov	93ccd6bf87	Get rid of struct proc p_sched and struct thread td_sched pointers. p_sched is unused. The struct td_sched is always co-allocated with the struct thread, except for the thread0. Avoid useless indirection, instead calculate td_sched location using simple pointer arithmetic in td_get_sched(9). For thread0, which is statically allocated, create a structure to emulate layout of the dynamic allocation. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D6711	2016-06-05 17:04:03 +00:00
Hans Petter Selasky	bc64e52679	Use DELAY() instead of _sleep() when SCHEDULER_STOPPED() is set inside pause_sbt(). This allows pause() to continue working during a panic() which is not invoking KDB. This is useful when debugging graphics drivers using the LinuxKPI. Obtained from: kmacy @ MFC after: 1 week	2016-05-23 10:31:54 +00:00
Pedro F. Giffuni	55e0987aea	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
John Baldwin	b4f1d267b7	Rework handling of thread sleeps before timers are working. Previously, calls to sleep() and cv_wait*() immediately returned during early boot. Instead, permit threads that request a sleep without a timeout to sleep as wakeup() works during early boot. Sleeps with timeouts are harder to emulate without working timers, so just punt and panic explicitly if any thread tries to use those before timers are working. Any threads that depend on timeouts should either wait until SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers are available. Until APs are started earlier this should be a no-op as other kthreads shouldn't get a chance to start running until after timers are working regardless of when they were created. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5724	2016-03-31 18:10:29 +00:00
Conrad Meyer	c975a5d359	Add td_swinvoltick to track last involuntary context switch Expose in DDB via "show thread." Reviewed by: markj Sponsored by: EMC / Isilon Storage Division	2016-03-25 19:35:29 +00:00
Konstantin Belousov	69baeadc31	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 13:24:17 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
John Baldwin	ed95805e90	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes	2015-04-30 15:48:48 +00:00
Mark Johnston	38563f7c92	Remove unimplemented sched provider probes. They were added for compatibility with the sched provider in Solaris and illumos, but our sched provider is already incompatible since it uses native types, so there isn't much point in keeping them around. Differential Revision: https://reviews.freebsd.org/D2167 Reviewed by: rpaulo	2015-04-18 20:36:58 +00:00

1 2 3 4 5 ...

399 Commits