freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	c69a1a50cd	Don't take Giant for SMP status and cpu topology sysctls. Not only this lock doesn't play any role here, dirtying it slows down other things a little bit as giant-held checks (e.g. DROP_GIANT) are spread all over the kernel. MFC after: 1 week	2017-10-18 22:00:44 +00:00
Andriy Gapon	afa0a46cfd	move thread switch tracing from mi_switch to sched_switch This is done so that the thread state changes during the switch are not confused with the thread state changes reported when the thread spins on a lock. Here is an example, three consecutive entries for the same thread (from top to bottom): KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"sleep", attributes: prio:84, wmesg:"-", lockname:"(null)" KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"spinning", attributes: lockname:"sched lock 1" KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"running", attributes: none The above trace could leave an impression that the final state of the thread was "running". After this change the sleep state will be reported after the "spinning" and "running" states reported for the sched lock. Reviewed by: jhb, markj MFC after: 1 week Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D9961	2017-03-23 08:57:04 +00:00
Andriy Gapon	28ef18b8c1	trace thread running state when a thread is run for the first time This applies to both KTR_SCHED and DTrace sched:::on-cpu tracing. MFC after: 10 days	2017-03-11 15:57:36 +00:00
Mark Johnston	7813302434	Fix a ticks comparison in sched_pctcpu_update(). We may fail to reset the %CPU tracking window if a thread does not run for over half of the ticks rollover period, resulting in a bogus %CPU value for the thread until ticks fully rolls over. Handle this by comparing the unsigned difference ticks - ts_ltick with SCHED_TICK_TARG instead. Reviewed by: cem, jeff MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-03-03 20:57:40 +00:00
Ryan Stone	27ee18ad33	Revert r313814 and r313816 Something evidently got mangled in my git tree in between testing and review, as an old and broken version of the patch was apparently submitted to svn. Revert this while I work out what went wrong. Reported by: tuexen Pointy hat to: rstone	2017-02-16 21:18:31 +00:00
Ryan Stone	3600f4ba35	Fix a typo in my previous commit Somehow in the late stages of testing my sched_ule patch, a character was accidentally deleted from the file. Correct this. While I'm committing anyway, the previous commit message requires some clarification: in the normal case of unlending priority after releasing a mutex, the thread that was doing the lending will be woken up and immediately become the highest-priority thread, and in that case no priority inversion would take place. However, if that thread is pinned to a different CPU, then the currently running thread that just had its priority lowered will not be preempted and then priority inversion can occur. Reported by: O. Hartmann (typo), jhb (scheduler clarification) MFC after: 1 month Pointy hat to: rstone	2017-02-16 20:06:21 +00:00
Ryan Stone	09ae7c4814	Check for preemption after lowering a thread's priority When a high-priority thread is waiting for a mutex held by a low-priority thread, it temporarily lends its priority to the low-priority thread to prevent priority inversion. When the mutex is released, the lent priority is revoked and the low-priority thread goes back to its original priority. When the priority of that thread is lowered (through a call to sched_priority()), the scheduler was not checking whether there is now a high-priority thread in the run queue. This can cause threads with real-time priority to be starved in the run queue while the low-priority thread finishes its quantum. Fix this by explicitly checking whether preemption is necessary when a thread's priority is lowered. Sponsored by: Dell EMC Isilon Obtained from: Sandvine Inc Differential Revision: https://reviews.freebsd.org/D9518 Reviewed by: Jeff Roberson (ule) MFC after: 1 month	2017-02-16 19:41:13 +00:00
Andriy Gapon	ad9dadc437	fix a thread preemption regression in schedulers introduced in r270423 Commit r270423 fixed a regression in sched_yield() that was introduced in earlier changes. Unfortunately, at the same time it introduced a new regression. The problem is that SWT_RELINQUISH (6), like all other SWT_* constants and unlike SW_* flags, is not a bit flag. So, (flags & SWT_RELINQUISH) is true in cases where that was not really intended, for example, with SWT_OWEPREEMPT (2) and SWT_REMOTEPREEMPT (11). A straightforward fix would be to use (flags & SW_TYPE_MASK) == SWT_RELINQUISH, but my impression is that the switch types are designed mostly for gathering statistics, not for influencing scheduling decisions. So, I decided that it would be better to check for SW_PREEMPT flag instead. That's also the same flag that was checked before r239157. I double-checked how that flag is used and I am confident that the flag is set only in the places where we really have the preemption: - critical_exit + td_owepreempt - sched_preempt in the ULE scheduler - sched_preempt in the 4BSD scheduler Reviewed by: kib, mav MFC after: 4 days Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D9230	2017-01-19 18:46:41 +00:00
Conrad Meyer	db4fcadf52	"Buses" is the preferred plural of "bus" Replace archaic "busses" with modern form "buses." Intentionally excluded: * Old/random drivers I didn't recognize * Old hardware in general * Use of "busses" in code as identifiers No functional change. http://grammarist.com/spelling/buses-busses/ PR: 216099 Reported by: bltsrc at mail.ru Sponsored by: Dell EMC Isilon	2017-01-15 17:54:01 +00:00
Konstantin Belousov	93ccd6bf87	Get rid of struct proc p_sched and struct thread td_sched pointers. p_sched is unused. The struct td_sched is always co-allocated with the struct thread, except for the thread0. Avoid useless indirection, instead calculate td_sched location using simple pointer arithmetic in td_get_sched(9). For thread0, which is statically allocated, create a structure to emulate layout of the dynamic allocation. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D6711	2016-06-05 17:04:03 +00:00
Konstantin Belousov	ccd0ec4066	The struct thread td_estcpu member is only used by the 4BSD scheduler. Move it to the struct td_sched for 4BSD, removing always present field, otherwise unused for ULE. New scheduler method sched_estcpu() returns the estimation for kinfo_proc consumption. As before, it always returns 0 for ULE. Remove sched_tick() scheduler method, unused both by 4BSD and ULE. Update locking comment for the 4BSD struct td_sched, copying it from the same comment for ULE. Spell MAXPRI as PRI_MAX_TIMESHARE in the 4BSD comment. Based on some notes from, and reviewed by: bde Sponsored by: The FreeBSD Foundation	2016-04-17 11:04:27 +00:00
George V. Neville-Neil	57031f7912	Summary: Add the interactivity equations to the header comment for our interactivity calculation routine. Suggested by: rwatson	2015-08-26 16:36:41 +00:00
John Baldwin	92de34df2c	kgdb uses td_oncpu to determine if a thread is running and should use a pcb from stoppcbs[] rather than the thread's PCB. However, exited threads retained td_oncpu from the last time they ran, and newborn threads had their CPU fields cleared to zero during fork and thread creation since they are in the set of fields zeroed when threads are setup. To fix, explicitly update the CPU fields for exiting threads in sched_throw() to reflect the switch out and reset the CPU fields for new threads in sched_fork_thread() to NOCPU. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D3193	2015-08-03 20:43:36 +00:00
Konstantin Belousov	e8677f3885	Change the mb() use in the sched_ule tdq_notify() and sched_idletd() to more C11-ish atomic_thread_fence_seq_cst(). Note that on PowerPC, which currently uses lwsync for mb(), the change actually fixes the missed store/load barrier, intended by r271604 []. Reviewed by: alc Noted by: alc [] Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-07-10 08:54:12 +00:00
Pedro F. Giffuni	9129dd59be	Relocate sched_random() within the SMP section. Place sched_random nearer to where it's first used: moving the code nearer to where it is used makes the code easier to read and we can reduce the initial "#ifdef SMP" island. Reword a little the comment and clean some whitespaces while here.	2015-07-07 15:22:29 +00:00
Ian Lepore	b97fa22cd6	Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl string returned to userland is NUL-terminated. PR: 195668	2015-03-14 18:42:30 +00:00
Warner Losh	5837276ce2	Put back Andy's void for gcc happiness. Submitted by: jchandra@	2015-02-27 23:14:08 +00:00
Warner Losh	b250ad3499	Make sched_random() return an unsigned number, and use uint32_t consistently. This also matches the per-cpu pointer declaration anyway. This changes the tweak we give to the load from -32..31 to be 0..31 which seems more inline with the rest of the code (- rnd and the -= 64). It should also provide the randomness we need, and may fix a signedness bug in the old code (it isn't clear that the effect was intentional as opposed to sloppy, and the right shift of a signed value is undefined to boot). This stores sched_balance() behavior when it used random(). Differential Revision: https://reviews.freebsd.org/D1981	2015-02-27 21:15:12 +00:00
Andrew Turner	ccc41f3e66	Fix sched_ule on sparc64, gcc complains sched_random is not a correct prototype. Sponsored by: The FreeBSD Foundation	2015-02-27 15:05:20 +00:00
Andrew Turner	09d0653552	sched_random is only called for SMP, only define it there. Sponsored by: The FreeBSD Foundation	2015-02-27 12:38:24 +00:00
Warner Losh	0567b6cc16	Create sched_rand() and move the LCG code into that. Call this when we need randomness in ULE. This removes random() call from the rebalance interval code. Submitted by: Harrison Grundy Differential Revision: https://reviews.freebsd.org/D1968	2015-02-27 02:56:58 +00:00
Adrian Chadd	e77f9fed15	Update the ULE scheduler + thread and kinfo structs to use int for cpuid rather than u_char. To try and play nice with the ABI, the u_char CPU ID values are clamped at 254. The new fields now contain the full CPU ID, or -1 for no cpu. Differential Revision: D955 Reviewed by: jhb, kib Sponsored by: Norse Corp, Inc.	2014-10-18 19:36:11 +00:00
Alexander Motin	ae9e9b4fda	Rephrase r271616 comments. Submitted by: alc MFC after: 1 month	2014-09-17 17:43:32 +00:00
Alexander Motin	7965496958	Add comments describing r271604 change. MFC after: 3 days	2014-09-15 11:17:36 +00:00
Alexander Motin	7e9b58eaaa	Add a couple of memory barriers to serialize tdq_cpu_idle and tdq_load accesses. This change fixes transient performance drops in some of my benchmarks, vanishing as soon as I am trying to collect any stats from the scheduler. It looks like reordered access to those variables sometimes caused loss of IPI_PREEMPT, that delayed thread execution until some later interrupt. MFC after: 3 days	2014-09-14 22:13:19 +00:00
Alexander Motin	2e7d7bb294	Restore pre-r239157 handling of sched_yield(), when thread time slice was aborted, allowing other threads to run. Without this change thread is just rescheduled again, that was illustrated by provided test tool. PR: 192926 Submitted by: eric@vangyzen.net MFC after: 2 weeks	2014-08-23 17:31:56 +00:00
Konstantin Belousov	2499a5ccef	Micro-manage clang to get the expected inlining for cpu_search(). Mark cpu_search_lowest/cpu_search_highest/cpu_search_both as noinline, while cpu_search() gets always_inline. With the attributes set, cpu_search() is inlined in wrappers, and if()s with constant conditionals are optimized. On some tests on many-core machine, the hwpmc reported samples for cpu_search*() are reduced from 25% total to 9%. Submitted by: "Rang, Anton" <anton.rang@isilon.com> MFC after: 1 week	2014-07-03 11:06:27 +00:00
Konstantin Belousov	a288c757d4	Remove write-only local variable. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-06-08 10:56:25 +00:00
Attilio Rao	c149e542a5	Fix GENERIC build.	2014-03-19 00:38:27 +00:00
Jeff Roberson	8bc713f6c5	- Make runq_steal_from more aggressive. Previously it would examine only a single priority queue. If that queue had a thread or threads which could not be migrated we would fail to steal load. This could cause starvation in situations where cores are idle. Submitted by: Doug Kilpatrick <dkilpatrick@isilon.com> Tested by: pho Reviewed by: mav Sponsored by: EMC / Isilon Storage Division	2014-03-08 00:35:06 +00:00
Nathan Whitehorn	a8a9b1c250	ULE works on Book-E since r258002, so remove statements to the contrary.	2014-02-01 20:46:35 +00:00
Dimitry Andric	3371b88c7b	In sys/kern/sched_ule.c, remove static function sched_both(), which is unused since r232207. MFC after: 3 days	2013-12-25 16:25:54 +00:00
John Baldwin	5457fa234b	Fix an off-by-one error in r228960. The maximum priority delta provided by SCHED_PRI_TICKS should be SCHED_PRI_RANGE - 1 so that the resulting priority value (before nice adjustment) is between SCHED_PRI_MIN and SCHED_PRI_MAX, inclusive. Submitted by: kib Reported by: pho MFC after: 1 week	2013-12-03 14:50:12 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Alexander Motin	58909b74b9	Micro-optimize cpu_search(), allowing compiler to use more efficient inline ffsl() implementation, when it is available, instead of homegrown iteration. On dual-E5645 amd64 system (2x6x2 cores) under heavy I/O load that reduces time spent inside cpu_search() from 19% to 13%, while IOPS increased by 5%.	2013-09-07 15:16:30 +00:00
George V. Neville-Neil	8f2ba63493	Point args[0] not at the thread that is ending but at the one that is starting. This is in line with practice in OpenSolaris. Note that this change is only in ULE and not in the 4BSD scheduler. Once this change settles in (MFC timeout has expired) we'll try it out on 4BSD as well. PR: 177706 Submitted by: Tiwei Bie MFC after: 1 month	2013-04-15 17:21:02 +00:00
Alexander Motin	2fd4047f32	Fix bug in r242852 that prevented CPU from becoming idle if kernel built without SMP support.	2012-11-15 14:10:51 +00:00
Alexander Motin	2c27cb3a34	Several optimizations to sched_idletd(): - Do not try to steal load from other CPUs if there were no context switches on this CPU (i.e. it was idle all the time and woke up just for bus mastering or TLB shutdown). If current CPU was idle, then it is quite unlikely that some other CPU has load to steal. Under high I/O rate, when TLB shutdowns cause numerous CPU wakeups, on 24-CPU system load stealing code may consume up to 25% of all CPU time without giving any benefits. - Change code that implements spinning for load to restart spin in case of context switch. Previous code periodically called cpu_idle() even under high interrupt/context switch rate. - Raise spinning threshold to 10KHz, where it gives at least some effect that may be worth the consumed power. Reviewed by: jeff@	2012-11-10 07:02:57 +00:00
Jeff Roberson	5e5c387373	- Change ULE to use dynamic slice sizes for the timeshare queue in order to further reduce latency for threads in this queue. This should help as threads transition from realtime to timeshare. The latency is bound to a max of sched_slice until we have more than sched_slice / 6 threads runnable. Then the min slice is allotted to all threads and latency becomes (nthreads - 1) * min_slice. Discussed with: mav	2012-11-08 01:46:47 +00:00
Attilio Rao	4ceaf45de5	Rework the known mutexes to benefit about staying on their own cache line in order to avoid manual frobbing but using struct mtx_padalign. The sole exception being nvme and sfxge drivers, where the author redefined CACHE_LINE_SIZE manually, so they need to be analyzed and dealt with separately. Reviewed by: jimharris, alc	2012-10-31 18:07:18 +00:00
Attilio Rao	a049aa05c9	tdq_lock_pair() already does spinlock_enter() so migration is not possible in sched_balance_pair(). Remove redundant sched_pin(). Reviewed by: marius, jeff	2012-10-30 12:25:52 +00:00
Jim Harris	39f819e2fc	Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle. This enables CPU searches (which read tdq_load) to operate independently of any contention on the spinlock. Some scheduler-intensive workloads running on an 8C single-socket SNB Xeon show considerable improvement with this change (2-3% perf improvement, 5-6% decrease in CPU util). Sponsored by: Intel Reviewed by: jeff	2012-10-24 18:36:41 +00:00
Eitan Adler	db702c59cf	remove duplicate semicolons where possible. Approved by: cperciva MFC after: 1 week	2012-10-22 03:00:37 +00:00
Andriy Gapon	e87fc7cf7b	sched_ule: fix inverted condition in reporting of priority lending via ktr Reviewed by: kan MFC after: 1 week	2012-09-14 19:55:28 +00:00
John Baldwin	ba96d2d816	Mark the idle threads as non-sleepable and also assert that an idle thread never blocks on a turnstile.	2012-08-22 20:01:38 +00:00
Alexander Motin	37f4e0254f	Some more minor tunings inspired by bde@.	2012-08-11 20:24:39 +00:00
Alexander Motin	bf89d544d0	Allow idle threads to steal second threads from other cores on systems with 8 or more cores to improve utilization. None of my tests on 2xXeon (2x6x2) system showed any slowdown from mentioned "excess thrashing". At the same time, in pbzip2 test with number of threads more than number of CPUs I see up to 10% speedup with SMT disabled and up to 5% with SMT enabled. Thinking about thrashing I was trying to limit that stealing within same last level cache, but got only worse results. Present code anyway prefers to steal threads from topologically closer cores. Sponsored by: iXsystems, Inc.	2012-08-11 15:08:19 +00:00
Alexander Motin	579895df01	Some minor tunings/cleanups inspired by bde@ after previous commits: - remove extra dynamic variable initializations; - restore (4BSD) and implement (ULE) hogticks variable setting; - make sched_rr_interval() more tolerant to options; - restore (4BSD) and implement (ULE) kern.sched.quantum sysctl, a more user-friendly wrapper for sched_slice; - tune some sysctl descriptions; - make some style fixes.	2012-08-10 19:02:49 +00:00
Alexander Motin	3d7f41175d	Rework r220198 change (by fabient). I believe it solves the problem from the wrong direction. Before it, if preemption and end of time slice happen same time, thread was put to the head of the queue as for only preemption. It could cause single thread to run for indefinitely long time. r220198 handles it by not clearing TDF_NEEDRESCHED in case of preemption. But that causes delayed context switch every time preemption happens, even when not needed. Solve problem by introducing scheduler-specific thread flag TDF_SLICEEND, set when thread's time slice is over and it should be put to the tail of queue. Using SW_PREEMPT flag for that purpose as it was before is just not informative enough to work correctly. On my tests this by 2-3 times reduces run time deviation (improves fairness) in cases when several threads share one CPU. Reviewed by: fabient MFC after: 2 months Sponsored by: iXsystems, Inc.	2012-08-09 19:26:13 +00:00


355 Commits