.\"
.\" Copyright (c) 1996 Joerg Wunsch
.\"
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE DEVELOPERS ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
.\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
.\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
.\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd June 19, 2019
.Dt SLEEP 9
.Os
.Sh NAME
.Nm msleep ,
.Nm msleep_sbt ,
.Nm msleep_spin ,
.Nm msleep_spin_sbt ,
.Nm pause ,
.Nm pause_sig ,
.Nm pause_sbt ,
.Nm tsleep ,
.Nm tsleep_sbt ,
.Nm wakeup ,
.Nm wakeup_one ,
.Nm wakeup_any
.Nd wait for events
.Sh SYNOPSIS
.In sys/param.h
.In sys/systm.h
.In sys/proc.h
.Ft int
.Fn msleep "const void *chan" "struct mtx *mtx" "int priority" "const char *wmesg" "int timo"
.Ft int
.Fn msleep_sbt "const void *chan" "struct mtx *mtx" "int priority" \
"const char *wmesg" "sbintime_t sbt" "sbintime_t pr" "int flags"
.Ft int
.Fn msleep_spin "const void *chan" "struct mtx *mtx" "const char *wmesg" "int timo"
.Ft int
.Fn msleep_spin_sbt "const void *chan" "struct mtx *mtx" "const char *wmesg" \
"sbintime_t sbt" "sbintime_t pr" "int flags"
.Ft int
.Fn pause "const char *wmesg" "int timo"
.Ft int
.Fn pause_sig "const char *wmesg" "int timo"
.Ft int
.Fn pause_sbt "const char *wmesg" "sbintime_t sbt" "sbintime_t pr" \
"int flags"
.Ft int
.Fn tsleep "const void *chan" "int priority" "const char *wmesg" "int timo"
.Ft int
.Fn tsleep_sbt "const void *chan" "int priority" "const char *wmesg" \
"sbintime_t sbt" "sbintime_t pr" "int flags"
.Ft void
.Fn wakeup "const void *chan"
.Ft void
.Fn wakeup_one "const void *chan"
.Ft void
.Fn wakeup_any "const void *chan"
.Sh DESCRIPTION
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn pause_sig ,
.Fn pause_sbt ,
.Fn wakeup ,
.Fn wakeup_one ,
and
.Fn wakeup_any
handle event-based thread blocking.
If a thread must wait for an
external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn pause_sig ,
or
.Fn pause_sbt .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup ,
often called from inside an interrupt routine, to indicate that the
resource the thread was blocking on is available now.
.Pp
The parameter
.Fa priority
specifies a new priority for the thread as well as some optional flags.
If the new priority is not 0,
then the thread will be made
runnable with the specified
.Fa priority
when it resumes.
.Dv PZERO
should never be used, as it is for compatibility only.
A new priority of 0 means to use the thread's current priority when
it is made runnable again.
.Pp
If
.Fa priority
includes the
.Dv PCATCH
flag, pending signals are allowed to interrupt the sleep; otherwise,
pending signals are ignored during the sleep.
If
.Dv PCATCH
is set and a signal becomes pending,
.Er ERESTART
is returned if the current system call should be restarted if
possible, and
.Er EINTR
is returned if the system call should be interrupted by the signal.
.Pp
The parameter
.Fa wmesg
is a string describing the sleep condition for tools like
.Xr ps 1 .
Because those programs have limited space to display arbitrary strings,
this message should not be longer than 6 characters.
.Pp
The parameter
.Fa timo
specifies a timeout for the sleep.
If
.Fa timo
is not 0,
then the thread will sleep for at most
.Fa timo No / Va hz
seconds.
If the timeout expires,
then the sleep function will return
.Er EWOULDBLOCK .
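.Pp
As an illustration of the
.Fa timo No / Va hz
relationship, the following sketch converts a millisecond interval into a
tick count that could be passed as
.Fa timo .
The helper
.Fn ms_to_timo
is hypothetical and is not part of the kernel API.
It takes
.Va hz
as an argument, rather than reading the kernel global, only so that the
arithmetic can be compiled and checked outside the kernel.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption for illustration, not a kernel
 * function): convert a timeout in milliseconds to ticks for timo.
 * The result is rounded up, so a non-zero request never becomes 0,
 * because a timo of 0 means "no timeout".
 */
static int
ms_to_timo(int64_t ms, int hz)
{
	int64_t ticks;

	assert(hz > 0);
	if (ms <= 0)
		return (0);
	/* timo / hz seconds == ms / 1000 seconds, rounded up. */
	ticks = (ms * hz + 999) / 1000;
	return (ticks > INT_MAX ? INT_MAX : (int)ticks);
}
```
.Ed
.Pp
Under these assumptions, with
.Va hz
equal to 100, a request of 15 milliseconds rounds up to 2 ticks and a
request of 1000 milliseconds yields 100 ticks.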
.Pp
The
.Fn msleep_sbt ,
.Fn msleep_spin_sbt ,
.Fn pause_sbt ,
and
.Fn tsleep_sbt
functions take an
.Fa sbt
parameter instead of
.Fa timo .
It allows the caller to specify a relative or absolute wakeup time with
higher resolution in the form of an
.Vt sbintime_t .
The parameter
.Fa pr
allows the caller to specify the desired absolute event precision.
The parameter
.Fa flags
allows the caller to pass additional
.Fn callout_reset_sbt
flags.
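.Pp
To make the
.Vt sbintime_t
representation concrete, the sketch below builds an
.Fa sbt
value from seconds and milliseconds.
The type and the
.Dv SBT_1S
and
.Dv SBT_1MS
constants are userland stand-ins that mirror the kernel definitions.
The helpers
.Fn sbt_from
and
.Fn sbt_whole_seconds
are hypothetical names introduced here for illustration only.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdint.h>

/*
 * Userland stand-ins for the kernel definitions, repeated here only so
 * the arithmetic can be compiled outside the kernel.
 * An sbintime_t is a signed 32.32 fixed-point count of seconds.
 */
typedef int64_t sbintime_t;
#define	SBT_1S	((sbintime_t)1 << 32)
#define	SBT_1MS	(SBT_1S / 1000)

/* Hypothetical helper: seconds plus milliseconds as an sbintime_t. */
static sbintime_t
sbt_from(int64_t sec, int64_t ms)
{
	assert(sec >= 0 && ms >= 0);
	return (sec * SBT_1S + ms * SBT_1MS);
}

/* Hypothetical helper: whole seconds contained in an sbintime_t. */
static int64_t
sbt_whole_seconds(sbintime_t sbt)
{
	return (sbt / SBT_1S);
}
```
.Ed
.Pp
Because
.Dv SBT_1MS
is truncated by integer division, 1000 milliseconds expressed this way is
slightly less than
.Dv SBT_1S ,
so whole seconds are better expressed as multiples of
.Dv SBT_1S .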
.Pp
Several of the sleep functions including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then
the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition, or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
(even if recursed)
while the thread is suspended and will reacquire the
.Va Giant
mutex before the function returns.
Note that the
.Va Giant
mutex may be specified as the lock to drop.
In that case, however, the
.Dv PDROP
flag is not allowed.
.Pp
To avoid lost wakeups,
either a lock should be used to protect against races,
or a timeout should be specified to place an upper bound on the delay due
to a lost wakeup.
As a result,
the
.Fn tsleep
function should only be invoked with a timeout of 0 when the
.Va Giant
mutex is held.
.Pp
The
.Fn msleep
function requires that
.Fa mtx
reference a default, i.e. non-spin, mutex.
Its use is deprecated in favor of
.Xr mtx_sleep 9 ,
which provides identical behavior.
.Pp
The
.Fn msleep_spin
function requires that
.Fa mtx
reference a spin mutex.
The
.Fn msleep_spin
function does not accept a
.Fa priority
parameter and thus does not support changing the current thread's priority,
the
.Dv PDROP
flag,
or catching signals via the
.Dv PCATCH
flag.
.Pp
The
.Fn pause
function is a wrapper around
.Fn tsleep
that suspends execution of the current thread for the indicated timeout.
The thread cannot be awakened early by signals or calls to
.Fn wakeup ,
.Fn wakeup_one ,
or
.Fn wakeup_any .
The
.Fn pause_sig
function is a variant of
.Fn pause
which can be awakened early by signals.
.Pp
The
.Fn wakeup_one
function makes the first highest-priority thread in the queue that is
sleeping on the parameter
.Fa chan
runnable.
This reduces the load when a large number of threads are sleeping on
the same address, but only one of them can actually do any useful work
when made runnable.
.Pp
Due to the way it works, the
.Fn wakeup_one
function requires that only related threads sleep on a specific
.Fa chan
address.
It is the programmer's responsibility to choose a unique
.Fa chan
value.
The older
.Fn wakeup
function did not require this, though it was never good practice
for threads to share a
.Fa chan
value.
When converting from
.Fn wakeup
to
.Fn wakeup_one ,
pay particular attention to ensure that no other threads wait on the
same
.Fa chan .
.Pp
The
.Fn wakeup_any
function is similar to
.Fn wakeup_one ,
except that it makes runnable the last thread on the queue
(the one that has slept for the shortest time),
ignoring fairness.
It can be used when threads sleeping on the
.Fa chan
are known to be identical and there is no reason to be fair.
.Pp
If the timeout given by
.Fa timo
or
.Fa sbt
is based on an absolute real-time clock value,
then the thread should copy the global
.Va rtc_generation
into its
.Va td_rtcgen
member before reading the RTC.
If the real-time clock is adjusted, these functions will set
.Va td_rtcgen
to zero and return zero.
The caller should reconsider its orientation with the new RTC value.
.Sh RETURN VALUES
When awakened by a call to
.Fn wakeup
or
.Fn wakeup_one ,
if a signal is pending and
.Dv PCATCH
is specified,
a non-zero error code is returned.
If the thread is awakened by a call to
.Fn wakeup
or
.Fn wakeup_one ,
the
.Fn msleep ,
.Fn msleep_spin ,
.Fn tsleep ,
and locking primitive sleep functions return 0.
Zero can also be returned when the real-time clock is adjusted;
see above regarding
.Va td_rtcgen .
Otherwise, a non-zero error code is returned.
.Sh ERRORS
.Fn msleep ,
.Fn msleep_spin ,
.Fn tsleep ,
and the locking primitive sleep functions will fail if:
.Bl -tag -width Er
.It Bq Er EINTR
The
.Dv PCATCH
flag was specified, a signal was caught, and the system call should be
interrupted.
.It Bq Er ERESTART
The
.Dv PCATCH
flag was specified, a signal was caught, and the system call should be
restarted.
.It Bq Er EWOULDBLOCK
A non-zero timeout was specified and the timeout expired.
.El
.Sh SEE ALSO
.Xr ps 1 ,
.Xr locking 9 ,
.Xr malloc 9 ,
.Xr mi_switch 9 ,
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
.Xr sx_sleep 9 ,
.Xr timeout 9
.Sh HISTORY
The functions
.Fn sleep
and
.Fn wakeup
were present in
.At v1 .
They were probably also present in the preceding
PDP-7 version of
.Ux .
They were the basic process synchronization model.
.Pp
The
.Fn tsleep
function appeared in
.Bx 4.4
and added the parameters
.Fa wmesg
and
.Fa timo .
The
.Fn sleep
function was removed in
.Fx 2.2 .
The
.Fn wakeup_one
function appeared in
.Fx 2.2 .
The
.Fn msleep
function appeared in
.Fx 5.0 ,
and the
.Fn msleep_spin
function appeared in
.Fx 6.2 .
The
.Fn pause
function appeared in
.Fx 7.0 .
The
.Fn pause_sig
function appeared in
.Fx 12.0 .
.Sh AUTHORS
.An -nosplit
This manual page was written by
.An J\(:org Wunsch Aq Mt joerg@FreeBSD.org .