Commit Graph

111 Commits

Author SHA1 Message Date
Vladimir Kondratyev
b6f87b78b5 LinuxKPI: Implement kthread_worker related functions
Kthread worker is a single thread workqueue which can be used in cases
where specific kthread association is necessary, for example, when it
should have RT priority or be assigned to certain cgroup.

This change implements Linux v4.9 interface which mostly hides kthread
internals from users thus allowing to use ordinary taskqueue(9) KPI.
As kthread worker prohibits enqueueing of already pending or canceling
tasks some minimal changes to taskqueue(9) were done.
taskqueue_enqueue_flags() was added to taskqueue KPI which accepts extra
flags parameter. It contains one or more of the following flags:

TASKQUEUE_FAIL_IF_PENDING - taskqueue_enqueue_flags() fails if the task
    is already scheduled to execution. EEXIST is returned and the
    ta_pending counter value remains unchanged.
TASKQUEUE_FAIL_IF_CANCELING - taskqueue_enqueue_flags() fails if the
    task is in the canceling state and ECANCELED is returned.

Required by:	drm-kmod 5.10

MFC after:	1 week
Reviewed by:	hselasky, Pau Amma (docs)
Differential Revision:	https://reviews.freebsd.org/D35051
2022-05-17 15:10:20 +03:00
Alexander Motin
4730a8972b callout(9): Allow spin locks use with callout_init_mtx().
Implement lock_spin()/unlock_spin() lock class methods, moving the
assertion to _sleep() instead.  Change assertions in callout(9) to
allow spin locks for both regular and C_DIRECT_EXEC cases. In case of
C_DIRECT_EXEC callouts spin locks are the only locks allowed actually.

As the first use case allow taskqueue_enqueue_timeout() use on fast
task queues.  It actually becomes more efficient due to avoided extra
context switches in callout(9) thanks to C_DIRECT_EXEC.

MFC after:	2 weeks
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D31778
2021-09-02 21:16:46 -04:00
Alexander Motin
706b1a5724 Align taskqueue_enqueue_timeout() to hardclock.
It is done for all other KPIs using HZ, but was missed here.

MFC after:	2 weeks
2021-08-31 23:50:35 -04:00
Gleb Smirnoff
4426b2e64b Add flag to struct task to mark the task as requiring network epoch.
When processing a taskqueue and a task has associated epoch, then
enter for duration of the task.  If consecutive tasks belong to the
same epoch, batch them.  Now we are talking about the network epoch
only.

Shrink the ta_priority size to 8-bits.  No current consumers use
a priority that won't fit into 8 bits.  Also complexity of
taskqueue_enqueue() is a square of maximum value of priority, so
we unlikely ever want to go over UCHAR_MAX here.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D23518
2020-02-11 18:48:07 +00:00
Jeff Roberson
61a74c5ccd schedlock 1/4
Eliminate recursion from most thread_lock consumers.  Return from
sched_add() without the thread_lock held.  This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks.  This will eventually allow for lockless remote adds.

Discussed with:	kib
Reviewed by:	jhb
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22626
2019-12-15 21:11:15 +00:00
Alexander Motin
3db35ffa2a Some more taskqueue optimizations.
- Optimize enqueue for two task priority values by adding new tq_hint
field, pointing to the last task inserted into the middle of the list.
In case of more then two priority values it should halve average search.
 - Move tq_active insert/remove out of the taskqueue_run_locked loop.
Instead of dirtying few shared cache lines per task introduce different
mechanism to drain active tasks, based on task sequence number counter,
that uses only cache lines already present in cache.  Since the new
mechanism does not need ordering, switch tq_active from TAILQ to LIST.
 - Move static and dynamic struct taskqueue fields into different cache
lines.  Move lock into its own cache line, so that heavy lock spinning
by multiple waiting threads would not affect the running thread.
 - While there, correct some TQ_SLEEP() wait messages.

This change fixes certain ZFS write workloads, causing huge congestion
on taskqueue lock.  Those workloads combine some large block writes to
saturate the pool and trigger allocation throttling, which uses higher
priority tasks to requeue the delayed I/Os, with many small blocks to
generate deep queue of small tasks for taskqueue to sort.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-11-01 22:49:44 +00:00
Andriy Gapon
5fdc2c044e provide a way to assign taskqueue threads to a kernel process
This can be used to group all threads belonging to a single logical
entity under a common kernel process.
I am planning to use the new interface for ZFS threads.

MFC after:	4 weeks
2019-10-17 06:32:34 +00:00
Alexander Motin
f91aa773be Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless.  It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.

This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock).  On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.

As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.

Reviewed by:	markj, mmacy
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D20669
2019-06-20 01:15:33 +00:00
Mark Johnston
bb58b5d670 Add a taskqueue_quiesce(9) KPI.
This is similar to taskqueue_drain_all(9) but will wait for the queue
to become idle before returning instead of only waiting for
already-enqueued tasks to finish.  This will be used in the opensolaris
compat layer.

PR:		227784
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17975
2018-11-21 17:18:27 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
a18a2290cd kern: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837
2018-01-15 21:18:04 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Ian Lepore
f37b7fc2d4 Add taskqueue_enqueue_timeout_sbt(), because sometimes you want more control
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.
2017-07-31 00:54:50 +00:00
Hans Petter Selasky
403f4a31ab Implement taskqueue_poll_is_busy() for use by the LinuxKPI.
Refer to comment above function for a detailed description.

Discussed with:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-03-02 12:20:23 +00:00
Hans Petter Selasky
99eca1b2b3 While draining a timeout task prevent the taskqueue_enqueue_timeout()
function from restarting the timer.

Commonly taskqueue_enqueue_timeout() is called from within the task
function itself without any checks for teardown. Then it can happen
the timer stays active after the return of taskqueue_drain_timeout(),
because the timeout and task is drained separately.

This patch factors out the teardown flag into the timeout task itself,
allowing existing code to stay as-is instead of applying a teardown
flag to each and every of the timeout task consumers.

Add assert to taskqueue_drain_timeout() which prevents parallel
execution on the same timeout task.

Update manual page documenting the return value of
taskqueue_enqueue_timeout().

Differential Revision:	https://reviews.freebsd.org/D8012
Reviewed by:	kib, trasz
MFC after:	1 week
2016-09-29 10:38:20 +00:00
Patrick Kelsey
da2ded6575 _taskqueue_start_threads() now fails if it doesn't actually start any threads.
Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D7701
2016-09-01 02:05:46 +00:00
Stephen Hurd
23ac9029f9 Update iflib to support more NIC designs
- Move group task queue into kern/subr_gtaskqueue.c
- Change intr_enable to return an int so it can be detected if it's not
  implemented
- Allow different TX/RX queues per set to be different sizes
- Don't split up TX mbufs before transmit
- Allow a completion queue for TX as well as RX
- Pass the RX budget to isc_rxd_available() to allow an earlier return
  and avoid multiple calls

Submitted by:	shurd
Reviewed by:	gallatin
Approved by:	scottl
Differential Revision:	https://reviews.freebsd.org/D7393
2016-08-12 21:29:44 +00:00
Nathan Whitehorn
96c85efb4b Replace a number of conflations of mp_ncpus and mp_maxid with either
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.

There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.

PR:		kern/210106
Reviewed by:	jhb
Approved by:	re (glebius)
2016-07-06 14:09:49 +00:00
Mateusz Guzik
f3c8e16ea5 taskqueue: plug a leak in _taskqueue_create
While here make some style fixes and postpone the sprintf so that it is
only done when the function can no longer fail.

CID:	1356041
2016-06-02 15:52:34 +00:00
Andriy Gapon
7107bed0f0 fix loss of taskqueue wakeups (introduced in r300113)
Submitted by:	kmacy
Tested by:	dchagin
2016-05-21 14:51:49 +00:00
Scott Long
7e52504fc2 Adjust the creation of tq_name so it can be freed correctly
Reviewed by:	jhb, allanjude
Differential Revision:	D6454
2016-05-19 17:14:24 +00:00
Scott Long
4c7070db25 Import the 'iflib' API library for network drivers. From the author:
"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional.  The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation.  ixl and ixgbe driver conversions will be committed
shortly.  We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by:   mmacy@nextbsd.org
Reviewed by:    gallatin
Differential Revision:  D5211
2016-05-18 04:35:58 +00:00
John Baldwin
cbc4d2db75 Remove taskqueue_enqueue_fast().
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167.  It has been a compat shim ever
since.  It's time for the compat shim to go.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	sephe
Differential Revision:	https://reviews.freebsd.org/D5131
2016-03-01 17:47:32 +00:00
Randall Stewart
7c4676ddee This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped.  Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after:	1 Month with associated callout change.
2015-11-13 22:51:35 +00:00
Xin LI
eb3d0c5d8c MFuser/delphij/zfs-arc-rebase@r281754:
In r256613, taskqueue_enqueue_locked() have been modified to release the
task queue lock before returning.  In r276665, taskqueue_drain_all() will
call taskqueue_enqueue_locked() to insert the barrier task into the queue,
but did not reacquire the lock after it but later code expects the lock
still being held (e.g. TQ_SLEEP()).

The barrier task is special and if we release then reacquire the lock,
there would be a small race window where a high priority task could sneak
into the queue.  Looking more closely, the race seems to be tolerable but
is undesirable from semantics standpoint.

To solve this, in taskqueue_drain_tq_queue(), instead of directly calling
taskqueue_enqueue_locked(), insert the barrier task directly without
releasing the lock.
2015-05-26 01:40:33 +00:00
Adrian Chadd
75493a82e0 Remove taskqueue_start_threads_pinned(); there's noa generic cpuset version of this.
Sponsored by:	Norse Corp, Inc.
2015-02-25 21:59:03 +00:00
Adrian Chadd
bfa102cae1 Implement taskqueue_start_threads_cpuset().
This is a more generic version of taskqueue_start_threads_pinned()
which only supports a single cpuid.

This originally came from John Baldwin <jhb@> who implemented it
as part of a push towards NUMA awareness in drivers.  I started implementing
something similar for RSS and NUMA, then found he already did it.

I'd like to axe taskqueue_start_threads_pinned() so it doesn't become
part of a longer-term API.  (Read: hps@ wants to MFC things, and
if I don't do this soon, he'll MFC what's here. :-)

I have a follow-up commit which converts the intel drivers over
to using the cpuset version of this function, so we can eventually
nuke the the pinned version.

Tested:

* igb, ixgbe

Obtained from:	jhbbsd
2015-02-17 02:35:06 +00:00
Justin T. Gibbs
5b326a32ce Prevent live-lock and access of destroyed data in taskqueue_drain_all().
Phabric:	https://reviews.freebsd.org/D1247
Reviewed by:	jhb, avg
Sponsored by:	Spectra Logic Corporation

sys/kern_subr_taskqueue.c:
	Modify taskqueue_drain_all() processing to use a temporary
	"barrier task", rather than rely on a user task that may
	be destroyed during taskqueue_drain_all()'s execution.  The
	barrier task is queued behind all previously queued tasks
	and then has its priority elevated so that future tasks
	cannot pass it in the queue.

	Use a similar barrier scheme to drain threads processing
	current tasks.  This requires taskqueue_run_locked() to
	insert and remove the taskqueue_busy object for the running
	thread for every task processed.

share/man/man9/taskqueue.9:
	Remove warning about live-lock issues with taskqueue_drain_all()
	and indicate that it does not wait for tasks queued after
	it begins processing.
2015-01-04 19:55:44 +00:00
Justin T. Gibbs
2c6bf3d90b Remove trailing whitespace. 2014-11-30 19:32:00 +00:00
Andrey V. Elsukov
3e40097976 Temporary revert r269661, it looks like the patch isn't complete. 2014-08-07 14:32:28 +00:00
Andrey V. Elsukov
c60e497af9 Use cpuset_setithread() to apply cpu mask to taskq threads.
Sponsored by:	Yandex LLC
2014-08-07 10:23:50 +00:00
Adrian Chadd
924aaf69ff Pin the right thread.
This _was_ right, a last minute suggestion and not enough testing makes
Adrian a bad boy.

Tested:

* igb(4) with RSS patches, by hand verifying each igb(4) taskqueue
  tid from procstat -ka using cpuset -g -t <tid>.
2014-06-01 04:11:05 +00:00
Adrian Chadd
5a6f0eee47 Add a new taskqueue setup method that takes a cpuid to pin the
taskqueue worker thread(s) to.

For now it isn't a taskqueue/taskthread error to fail to pin
to the given cpuid.

Thanks to rpaulo@, kib@ and jhb@ for feedback.

Tested:

* igb(4), with local RSS patches to pin taskqueues.

TODO:

* ask the doc team for help in documenting the new API call.
* add a taskqueue_start_threads_cpuset() method which takes
  a cpuset_t - but this may require a bunch of surgery to
  bring cpuset_t into scope.
2014-05-24 20:37:15 +00:00
Andriy Gapon
73f82099ea add taskqueue_drain_all
This API has semantics similar to that of taskqueue_drain but acts on
all tasks that might be queued or running on a taskqueue.
A caller must ensure that no new tasks are being enqueued otherwise this
call would be totally meaningless.  For example, if the tasks are
enqueued by an interrupt filter then its interrupt must be disabled.

MFC after:	10 days
2013-11-28 18:56:34 +00:00
Andriy Gapon
4c47024ce0 taskqueue_cancel: garbage collect a write-only variable
MFC after:	3 days
2013-11-19 18:45:29 +00:00
Alexander Motin
18093155a3 Add comments that taskqueue_enqueue_locked() returns without the lock. 2013-10-21 21:16:50 +00:00
Gleb Smirnoff
a9355dbe82 Revert r256587.
Requested by:	zec
2013-10-18 11:26:40 +00:00
Alexander Motin
6d545f4c8d MFprojects/camlock r254763:
Move tq_enqueue() call out of the queue lock for known handlers (actually
I have found no others in the base system).  This reduces queue lock hold
time and congestion spinning under active multithreaded enqueuing.
2013-10-16 09:52:59 +00:00
Alexander Motin
1d1e92f102 MFprojects/camlock r254685:
Remove TQ_FLAGS_PENDING flag, softly duplicating queue emptiness status.
2013-10-16 09:48:23 +00:00
Gleb Smirnoff
348298b1a3 For VIMAGE kernels store vnet in the struct task, and set vnet context
during task processing.

Reported & tested by:	mm
2013-10-16 05:02:01 +00:00
Alexander Motin
596d33e923 MFprojects/camlock r254460:
Remove locking from taskqueue_member().  The list of threads is static
during the taskqueue life cycle, so there is no need to protect it,
taking quite congested lock several more times for each ZFS I/O.
2013-08-24 14:41:49 +00:00
Will Andrews
fdbc71742b Extend taskqueue(9) to enable per-taskqueue callbacks.
The scope of these callbacks is primarily to support actions that affect the
taskqueue's thread environments.  They are entirely optional, and
consequently are introduced as a new API: taskqueue_set_callback().

This interface allows the caller to specify that a taskqueue requires a
callback and optional context pointer for a given callback type.

The callback types included in this commit can be used to register a
constructor and destructor for thread-local storage using osd(9).  This
allows a particular taskqueue to define that its threads require a specific
type of TLS, without the need for a specially-orchestrated task-based
mechanism for startup and shutdown in order to accomplish it.

Two callback types are supported at this point:

- TASKQUEUE_CALLBACK_TYPE_INIT, called by every thread when it starts, prior
  to processing any tasks.
- TASKQUEUE_CALLBACK_TYPE_SHUTDOWN, called by every thread when it exits,
  after it has processed its last task but before the taskqueue is
  reclaimed.

While I'm here:

- Add two new macros, TQ_ASSERT_LOCKED and TQ_ASSERT_UNLOCKED, and use them
  in appropriate locations.
- Fix taskqueue.9 to mention taskqueue_start_threads(), which is a required
  interface for all consumers of taskqueue(9).

Reviewed by:	kib (all), eadler (taskqueue.9), brd (taskqueue.9)
Approved by:	ken (mentor)
Sponsored by:	Spectra Logic
MFC after:	1 month
2013-03-23 15:11:53 +00:00
Konstantin Belousov
b7c8d2f2f5 Add a special meaning to the negative ticks argument for
taskqueue_enqueue_timeout().  Do not rearm the callout if it is
already armed and the ticks is negative.  Otherwise rearm it to fire
in abs(ticks) ticks in the future.

The intended use is to call taskqueue_enqueue_timeout() for the given
timeout_task with the same negative ticks argument.  As result, the
task is scheduled to execute not further than abs(ticks) ticks in
future, and the consequent enqueues are coalesced until the already
scheduled task is finished.

Reviewed by:	rwatson
Tested by:	Markus Gebert <markus.gebert@hostpoint.ch>
MFC after:	2 weeks
2012-11-20 15:33:48 +00:00
John Baldwin
10f0ab3933 Shorten the name of the fast SWI taskqueue to "fast taskq" so that
it fits.

Reported by:	lev
MFC after:	1 week
2012-08-28 13:35:37 +00:00
Adrian Chadd
d2849f27bc Ensure that ta_pending doesn't overflow u_short by capping its value at USHRT_MAX.
If it overflows before the taskqueue can run, the task will be
re-added to the taskqueue and cause a loop in the task list.

Reported by:	Arnaud Lacombe <lacombar@gmail.com>
Submitted by:	Ryan Stone <rysto32@gmail.com>
Reviewed by:	jhb
Approved by:	re (kib)
MFC after:	1 day
2011-09-15 08:42:06 +00:00
Konstantin Belousov
b2ad91f26b Implement the delayed task execution extension to the taskqueue
mechanism. The caller may specify a timeout in ticks after which the
task will be scheduled.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	jeff, jhb
MFC after:	1 month
2011-04-26 11:39:56 +00:00
Andriy Gapon
706b0d31bb taskqueue: drop unused tq_name field
tq_name was used write-only and besides it was just a pointer, so it
could point to some garbage in a temporary buffer that's gone.
This change shouldn't change KPI/KBI as struct taskqueue is private to
subr_taskqueue.c.
If we find a need for tq_name it can be resurrected at any moment.
taskqueue_create() interface is preserved for this purpose.

Suggested by:	jhb
MFC after:	10 days
2010-11-23 14:30:22 +00:00
Juli Mallett
b79b28b69d Use macros rather than inline functions to lock and unlock mutexes, so that
line number information is preserved in witness.

Reviewed by:	jhb
2010-11-08 22:12:25 +00:00
Matthew D Fleming
f46276a9b0 Add a taskqueue_cancel(9) to cancel a pending task without waiting for
it to run as taskqueue_drain(9) does.

Requested by:	hselasky
Original code:	jeff
Reviewed by:	jhb
MFC after:	2 weeks
2010-11-08 20:56:31 +00:00
Matthew D Fleming
bf73d4d28e Use a safer mechanism for determining if a task is currently running,
that does not rely on the lifetime of pointers being the same. This also
restores the task KBI.

Suggested by:	jhb
MFC after:	1 month
2010-10-13 22:59:04 +00:00