freebsd kernel with SKQ
Go to file
Alexander Motin e7245cfc4c 9079 race condition in starting and ending condesing thread for indirect vdevs
illumos/illumos-gate@667ec66f1b

The timeline of the race condition is the following:
[1] Thread A is about to finish condesing the first vdev in spa_condense_indirect_thread(),
so it calls the spa_condense_indirect_complete_sync() sync task which sets the
spa_condensing_indirect field to NULL. Waiting for the sync task to finish, thread A
sleeps until the txg is done. When this happens, thread A will acquire spa_async_lock
and set spa_condense_thread to NULL.
[2] While thread A waits for the txg to finish, thread B which is running spa_sync() checks
whether it should condense the second vdev in vdev_indirect_should_condense() by checking
the spa_condensing_indirect field which was set to NULL by spa_condense_indirect_thread()
from thread A. So it goes on and tries to spawn a new condensing thread in
spa_condense_indirect_start_sync() and the aforementioned assertions fails because thread A
has not set spa_condense_thread to NULL (which is basically the last thing it does before
returning).

The main issue here is that we rely on both spa_condensing_indirect and spa_condense_thread to
signify whether a condensing thread is running. Ideally we would only use one throughout the
codebase. In addition, for managing spa_condense_thread we currently use spa_async_lock which
basically tights condensing to scrubing when it comes to pausing and resuming those actions
during spa export.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-02-22 03:22:27 +00:00
cmd 9075 Improve ZFS pool import/load process and corrupted pool recovery 2018-02-22 02:21:03 +00:00
common 7614 zfs device evacuation/removal 2018-02-18 01:21:52 +00:00
head 5066 remove support for non-ANSI compilation 2014-08-20 06:29:42 +00:00
lib 9075 Improve ZFS pool import/load process and corrupted pool recovery 2018-02-22 02:21:03 +00:00
man 7614 zfs device evacuation/removal 2018-02-18 01:21:52 +00:00
tools/ctf 5589 improper use of NULL in tools/ctf 2015-03-09 20:43:14 +00:00
uts 9079 race condition in starting and ending condesing thread for indirect vdevs 2018-02-22 03:22:27 +00:00
OPENSOLARIS.LICENSE Update vendor/opensolaris to last OpenSolaris state (13149:b23a4dab3d50) 2012-07-18 07:48:04 +00:00