RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks

When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it.  For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity.  This way if there is a subsequent failure we are fully
protected.

With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector.  In this
case the parity should be healed but it is not.

The problem can be noticed by scrubbing the pool twice.  Assuming there
was no damage concurrent with the scrubs, the first scrub should fix all
silent damage, and the second scrub should be "clean" (`zpool status`
should not report checksum errors on any disks).  If the bug is
encountered, then the second scrub will repair the silently-damaged
parity that the first scrub failed to repair, and these checksum errors
will be reported after the second scrub.  Since the first scrub repaired
all the damaged data, the bug can not be encountered during the second
scrub, so subsequent scrubs (more than two) are not necessary.

The root cause of the problem is some code that was inadvertently added
to `raidz_parity_verify()` by the DRAID changes.  The incorrect code
causes the parity healing to be aborted if there is damaged data
(`rc_error != 0`) or the data disk is not present (`!rc_tried`).  These
checks are not necessary, because we only call `raidz_parity_verify()`
if we have the correct data (which may have been reconstructed using
parity, and which was verified by the checksum).

This commit fixes the problem by removing the incorrect checks in
`raidz_parity_verify()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11489 
Closes #11510
This commit is contained in:
Matthew Ahrens 2021-01-26 16:05:05 -08:00 committed by GitHub
parent d7265b3309
commit 62d4287f27
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 202 additions and 12 deletions

View File

@ -1903,16 +1903,6 @@ raidz_parity_verify(zio_t *zio, raidz_row_t *rr)
if (checksum == ZIO_CHECKSUM_NOPARITY)
return (ret);
/*
* All data columns must have been successfully read in order
* to use them to generate parity columns for comparison.
*/
for (c = rr->rr_firstdatacol; c < rr->rr_cols; c++) {
rc = &rr->rr_col[c];
if (!rc->rc_tried || rc->rc_error != 0)
return (ret);
}
for (c = 0; c < rr->rr_firstdatacol; c++) {
rc = &rr->rr_col[c];
if (!rc->rc_tried || rc->rc_error != 0)

View File

@ -727,8 +727,9 @@ tags = ['functional', 'raidz']
[tests/functional/redundancy]
tests = ['redundancy_draid1', 'redundancy_draid2', 'redundancy_draid3',
'redundancy_draid_spare1', 'redundancy_draid_spare2',
'redundancy_draid_spare3', 'redundancy_mirror', 'redundancy_raidz1',
'redundancy_raidz2', 'redundancy_raidz3', 'redundancy_stripe']
'redundancy_draid_spare3', 'redundancy_mirror', 'redundancy_raidz',
'redundancy_raidz1', 'redundancy_raidz2', 'redundancy_raidz3',
'redundancy_stripe']
tags = ['functional', 'redundancy']
[tests/functional/refquota]

View File

@ -9,6 +9,7 @@ dist_pkgdata_SCRIPTS = \
redundancy_draid_spare2.ksh \
redundancy_draid_spare3.ksh \
redundancy_mirror.ksh \
redundancy_raidz.ksh \
redundancy_raidz1.ksh \
redundancy_raidz2.ksh \
redundancy_raidz3.ksh \

View File

@ -0,0 +1,198 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2020 by vStack. All rights reserved.
# Copyright (c) 2021 by Delphix. All rights reserved.
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/redundancy/redundancy.kshlib
#
# DESCRIPTION:
# RAIDZ should provide redundancy
#
# STRATEGY:
# 1. Create block device files for the test raidz pool
# 2. For each parity value [1..3]
# - create raidz pool
# - fill it with some directories/files
# - verify resilver by replacing devices
# - verify scrub by zeroing devices
# - destroy the raidz pool
typeset -r devs=6
typeset -r dev_size_mb=512
typeset -a disks
prefetch_disable=$(get_tunable PREFETCH_DISABLE)
function cleanup
{
poolexists "$TESTPOOL" && destroy_pool "$TESTPOOL"
for i in {0..$devs}; do
rm -f "$TEST_BASE_DIR/dev-$i"
done
set_tunable32 PREFETCH_DISABLE $prefetch_disable
}
function test_resilver # <pool> <parity> <dir>
{
typeset pool=$1
typeset nparity=$2
typeset dir=$3
for (( i=0; i<$nparity; i=i+1 )); do
log_must zpool offline $pool $dir/dev-$i
done
log_must zpool export $pool
for (( i=0; i<$nparity; i=i+1 )); do
log_must zpool labelclear -f $dir/dev-$i
done
log_must zpool import -o cachefile=none -d $dir $pool
for (( i=0; i<$nparity; i=i+1 )); do
log_must zpool replace -fw $pool $dir/dev-$i
done
log_must check_pool_status $pool "errors" "No known data errors"
resilver_cksum=$(cksum_pool $pool)
if [[ $resilver_cksum != 0 ]]; then
log_must zpool status -v $pool
log_fail "resilver cksum errors: $resilver_cksum"
fi
log_must zpool clear $pool
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
log_must zpool offline $pool $dir/dev-$i
done
log_must zpool export $pool
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
log_must zpool labelclear -f $dir/dev-$i
done
log_must zpool import -o cachefile=none -d $dir $pool
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
log_must zpool replace -fw $pool $dir/dev-$i
done
log_must check_pool_status $pool "errors" "No known data errors"
resilver_cksum=$(cksum_pool $pool)
if [[ $resilver_cksum != 0 ]]; then
log_must zpool status -v $pool
log_fail "resilver cksum errors: $resilver_cksum"
fi
log_must zpool clear $pool
}
function test_scrub # <pool> <parity> <dir>
{
typeset pool=$1
typeset nparity=$2
typeset dir=$3
typeset combrec=$4
log_must zpool export $pool
for (( i=0; i<$nparity; i=i+1 )); do
dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
bs=1M seek=4 count=$(($dev_size_mb-4))
done
log_must zpool import -o cachefile=none -d $dir $pool
log_must zpool scrub -w $pool
log_must check_pool_status $pool "errors" "No known data errors"
log_must zpool clear $pool
log_must zpool export $pool
for (( i=$nparity; i<$nparity*2; i=i+1 )); do
dd conv=notrunc if=/dev/zero of=$dir/dev-$i \
bs=1M seek=4 count=$(($dev_size_mb-4))
done
log_must zpool import -o cachefile=none -d $dir $pool
log_must zpool scrub -w $pool
log_must check_pool_status $pool "errors" "No known data errors"
log_must zpool clear $pool
}
log_onexit cleanup
log_must set_tunable32 PREFETCH_DISABLE 1
# Disk files which will be used by pool
for i in {0..$(($devs - 1))}; do
device=$TEST_BASE_DIR/dev-$i
log_must truncate -s ${dev_size_mb}M $device
disks[${#disks[*]}+1]=$device
done
# Disk file which will be attached
log_must truncate -s 512M $TEST_BASE_DIR/dev-$devs
for nparity in 1 2 3; do
raid=raidz$nparity
dir=$TEST_BASE_DIR
log_must zpool create -f -o cachefile=none $TESTPOOL $raid ${disks[@]}
log_must zfs set primarycache=metadata $TESTPOOL
log_must zfs create $TESTPOOL/fs
log_must fill_fs /$TESTPOOL/fs 1 512 100 1024 R
log_must zfs create -o compress=on $TESTPOOL/fs2
log_must fill_fs /$TESTPOOL/fs2 1 512 100 1024 R
log_must zfs create -o compress=on -o recordsize=8k $TESTPOOL/fs3
log_must fill_fs /$TESTPOOL/fs3 1 512 100 1024 R
typeset pool_size=$(get_pool_prop size $TESTPOOL)
log_must zpool export $TESTPOOL
log_must zpool import -o cachefile=none -d $dir $TESTPOOL
log_must check_pool_status $TESTPOOL "errors" "No known data errors"
test_resilver $TESTPOOL $nparity $dir
test_scrub $TESTPOOL $nparity $dir
log_must zpool destroy "$TESTPOOL"
done
log_pass "raidz redundancy test succeeded."