test/common: Always wait min 2s in waitforserial()

It seems there is a race in the kernel when we try to
delete_controller (nvme disconnect) right after a new nvme subsystem
is connected. This leaves the block subsystem with lingering nvme
devices which are unusable and which start to affect the nvmf test
suite. They also can't be removed unless the kernel is rebooted.

To work around it, always wait at least 2s in waitforserial() before
polling for the devices, so that all of the subsystems have time to
reach a sane state before we stress the connect<->disconnect path.
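The workaround boils down to a fixed minimum delay followed by a bounded poll. The sketch below shows that pattern in isolation. wait_min_then_poll and its arguments are hypothetical names used only for illustration (not an SPDK helper), and the check is assumed to be any command that exits 0 once the devices are ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SPDK: sleep for a fixed minimum,
# then run a check command once per second, at most max_tries times.
wait_min_then_poll() {
	local min_delay=$1 max_tries=$2
	shift 2
	local i=0

	# Unconditional initial delay so the kernel can settle before the
	# caller moves on (e.g. to an immediate nvme disconnect).
	sleep "$min_delay"

	while ((i++ < max_tries)); do
		# Succeed as soon as the check command passes.
		"$@" && return 0
		echo "Waiting for devices"
		sleep 1
	done
	# The check never passed within max_tries attempts.
	return 1
}
```

In a real test the check could count matching serials with lsblk, as waitforserial() does; passing a command instead of a serial keeps the sketch independent of NVMe hardware.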

Mitigates #2060.

Signed-off-by: Michal Berger <michalx.berger@intel.com>
Change-Id: I9299ecfc760e334504730aab6f19d338fad88081
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9059
Reviewed-by: Pawel Piatek <pawelx.piatek@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Michal Berger 2021-08-04 09:12:37 +02:00 committed by Jim Harris
parent e1946cd799
commit 8bb27faff6

@@ -1033,18 +1033,17 @@ function waitforserial() {
 		nvme_device_counter=$2
 	fi
 
-	while [ $(lsblk -l -o NAME,SERIAL | grep -c $1) -lt $nvme_device_counter ]; do
-		[ $i -lt 15 ] || break
-		i=$((i + 1))
+	# Wait initially for min 2s to make sure all devices are ready for use. It seems
+	# that we may be racing with a kernel where in some cases immediate disconnect may
+	# leave dangling subsystem with no-op block devices which can't be used nor removed
+	# (unless kernel is rebooted) and which start to negatively affect all the tests.
+	sleep 2
+	while ((i++ <= 15)); do
+		(($(lsblk -l -o NAME,SERIAL | grep -c "$1") == nvme_device_counter)) && return 0
 		echo "Waiting for devices"
 		sleep 1
 	done
-
-	if [[ $(lsblk -l -o NAME,SERIAL | grep -c $1) -lt $nvme_device_counter ]]; then
-		return 1
-	fi
-
-	return 0
+	return 1
 }
 
 function waitforserial_disconnect() {
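To quantify how the change affects timing, the snippet below keeps only the loop headers from the old and new versions of waitforserial(). It drops the sleeps and the lsblk check and counts how many polling iterations each header allows when the devices never appear. The count_old and count_new wrappers are illustrative only.

```shell
#!/usr/bin/env bash
# Old loop header: "while true" stands in for "devices still missing".
count_old() {
	local i=0 n=0
	while true; do
		[ $i -lt 15 ] || break
		i=$((i + 1))
		n=$((n + 1))
	done
	echo "$n"
}

# New loop header: post-increment, so i = 0..15 all pass the test.
count_new() {
	local i=0 n=0
	while ((i++ <= 15)); do
		n=$((n + 1))
	done
	echo "$n"
}

echo "old: $(count_old) new: $(count_new)" # prints "old: 15 new: 16"
```

Each iteration sleeps for one second, so the worst case is roughly 15s before the change and 2 + 16 = 18s after it, not counting time spent in lsblk. The success condition also changed. The new loop returns 0 only when the number of matching serials equals the expected count exactly. The old code succeeded as soon as the count was no longer below it.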