numam-spdk/module
Shuhei Matsumoto ae4e54fdc3 bdev/nvme: Retry reconnecting ctrlr after seconds if reset failed
Previously reconnect retry was not controlled and was repeated indefinitely.

This patch adds two options, ctrlr_loss_timeout_sec and reconnect_delay_sec,
to nvme_ctrlr, and adds reset_start_tsc, reconnect_is_delayed, and
reconnect_delay_timer to nvme_ctrlr to control reconnect retry.

Both ctrlr_loss_timeout_sec and reconnect_delay_sec are initialized to
zero. This means reconnect is not throttled, which matches the behavior
before this patch.
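
As a rough illustration of how the two options could interact after a
reset attempt, here is a minimal sketch. It is not the code in this
patch: the enum, the struct, and both function names are hypothetical,
and it assumes that a zero reconnect_delay_sec means "no delayed retry"
and that a zero ctrlr_loss_timeout_sec means "no time limit is checked".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome names, chosen for this sketch only. */
enum retry_decision {
	RETRY_NONE,	/* no delayed retry is scheduled */
	RETRY_DELAYED,	/* schedule a reconnect after reconnect_delay_sec */
	RETRY_GIVE_UP,	/* ctrlr_loss_timeout_sec has elapsed */
};

/* Hypothetical subset of the per-controller options. */
struct retry_opts {
	uint32_t ctrlr_loss_timeout_sec;
	uint32_t reconnect_delay_sec;
};

/* Convert a tick delta into whole seconds, given the tick frequency. */
static uint64_t
ticks_to_sec(uint64_t now_tsc, uint64_t reset_start_tsc, uint64_t ticks_hz)
{
	return (now_tsc - reset_start_tsc) / ticks_hz;
}

/* Decide what to do once a reset or reconnect attempt has finished. */
static enum retry_decision
decide_after_reset(const struct retry_opts *opts, bool success,
		   uint64_t elapsed_sec)
{
	/* Assumption: zero delay means reconnect is not throttled. */
	if (success || opts->reconnect_delay_sec == 0) {
		return RETRY_NONE;
	}

	/* Assumption: zero timeout means the time limit is not checked. */
	if (opts->ctrlr_loss_timeout_sec != 0 &&
	    elapsed_sec >= opts->ctrlr_loss_timeout_sec) {
		return RETRY_GIVE_UP;
	}

	return RETRY_DELAYED;
}
```

In this sketch, elapsed_sec would be derived from reset_start_tsc, for
example with ticks_to_sec(), so the time limit counts from the start of
the first reset rather than from each individual retry.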

This patch also makes a few more changes.

Change nvme_io_path_is_failed() to return false if reset is throttled
even if nvme_ctrlr is resetting or is to be reconnected.
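
The rule above can be sketched as a small predicate. This is a
simplified illustration, not the actual nvme_io_path_is_failed(): the
struct and the function name are hypothetical, it treats a non-zero
reconnect_delay_sec as "reset is throttled", and it leaves out every
other condition the real function may check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of controller state used by this sketch. */
struct ctrlr_state {
	bool resetting;			/* a reset is in progress */
	bool reconnect_is_delayed;	/* a delayed reconnect is scheduled */
	uint32_t reconnect_delay_sec;	/* non-zero: reset is throttled */
};

/* Report whether the I/O path should be treated as failed. */
static bool
io_path_is_failed(const struct ctrlr_state *c)
{
	if (c->resetting || c->reconnect_is_delayed) {
		/*
		 * When reset is throttled, report "not failed" even though
		 * the controller is resetting or waiting to be reconnected.
		 */
		return c->reconnect_delay_sec == 0;
	}

	/* Other failure conditions are omitted from this sketch. */
	return false;
}
```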

spdk_nvme_ctrlr_reconnect_poll_async() may keep returning -EAGAIN
indefinitely. Use ctrlr_loss_timeout_sec to detect this exceptional case.

Not only ctrlr reset but also non-multipath ctrlr failover is controlled,
so path failover needs to be included in ctrlr reconnect.

When the active path is removed and switched to one of the alternative paths,
if ctrlr reconnect is scheduled, connecting to the alternative path is left
to the scheduled reconnect.

If ctrlr reset or reconnect fails and a retry is scheduled,
switch the active path to one of the alternative paths.

Restore unit test cases removed in the previous patches.

Change-Id: Idec636c4eced39eb47ff4ef6fde72d6fd9fe4f85
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10128
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
2022-01-17 14:25:15 +00:00
..
accel idxd: Add support for vectored crc32 + copy 2022-01-12 08:20:39 +00:00
bdev bdev/nvme: Retry reconnecting ctrlr after seconds if reset failed 2022-01-17 14:25:15 +00:00
blob blob: use uint64_t for unmap and write_zeroes lba count 2021-10-14 08:17:16 +00:00
blobfs blobfs: check return value of strdup in blobfs_fuse_start() 2021-06-16 08:53:21 +00:00
env_dpdk so_ver: increase all major versions 2021-02-05 14:43:47 +00:00
event nvmf: remove accept poller from generic layer 2021-12-14 13:18:33 +00:00
scheduler gscheduler: use current tsc for decision. 2021-12-31 09:21:27 +00:00
sock sock: Fix SPDK_ZEROCOPY do not work for IPV6 2021-11-30 09:09:03 +00:00
Makefile scheduler: create public API and subsystem for scheduler/governor 2021-09-07 07:33:03 +00:00