numam-spdk/module/bdev
Shuhei Matsumoto ae4e54fdc3 bdev/nvme: Retry reconnecting ctrlr after seconds if reset failed
Previously, reconnect retries were not controlled and were repeated indefinitely.

This patch adds two options, ctrlr_loss_timeout_sec and reconnect_delay_sec,
to nvme_ctrlr, and adds reset_start_tsc, reconnect_is_delayed, and
reconnect_delay_timer to nvme_ctrlr to control reconnect retry.

Both ctrlr_loss_timeout_sec and reconnect_delay_sec are initialized to
zero. This means reconnect is not throttled, as was the case before this patch.

This patch also makes a few more changes.

Change nvme_io_path_is_failed() to return false if reset is throttled,
even if nvme_ctrlr is resetting or is to be reconnected.

spdk_nvme_ctrlr_reconnect_poll_async() may keep returning -EAGAIN
indefinitely. Use ctrlr_loss_timeout_sec to detect this exceptional case.
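
The interplay of the two options can be illustrated with a minimal sketch.
This is not the code from the patch: the enum, the struct, and the function
next_reconnect_action() are hypothetical names made up for illustration, and
the tick arguments stand in for whatever clock the caller uses. It assumes
that a zero reconnect_delay_sec means "not throttled" and that a zero
ctrlr_loss_timeout_sec means "no timeout check".

```c
#include <stdint.h>

/* Hypothetical outcome of a failed reset or reconnect attempt. */
enum reconnect_action {
	RECONNECT_NOW,      /* not throttled: retry immediately, as before */
	RECONNECT_DELAYED,  /* retry after reconnect_delay_sec */
	RECONNECT_GIVE_UP   /* ctrlr_loss_timeout_sec has expired */
};

/* Hypothetical container for the two options named in the commit message. */
struct reconnect_opts {
	uint32_t ctrlr_loss_timeout_sec;
	uint32_t reconnect_delay_sec;
};

/*
 * Decide what to do after a failed attempt.
 * reset_start_tsc and now_tsc are tick counts; ticks_hz is ticks per second
 * and must be nonzero.
 */
static enum reconnect_action
next_reconnect_action(const struct reconnect_opts *opts, uint64_t reset_start_tsc,
		      uint64_t now_tsc, uint64_t ticks_hz)
{
	uint64_t elapsed_sec;

	/* Assumption: zero delay means reconnect is not throttled. */
	if (opts->reconnect_delay_sec == 0) {
		return RECONNECT_NOW;
	}

	elapsed_sec = (now_tsc - reset_start_tsc) / ticks_hz;

	/* Assumption: zero timeout disables the loss-timeout check. */
	if (opts->ctrlr_loss_timeout_sec != 0 &&
	    elapsed_sec >= opts->ctrlr_loss_timeout_sec) {
		return RECONNECT_GIVE_UP;
	}

	return RECONNECT_DELAYED;
}
```

Measuring elapsed time from the start of the first reset, rather than
counting attempts, bounds the total time spent even when each attempt keeps
returning -EAGAIN.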

Not only ctrlr reset but also non-multipath ctrlr failover is controlled,
so path failover needs to be included in ctrlr reconnect.

When the active path is removed and switched to one of the alternative paths,
if ctrlr reconnect is scheduled, connecting to the alternative path is left
to the scheduled reconnect.

If a ctrlr reset or reconnect fails and a retry is scheduled,
switch the active path to one of the alternative paths.

Restore unit test cases removed in the previous patches.

Change-Id: Idec636c4eced39eb47ff4ef6fde72d6fd9fe4f85
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10128
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
2022-01-17 14:25:15 +00:00
aio bdev/aio: return void from bdev_aio_readv/writev 2021-09-02 07:42:31 +00:00
compress spelling: module 2021-11-30 09:05:32 +00:00
crypto spelling: module 2021-11-30 09:05:32 +00:00
delay bdev/delay: zero-copy support 2022-01-12 08:20:11 +00:00
error bdev/error: properly initialize value of num for inject_error RPC 2021-04-15 21:41:05 +00:00
ftl spelling: module 2021-11-30 09:05:32 +00:00
gpt spelling: module 2021-11-30 09:05:32 +00:00
iscsi bdev/iscsi: unregister conn poller when idle 2021-10-07 09:22:37 +00:00
lvol bdev/lvol: asserting lvol ptr before dereference 2021-08-24 07:18:54 +00:00
malloc bdev_malloc: exit early in case of no acceleration task 2022-01-14 08:35:32 +00:00
null lib/bdev: added spdk_bdev_module_fini_done() 2021-08-23 08:49:56 +00:00
nvme bdev/nvme: Retry reconnecting ctrlr after seconds if reset failed 2022-01-17 14:25:15 +00:00
ocf spelling: module 2021-11-30 09:05:32 +00:00
passthru bdev: Add API to get SPDK memory domains used by bdev 2021-08-20 07:26:10 +00:00
pmem lib/bdev: added spdk_bdev_module_fini_done() 2021-08-23 08:49:56 +00:00
raid spelling: module 2021-11-30 09:05:32 +00:00
rbd bdev/rbd: Support config_param and config_file simultaneously for rbd_register_cluster 2022-01-17 09:44:56 +00:00
split splite/vbdev_split: Free base part bdev on the error path. 2021-03-02 08:02:58 +00:00
uring so_ver: increase all major versions 2021-02-05 14:43:47 +00:00
virtio lib/bdev: added spdk_bdev_module_fini_done() 2021-08-23 08:49:56 +00:00
zone_block spelling: module 2021-11-30 09:05:32 +00:00
Makefile bdev: move bdev_rpc library contents 2020-09-25 11:43:42 +00:00