Compare commits

...

69 Commits

Author SHA1 Message Date
Shuhei Matsumoto
3d1bbb273b ut/nvme_pcie: Fix a few assert conditions which had used = instead of ==
The compiler warned about these mistakes.
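
An illustration of the bug class (hypothetical test code, not the actual diff): an assignment inside a CUnit assert compiles cleanly but checks the assigned constant rather than the variable.

    int rc = qpair_construct(qpair);   /* hypothetical call under test */
    CU_ASSERT(rc = 0);    /* bug: assigns 0, so this assert always fails */
    CU_ASSERT(rc == 0);   /* fix: actually compares rc against 0 */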

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie9772910b6a3cc9d6e45cfae1c19048179d16189
(cherry picked from commit 7641283387)
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5527
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-12-17 15:41:22 +00:00
Alexey Marchuk
43a94514af make/dpdk: Correct compiler type detection
This commit fixes compiler type detection to suppress
warnings specific to gcc 10.

Change-Id: I66264451792ff84a53001badc7c2f8a452d732af
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
(cherry picked from commit 1415e38411)
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5525
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: <dongx.yi@intel.com>
2020-12-17 15:41:22 +00:00
Alexey Marchuk
f71ccc5691 make/dpdk: Suppress GCC 10 warnings
Suppress the following warnings, which cause compilation errors:
1. gcc 10 complains about operations with zero-size arrays in rte_cryptodev.c.
Suppress this warning by adding the -Wno-stringop-overflow compilation flag.
2. gcc 10 disables -fcommon by default and complains about multiple definitions of
the aesni_mb_logtype_driver symbol, which is defined in a header file and present in several
translation units. Add the -fcommon compilation flag.

Fixes issue #1493

Change-Id: I9241bf1fd78e86df6a6eb46b4ff787b2f7027b7d
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
(cherry picked from commit 970c6d099e)
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5526
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-12-17 15:41:22 +00:00
Seth Howell
74c9fd40fd sock: keep track of removed sockets during call to poll
We have been intermittently hitting the assert where
we check sock->cb_fn != NULL in spdk_sock_group_impl_poll_count.

The only way we could be hitting this specific error is if we
were removing a socket from a sock group after receiving
an event for it.

Specifically, we are seeing this error on the NVMe-oF TCP target
which relies on posix sockets using epoll.

The man page for epoll states the following:

 If you use an event cache or store all the file descriptors
 returned from epoll_wait(2), then make sure to provide
 a way to mark its closure dynamically (i.e., caused by
 a previous event's processing).  Suppose you receive 100 events
 from epoll_wait(2), and in event #47 a condition causes event
 #13 to be closed.  If you remove the structure and close(2)
 the file descriptor for event #13, then your event cache might
 still say there are events waiting for that file descriptor
 causing confusion.

 One solution for this is to call, during the processing
 of event 47, epoll_ctl(EPOLL_CTL_DEL) to delete file
 descriptor 13 and close(2), then mark its associated data
 structure as removed and link it to a cleanup list.  If
 you find another event for file descriptor 13 in your batch
 processing, you will discover the file descriptor had
 been previously removed and there will be no confusion.

Since we do store all of the file descriptors returned from
epoll_wait, we need to implement the tracking mentioned above.
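
A minimal sketch of that tracking (names are illustrative; the real SPDK change differs in detail): the poll loop remembers sockets removed during the current batch and skips their remaining cached events.

    struct spdk_sock *removed[MAX_EVENTS_PER_POLL];  /* appended to by the remove path */
    int num_removed = 0;

    for (int i = 0; i < num_events; i++) {
            struct spdk_sock *sock = events[i].data.ptr;
            bool skip = false;

            for (int j = 0; j < num_removed; j++) {
                    if (removed[j] == sock) {
                            skip = true;   /* fd was closed earlier in this batch */
                            break;
                    }
            }
            if (!skip) {
                    assert(sock->cb_fn != NULL);
                    sock->cb_fn(sock->cb_arg, group, sock);
            }
    }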

fixes issue #1294

Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: Ib592ce19e3f0b691e3a825d02ebb42d7338e3ceb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1589
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
(cherry picked from commit e71e81b631)
2020-07-05 14:53:47 +09:00
Tomasz Zawadzki
e46860f591 version: 20.01.3 pre
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I698c84ce3fff46612a67a1418551039e7f433a9d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2699
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-06-01 18:04:29 +00:00
Tomasz Zawadzki
b2808069e3 SPDK 20.01.2
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Id838a19f76bd5e7f6c771a3a4f673e85e4a1f92b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2698
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-06-01 17:50:18 +00:00
Darek Stojaczyk
9b83caf89f dpdkbuild: add support for DPDK 20.05
EAL got a new dependency in 20.05: rte_telemetry.

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>

(cherry picked from commit 3b99a376d5595f4d6a458e8221bfd0b6b6f07b83)
Change-Id: I43df7afe9a84e88f034a7f87fc6a299f0bbd8bac
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2705
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-06-01 17:50:18 +00:00
Tomasz Zawadzki
fb56c3214e CHANGELOG: updated with changes backported to 20.01.x
- bdev_rbd_resize was added to an incorrect section
- added info on spdk_mem_reserve

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I32f9b0ddf4d87243e3eab2b47fd35debc135d4e9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2697
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-06-01 17:50:18 +00:00
Jim Harris
f8d26843fc nvme: create netlink socket during nvme_driver_init
This helps ensure thread safety on creation of the
netlink socket, when probe is called from multiple
threads at once.  It is also a lot cleaner - we just
create it once, rather than checking whether it has to be
created every time probe is called.

Signed-off-by: Jim Harris <james.r.harris@intel.com>

Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2681 (master)

(cherry picked from commit 89e47f6014)
Change-Id: I528cedc3ff44de6ea8ecaf6d2389226502ba408e
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2696
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-06-01 17:50:18 +00:00
Jim Harris
239eae6000 nvme: add mutex to nvme_driver_init
This will allow spdk_nvme_probe and variants to be
called from multiple threads in parallel.
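
Roughly, the serialization looks like the sketch below (simplified; the real init also coordinates primary and secondary processes, and the helper name is hypothetical):

    #include <pthread.h>

    static pthread_mutex_t g_init_mutex = PTHREAD_MUTEX_INITIALIZER;

    static int
    nvme_driver_init(void)
    {
            int rc = 0;

            pthread_mutex_lock(&g_init_mutex);
            if (g_spdk_nvme_driver == NULL) {
                    rc = nvme_driver_allocate();   /* hypothetical one-time setup */
            }
            pthread_mutex_unlock(&g_init_mutex);
            return rc;
    }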

Signed-off-by: Jim Harris <james.r.harris@intel.com>

Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2680 (master)

(cherry picked from commit 18f79f2449)
Change-Id: I534db605c9e192b943afe973981b7b503d8b7e34
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2695
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-06-01 17:50:18 +00:00
Tomasz Zawadzki
8bfa974cc2 dpdk: update submodule to include fix for vhost from CVE-2020-10722 to 10726
Updated submodule from branch containing DPDK 19.11 to DPDK 19.11.2

That includes fixes for DPDK vulnerabilities:
- CVE-2020-10722
- CVE-2020-10723
- CVE-2020-10724
- CVE-2020-10725
- CVE-2020-10726
Along with other fixes done between those DPDK maintenance releases.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I33e14eba54568a2313bb0020bad9be3fdfc6836b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2564
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-06-01 08:55:55 +00:00
zkhatami88
ed016fdbfb nvme/rdma: Use hooks in reg mr
Signed-off-by: zkhatami88 <z.khatami88@gmail.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1905 (master)

(cherry picked from commit fe3fab26bf)
Change-Id: I9493fe82b5b758c0092d20ef18b79d652fefed85
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2610
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-06-01 08:55:55 +00:00
zkhatami88
ce4da6c39a nvme/rdma: When RDMA hooks exist, prefer spdk_zmalloc for internal
allocations

Signed-off-by: zkhatami88 <z.khatami88@gmail.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1593 (master)

(cherry picked from commit 58a8fe2eee)
Change-Id: I7f810ee78fecca7eb8a4387f6d63e1a952966e57
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2609
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-05-29 09:08:44 +00:00
Seth Howell
ae161cdec6 nvme/rdma: make sure we free resources in error path.
Not sure how we missed this.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1122 (master)

(cherry picked from commit 2248e52150)
Change-Id: If920cb3a7708c33032e1da28c564d4c28ddafdf4
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2608
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 09:08:44 +00:00
Tomasz Zawadzki
7f9ea53d35 lib/nvme: assign NULL to external_io_msgs ring after free
Multiple nvme_io_msg producers on the ctrlr share the same ring.
After freeing it, it should be set to NULL in order to prevent
either nvme_io_msg_ctrlr_detach() or spdk_nvme_io_msg_process()
from operating on freed memory.

The above happened when resolving issues in later patches.
After their respective fixes, there is no scenario that
solely reproduces this failure, so no tests were added in this
patch.
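
The pattern being applied, sketched using the field named above:

    spdk_ring_free(ctrlr->external_io_msgs);
    ctrlr->external_io_msgs = NULL;   /* later callers see the ring is gone
                                       * instead of touching freed memory */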

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1917 (master)

(cherry picked from commit 251a551aa3)
Change-Id: I72b695d995b63bd002cc03e60cd4bdc82cfbe8ae
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2162
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 09:07:55 +00:00
Tomasz Zawadzki
6fe32e3e17 lib/nvme: free io buffer for nvme_io_msg
This buffer was not released after failure to enqueue.
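
A sketch of the fix, assuming the ring and buffer names from the neighboring entries and the four-argument spdk_ring_enqueue():

    if (spdk_ring_enqueue(ctrlr->external_io_msgs, (void **)&io, 1, NULL) != 1) {
            free(io);        /* previously leaked when the ring was full */
            return -ENOMEM;
    }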

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1916 (master)

(cherry picked from commit f955c75ef4)
Change-Id: If84317c67626a3193851c90be056b8550a5fccee
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2161
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 09:07:55 +00:00
Tomasz Zawadzki
b8446bb66d nvme: do not allow the same nvme_io_msg_producer to register twice
Prior to this change it was possible to register the
same nvme_io_msg_producer twice. This kind of functionality does
not make sense in the current scope, as each message to/from an
io_msg_producer has no identifier other than this pointer.

In the case of nvme_cuse this allowed creation of multiple /dev/spdk/nvme*
devices and caused an infinite loop when detaching an nvme controller.

This patch disallows that and adds a test for nvme_cuse.
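
The guard, sketched with an illustrative list name; the producer pointer itself is the only identity available:

    struct nvme_io_msg_producer *existing;

    STAILQ_FOREACH(existing, &ctrlr->io_producers, link) {
            if (existing == io_msg_producer) {
                    return -EEXIST;   /* refuse the duplicate registration */
            }
    }
    STAILQ_INSERT_TAIL(&ctrlr->io_producers, io_msg_producer, link);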

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1938 (master)

(cherry picked from commit 7fbdeacc9e)
Change-Id: I5f56548d1bce878417323c12909d6970416d2020
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2160
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 09:07:55 +00:00
Tomasz Zawadzki
3f5f09db46 lib/cuse: provide proper error codes up to RPC
This patch adjusts several return codes to provide
more than just -1.

It also fixes the JSON RPC error print, where a
negative error code was passed to spdk_strerror(),
resulting in an unknown error being reported.
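
The strerror part of the fix, sketched: spdk_strerror() expects a positive errno value, while the internal return codes are negative.

    SPDK_ERRLOG("Failed: %s\n", spdk_strerror(-rc));   /* was spdk_strerror(rc),
                                                        * which printed an unknown error */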

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1915 (master)

(cherry picked from commit ef6ffb39d6)
Change-Id: I254f6d716d0ce587f88cc658163ba049378f3b2f
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2159
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 09:07:55 +00:00
Ben Walker
4faf9fc37b nvme: Make spdk_nvme_cuse_register thread safe
There is no indication right now that this function couldn't be called
by multiple threads on different controllers. However, internally it is
using two globals that can become corrupted if the user were to do this.
Put a lock around them so it is safe.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1903 (master)

(cherry picked from commit 5340d17823)
Change-Id: I59361f510eb1659c2346f1fd33c375add1dc9c81
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2158
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 09:07:55 +00:00
Tomasz Zawadzki
cbb1d099ff cuse: fix nvme_cuse unregister segfault
Unregistering nvme_cuse when the device did not exist
resulted in a SEGFAULT within nvme_io_msg_ctrlr_unregister().

To prevent that, do not unregister the nvme_io_msg_producer
when no nvme_cuse device is registered for the ctrlr.

RPC and spdk_nvme_cuse_unregister() now return an error.
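
A sketch of the guard, using the helper introduced in the next entry:

    cuse_device = nvme_cuse_get_cuse_ctrlr_device(ctrlr);
    if (cuse_device == NULL) {
            return -ENODEV;   /* no cuse device: skip nvme_io_msg_ctrlr_unregister() */
    }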

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1921 (master)

For backporting to 20.01.x, API breaking changes were removed.
Only part that could cause the segfault remained.

(cherry picked from commit d9a11fd5b1)
Change-Id: Id77cebe23ff91023a24cfe091f5f62a76a9175fd
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2156
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 09:07:55 +00:00
Tomasz Zawadzki
d42b332ae6 cuse: refactor retrieving cuse_device to separate function
This patch adds nvme_cuse_get_cuse_ctrlr_device() and
nvme_cuse_get_cuse_ns_device(), which return the
struct cuse_device of a given nvme controller or namespace.

Similar iteration was used in two places, so both were
replaced accordingly.
The next patch will add a third.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1918 (master)

(cherry picked from commit 15a5018067)
Change-Id: I25ada843a59c632fe330263a65456d25c5ccf4cc
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2155
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 09:07:55 +00:00
Alexey Marchuk
930d91f479 nvme: Abort queued reqs when destroying qpair
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1791 (master)

(cherry picked from commit 4279766935)
Change-Id: Idef1b88cf47cf9f82b1f4499ef836dfa741c0c7f
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2606
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-05-29 08:29:33 +00:00
Alexey Marchuk
0acac18cfa nvme/rdma: Clean pointer to nvme_request
This is done to make sure that the scenario described in GitHub
issue #1292 won't happen.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1771 (master)

(cherry picked from commit f11989385e)
Change-Id: Ie2ad001da701e25ef984ae57da850fb84d51b734
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2641
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-05-29 08:29:33 +00:00
Alexey Marchuk
2381516ecc nvme/rdma: Wait for completions of both RDMA RECV and SEND
In some situations we may get a completion of RDMA_RECV before
the completion of RDMA_SEND, and this can lead to the bug described in #1292.
To avoid such situations we must complete an nvme_request only when
we have received both RDMA_RECV and RDMA_SEND completions.
Add a new field to spdk_nvme_rdma_req to store the response idx -
it is used to complete the nvme request when RDMA_RECV is completed
before RDMA_SEND.
Repost RDMA_RECV when both RDMA_SEND and RDMA_RECV are completed.
Side changes: change the type of spdk_nvme_rdma_req::id to uint16_t,
repack struct nvme_rdma_qpair.
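
The pairing logic, sketched (flag and function names are illustrative):

    #define NVME_RDMA_SEND_COMPLETED 0x1u
    #define NVME_RDMA_RECV_COMPLETED 0x2u

    /* on each work completion, 'flag' is SEND or RECV, whichever finished */
    rdma_req->completion_flags |= flag;

    if (rdma_req->completion_flags ==
        (NVME_RDMA_SEND_COMPLETED | NVME_RDMA_RECV_COMPLETED)) {
            nvme_rdma_req_complete(rdma_req, &rqpair->rsps[rdma_req->rsp_idx]);
            nvme_rdma_post_recv(rqpair, rdma_req->rsp_idx);   /* repost only now */
    }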

Fixes #1292

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1770 (master)

(cherry picked from commit 581e1bb576)
Change-Id: Ie51fbbba425acf37c306c5af031479bc9de08955
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2640
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-05-29 08:29:33 +00:00
Tomasz Zawadzki
90501268d6 lib/blob: merge EP of a clone when deleting a snapshot
In general it is not possible to delete a snapshot when
there are clones on top of it.
There is a special case when there is just a single clone
on top of that snapshot.

In such a case the clone is 'merged' with the snapshot.
Unallocated clusters in the clone are filled with the ones
in the snapshot (if allocated there).

Similar behavior should have occurred for extent pages.

This patch adds the implementation for moving EPs from
the snapshot to the clone, along with UT.

The UT exposes the issue by allowing delete_blob
to proceed beyond just the unrecoverable snapshot blob.
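
Conceptually, the merge walks the cluster/extent maps as in the sketch below (illustrative; the actual patch operates on the on-disk extent pages):

    for (uint64_t i = 0; i < clone->active.num_clusters; i++) {
            if (clone->active.clusters[i] == 0 &&
                snapshot->active.clusters[i] != 0) {
                    /* hole in the clone: inherit the snapshot's cluster and
                     * clear it there so it is not freed with the snapshot */
                    clone->active.clusters[i] = snapshot->active.clusters[i];
                    snapshot->active.clusters[i] = 0;
            }
    }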

Fixes #1291

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1163 (master)

Removed the UT changes, since they require multiple UT refactoring
changes to land first.

(cherry picked from commit 0f5157377f)
Change-Id: Ib2824c5737021f8e8d9b533a4cd245c12e6fe9fa
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2599
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 08:27:18 +00:00
Liang Yan
ae0db495fb bdev/rbd: increase the segment in flush operation
Signed-off-by: Liang Yan <liang.z.yan@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2490 (master)

(cherry picked from commit f2ede6b486)
Change-Id: Ibde0f924c1b78c9a8f0f440e944c7eb81631ed1b
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2597
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Liang Yan <liang.z.yan@intel.com>
2020-05-29 08:27:09 +00:00
Michael Haeuptle
9bcc0ea8e8 ENV_DPDK/VFIO: Increase PCI tear down timeout
When removing a large number of devices (>8) in parallel,
the 20ms timeout is not long enough.

As part of spdk_detach_cb, DPDK calls into the VFIO driver,
which may get delayed due to multiple hot removes being
processed by the pciehp driver (the pciehp IRQ thread function
handles the actual removal of a device in parallel, but
all of the IRQ thread functions compete for a global mutex,
increasing processing time and race conditions).

Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1588 (master)

(cherry picked from commit 55df83ceb6)
Change-Id: I470fbbee92dac9677082c873781efe41e2941cd5
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2598
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
2020-05-29 08:27:00 +00:00
Ben Walker
fab97f2aac Revert "env: Use rte_malloc in spdk_mem_register code path when possible"
This reverts commit 6d6052ac96.

This approach is no longer necessary given the patch immediately
preceding this one.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2512 (master)

(cherry picked from commit 76aed8e4ff)
Change-Id: I5aab14346fa5a14dbf33c94ffcf88b045cdb4999
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2601
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-29 08:26:42 +00:00
Ben Walker
d635d6d297 env: Add spdk_mem_reserve
The spdk_mem_reserve() function reserves a memory region in SPDK's
memory maps. This pre-allocates all of the required data structures
to hold memory address translations for that region without actually
populating the region.

After a region is reserved, calls to spdk_mem_register() for
addresses in that range will not require any internal memory
allocations. This is useful when overlaying a custom memory allocator
on top of SPDK's hugepage memory, such as tcmalloc.
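
A hypothetical usage with a custom allocator overlay (region names and sizes are illustrative):

    /* at startup: pre-build the translation structures for the whole arena */
    int rc = spdk_mem_reserve(arena_base, arena_len);
    if (rc != 0) {
            return rc;
    }

    /* later, in the allocator's hot path: no internal allocations needed */
    spdk_mem_register(chunk, chunk_len);
    /* ... use the chunk for DMA ... */
    spdk_mem_unregister(chunk, chunk_len);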

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2511 (master)

This backport requires increasing the SO_MINOR version, since it adds
a new API. Version 2.1 does not conflict with any other, since on master
SO_VER was increased from 2 to 3, see:
(229ef16b) lib/env_dpdk: add map file and rev so major version.

(cherry picked from commit cf450c0d7c)
Change-Id: Ia4e8a770e8b5c956814aa90e9119013356dfab46
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2600
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-05-29 08:26:42 +00:00
Tomasz Zawadzki
e8d8cef0fd make: allow individual SO version for each library
Based on the patch:
(19392783) make: rev SO versions individually for libraries.

It allows each library to update its own version separately
from SO_SUFFIX_ALL==2.0.

This will allow increasing the SO_MINOR version when needed.

Change-Id: Ic381a848e5f0e5af4b7f68725eb45138e00ca65b
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2593
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-05-29 08:26:42 +00:00
Tomasz Zawadzki
062da7a08a nvme/pcie: reduce physically contiguous memory for CQ/SQ
The following patch made sure that CQ/SQ are allocated in
a physically contiguous manner:
(64db67) nvme/pcie: make sure sq and cq are physically contiguous

Using MAX_IO_QUEUE_ENTRIES is enough to make sure that neither
queue spans multiple hugepages.

Yet that patch also made sure that a whole page is occupied only
by the queue, which unnecessarily increases memory consumption
by up to two hugepages per qpair.

This patch changes it so that each queue's alignment is limited
to its size.

Changes in hugepages consumed when allocating io_qpair in hello_world
application:
io_queue_size		Without patch	With patch
256			8MiB		0MiB
1024			12MiB		4MiB
4096			24MiB		16MiB
Note: 0MiB means no new hugepages were required and qpair fits into
previously allocated hugepages (see all steps before io_qpair
allocation in hello_world).

An interesting result of this patch: since we previously required
alignment up to the hugepage size, even two 2MiB hugepages could be
reserved to account for DPDK's internal malloc trailing element.
See alloc_sz in try_expand_heap_primary() within malloc_heap.c.

This patch not only reduces the overall memory reserved for the
queues, but also reduces the growth in heap consumption on the DPDK side.
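
The allocation change, roughly (simplified; the real code also handles the completion queue and physical-address checks):

    /* align each queue to its own power-of-two-rounded size instead of a
     * whole hugepage, so several queues can share one hugepage */
    sq_size = pqpair->num_entries * sizeof(struct spdk_nvme_cmd);
    pqpair->cmd = spdk_zmalloc(sq_size, spdk_align64pow2(sq_size), NULL,
                               SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_SHARED);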

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2244 (master)

(cherry picked from commit d3cf561199)
Change-Id: I75bf86e93674b4822d8204df3fb99458dec61e9c
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2510
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-05-28 08:55:44 +00:00
GangCao
a7f7b1955e bdev/rbd: add ceph rbd resize function
This backports the below change to the SPDK v20.01.2 LTS release:
6a29c6a906

Change-Id: I9b7ed97f2a376af71578ccb5556231832863b255
Signed-off-by: Liang Yan <liang.z.yan@intel.com>
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2262
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2020-05-27 13:36:19 +00:00
GangCao
d4d3e76aed vhost: Fix the issue of virtual machine device parameter max_segments always equal to 1
Solve the problem that /sys/block/vd../max_segments is always 1 in the virtual
machine, and avoid the low sequential read and write performance caused
by this limitation in the generic block device layer of some older kernels.

Backport this fix (9c6d4649eb)
to the SPDK LTS 20.01.2 release.

Change-Id: I30f6201bbfbb7885379b1b0ae19b64a1673e487f
Signed-off-by: suhua <suhua1@kingsoft.com>
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2261
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
2020-05-25 15:42:26 +00:00
Tomasz Zawadzki
09377fc41f version: 20.01.2 pre
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: If458a3c6571f9a9beaf6ba5202d0fd29e623dc1f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1376
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-20 19:03:49 +00:00
Tomasz Zawadzki
b90630a465 SPDK 20.01.1
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I519f07a157f361141d3c2d9f4cf49af646af0901
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1375
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-20 19:03:49 +00:00
yidong0635
1ffa3c3f08 lib/nvme: Fix scanbuild issue about uninitialized value.
Issue:
nvme.c:766:2: warning: 4th function call argument is an uninitialized value
        snprintf(trid->trstring, SPDK_NVMF_TRSTRING_MAX_LEN, "%s", trstring);

Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1314 (master)

(cherry picked from commit 4a1ec34d3b)
Change-Id: I4b0ae106ef8e4e72e80ec96d10010fddf8173144
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1371
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-20 19:03:49 +00:00
Shuhei Matsumoto
136c0771ad lib/iscsi: Return when connection state is already exited at login completion
The iSCSI target got a segmentation fault if the connection was being exited
between the execution of spdk_iscsi_conn_write_pdu() and its callback
iscsi_conn_login_pdu_success_complete().

This was caused by the recent asynchronous socket write feature.
Fixes issue #1278.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1275 (master)

(cherry picked from commit 628dc9c162)
Change-Id: Idffd90cd6ee8e6cb4298fe3f1363d8d5c5a3c49d
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1355
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
paul luse
7976cae3b4 module/crypto: increase the number of queue pairs for AESNI_MB
The default was 8, which meant a max of 8 bdevs.  Bump it up to 64.

Fixes issue #1232

Signed-off-by: paul luse <paul.e.luse@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1063 (master)

(cherry picked from commit 302f7aa6e4)
Change-Id: I966e90de5c27910df0e4da0d1062d9d1665f8de6
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1306
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-03-19 08:09:53 +00:00
Jim Harris
01a942dc6b bdev/nvme: do not destruct ctrlr if reset is in progress
The adminq poller could get a failure if the ctrlr has
already been hot removed, which starts a reset.

But while the for_each_channel is running for the reset,
the hotplug poller could run and start the destruct
process.  If the ctrlr is deleted before the for_each_channel
completes, we will try to call spdk_nvme_ctrlr_reset() on
a deleted controller.

While here, also add a check to skip the reset if the
controller is already in the process of being removed.

Fixes #1273.

Signed-off-by: Jim Harris <james.r.harris@intel.com>

Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1253 (master)

(cherry picked from commit ba7b55de87)
Change-Id: I20286814d904b8d5a9c5209bbb53663683a4e6b0
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1305
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-03-19 08:09:53 +00:00
Jim Harris
b030befb1d bdev/nvme: use mutex to protect 'resetting' member
This isn't in the performance path, so using the mutex
here makes it a bit more consistent with other ctrlr
members such as 'destruct'.

This prepares for a future patch which will defer
ctrlr destruction on removal if a reset is in progress.

Signed-off-by: Jim Harris <james.r.harris@intel.com>

Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1252 (master)

(cherry picked from commit 2571cbd807)
Change-Id: Ica019cd90dc3b46ef6a13dd311054dbdc95855aa
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1304
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Jim Harris
77a53c2c00 dpdk: move submodule to commit 3fcb1dd
This adds recent commit:
  contigmem: cleanup properly when load fails

Fixes issue #1262.

Signed-off-by: Jim Harris <james.r.harris@intel.com>

Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1217 (master)

(cherry picked from commit 328a221299)
Change-Id: I4d873af280803c3cc6c146439a0bbc7af4c7296c
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1303
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2020-03-19 08:09:53 +00:00
Vitaliy Mysak
cfc2cd611c env_dpdk: don't treat NULL as error in spdk_map_bar_rte()
We use `spdk_map_bar_rte()` to read mapped addresses
from PCI BARs.
This function is currently checking for NULL in each pair.
But in PCI memory, some registers can be left unused,
in which case they are set to 0.
As a result, we may read some NULL pointers from BARs,
which is OK.
To check whether a given address is indeed invalid, we should first
check whether it is used.
So it is best to delegate such checks to the
user of this function.
In fact, users already do the NULL check where it is needed
(ex: virtio_pci.c:390, nvme_pcie.c:589),
so this patch just removes them from `spdk_map_bar_rte()`.

This solves github issue #1206

Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1129 (master)

(cherry picked from commit d4653a31e0)
Change-Id: I88021ceca1b9e9d503b224f790819999cd16da01
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1302
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-03-19 08:09:53 +00:00
yidong0635
4e4bb7f822 rdma: Fix segmentation fault when there is insufficient memory for the RDMA queue
Fix a segmentation fault on the target side.
Issue:
rdma.c:2752:spdk_nvmf_rdma_listen: *NOTICE*: *** NVMe/RDMA Target Listening on 192.168.35.11 port 4420 ***
rdma.c: 789:nvmf_rdma_resources_create: *ERROR*: Unable to allocate sufficient memory for RDMA queue.
rdma.c:3385:spdk_nvmf_rdma_poll_group_create: *ERROR*: Unable to allocate resources for shared receive queue.
Segmentation fault (core dumped)

GDB:
Program terminated with signal 11, Segmentation fault.
736             if (resources->cmds_mr) {
(gdb) bt
0  nvmf_rdma_resources_destroy (resources=0x0) at rdma.c:736
1  0x0000000000497516 in spdk_nvmf_rdma_poll_group_destroy (group=group@entry=0x2fe1300) at rdma.c:3489
2  0x00000000004978bb in spdk_nvmf_rdma_poll_group_create (transport=0x2fe11d0) at rdma.c:3371
3  0x000000000048df70 in spdk_nvmf_transport_poll_group_create (transport=0x2fe11d0) at transport.c:267
4  0x000000000048a450 in spdk_nvmf_poll_group_add_transport (group=0x2f49af0, transport=<optimized out>) at nvmf.c:941
5  0x000000000048a6cb in spdk_nvmf_tgt_create_poll_group (io_device=0x2fce600, ctx_buf=0x2f49af0) at nvmf.c:122
6  0x00000000004a0492 in spdk_get_io_channel (io_device=0x2fce600) at thread.c:1324
7  0x000000000048a0e9 in spdk_nvmf_poll_group_create (tgt=<optimized out>) at nvmf.c:723
8  0x000000000047f230 in nvmf_tgt_create_poll_group (ctx=<optimized out>) at nvmf_tgt.c:356
9  0x000000000049f92b in spdk_on_thread (ctx=0x2f81b20) at thread.c:1065
10 0x000000000049f17d in _spdk_msg_queue_run_batch (max_msgs=<optimized out>, thread=0x1e67e90) at thread.c:554
11 spdk_thread_poll (thread=thread@entry=0x1e67e90, max_msgs=max_msgs@entry=0, now=now@entry=947267017376702) at thread.c:623
12 0x000000000049af86 in _spdk_reactor_run (arg=0x1e678c0) at reactor.c:342
13 0x000000000049b3a9 in spdk_reactors_start () at reactor.c:448
14 0x0000000000499a00 in spdk_app_start (opts=opts@entry=0x7ffc2a5e0ce0, start_fn=start_fn@entry=0x40aa80 <nvmf_tgt_started>,
						arg1=arg1@entry=0x0) at app.c:690
15 0x0000000000408237 in main (argc=5, argv=0x7ffc2a5e0e98) at nvmf_main.c:75
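
The guard that prevents the crash above, sketched:

    static void
    nvmf_rdma_resources_destroy(struct spdk_nvmf_rdma_resources *resources)
    {
            if (resources == NULL) {
                    return;   /* creation failed; nothing was allocated */
            }
            /* ... existing teardown, e.g. the resources->cmds_mr check ... */
    }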

Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1073 (master)

(cherry picked from commit 9d93c08234)
Change-Id: Id9bf081964d0cf3575757e80fc7582b80776d554
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1301
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2020-03-19 08:09:53 +00:00
Seth Howell
a763a7263a mk: bump the shared object major version to 2.
This is to indicate the ABI breakage in the bdev library. A function's
argument list was changed which breaks both backwards and forwards
compatibility.

Going forward, all backwards compatibility breaking changes should be
marked with a rev of the SO major version for that library. All forwards
compatibility breaking changes should be marked with a rev of the SO
minor version.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1066 (master)

(cherry picked from commit c5911f0224)
Change-Id: I35e45c102c5c6de3c684919a10e5116f8f2c375f
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1300
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
2020-03-19 08:09:53 +00:00
Changpeng Liu
52c7d46a3c nvme: set transport string before the probe based on transport type
Users may only set the transport type, but for the actual probe
process the trstring field is mandatory, so set the trstring
based on the transport type first.  Also remove the unnecessary
spdk_nvme_trid_populate_transport() call from each transport
module.
Fix #1228.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1001 (master)

(cherry picked from commit 8d6f48fbf8)
Change-Id: I2378065945cf725df4b1997293a737c101969e69
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1299
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Alexey Marchuk
776d45b0e3 nvme: Fix potential use of non-initialized variable
The trstring variable in spdk_nvme_trid_populate_transport is not
initialized, which can lead to snprintf() writing garbage to
trid->trstring if the user passes the SPDK_NVME_TRANSPORT_CUSTOM trtype.
Add a return statement and an assert to the CUSTOM/default switch cases.
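
The switch after the fix, roughly (abbreviated to two transport cases):

    const char *trstring;

    switch (trtype) {
    case SPDK_NVME_TRANSPORT_PCIE:
            trstring = SPDK_NVME_TRANSPORT_NAME_PCIE;
            break;
    case SPDK_NVME_TRANSPORT_CUSTOM:
            /* the caller supplies its own trstring; never snprintf garbage */
            return;
    default:
            assert(0);   /* unreachable for valid trtypes */
            return;
    }

    snprintf(trid->trstring, SPDK_NVMF_TRSTRING_MAX_LEN, "%s", trstring);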

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483469 (master)
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>

(cherry picked from commit 3424def90a)
Change-Id: I6c6c37f9aa74d61b346f7be27fb890c7a34e9229
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1318
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Changpeng Liu
83edd2f716 nvme: detach the controller in STUB and flush the admin active requests at last
In the autotest, when calling the kill_stub() function, there is an error log
like this: "Device 0000:83:00.0 is still attached at shutdown!", so it's
better to detach the controller when exiting the stub process.

But after calling spdk_nvme_detach() in the stub process, there is another issue:
1. The NVMe stub runs as the primary process, and it will send 4 AERs.
2. The NVMe reset tool is used as the secondary process.

When doing an NVMe reset from the secondary process, it will abort all the
outstanding requests, so the 4 AERs from the primary process
will be added to the active_proc->active_reqs list.

When calling spdk_nvme_detach() to detach a controller, there is an
assertion at the end in nvme_ctrlr_free_processes() that checks the
active requests list of this active process data structure.

We can add a check before destructing the controller to poll the
completion queue, so that the active requests list can be flushed.
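
The flush-then-detach sequence, sketched:

    /* drain the completions for the aborted AERs so that
     * active_proc->active_reqs is empty before the free */
    while (spdk_nvme_ctrlr_process_admin_completions(ctrlr) > 0) {
            ;
    }
    spdk_nvme_detach(ctrlr);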

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/977 (master)

(cherry picked from commit bad2c8e86c)
Change-Id: I0c473e935333a28d16f4c9fb443341fc47c5c24f
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1298
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Jacek Kalwas
374d2a2f64 nvme: fix command specific status code
The given enum was not aligned with the spec. This status can be reported when
size equals 0.

Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/928 (master)

(cherry picked from commit a7a0d02d8b)
Change-Id: If51f6b051c13880c1fd4e6bb0a02f134b28b5a88
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1297
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
2020-03-19 08:09:53 +00:00
Alexey Marchuk
1a4dec353a nvmf/rpc: Destroy subsystem if spdk_rpc_nvmf_create_subsystem fails
Destroy the subsystem if spdk_nvmf_subsystem_set_sn or spdk_nvmf_subsystem_set_mn
fails. Check the status in the spdk_rpc_nvmf_subsystem_started callback; on error,
destroy the subsystem and report the error.

Fixes #1192

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/832 (master)

(cherry picked from commit c29247e1fe)
Change-Id: Id6bdfe4705b5f4677118f94e04652c2457a3fdcc
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1296
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Alexey Marchuk
c55906a30d rdma: Correct handling of RDMA_CM_EVENT_DEVICE_REMOVAL
This event can occur for either a qpair or a listening device. The
current implementation assumes that every event refers to a qpair,
which is wrong. Fix: check if the event refers to a device, and if so,
disconnect all qpairs associated with the device and stop all
listeners.

Update spdk_nvmf_process_cm_event - break the iteration if
rdma_get_cm_event returns a nonzero value, to reduce the
indentation depth.
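
The dispatch, sketched with hypothetical helper names:

    case RDMA_CM_EVENT_DEVICE_REMOVAL:
            if (event_refers_to_listener(rtransport, event->id)) {
                    /* device-level removal: stop every listener on this
                     * device and disconnect all of its qpairs */
                    handle_device_removal(rtransport, event->id->verbs);
            } else {
                    /* qpair-level: disconnect just this qpair */
                    disconnect_qpair(event->id->context);
            }
            break;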

Fixes #1184

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/574 (master)

(cherry picked from commit 804b066929)
Change-Id: I8c4244d030109ab33223057513674af69dcf2be2
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1295
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-03-19 08:09:53 +00:00
Tomasz Kulasek
400316351f test/vpp: fix error handling in vppctl non-interactive mode
On Fedora 30 we have noticed VPP 19.04 related issues:

  1) Error values returned by vppctl in non-interactive mode
     are not relevant to the success/fail of the command.
     Vppctl ALWAYS returns 0, so the "-e" bash option is unable
     to detect any errors.
  2) We have intermittent pipefail errors (error 141) returned
     by vppctl on disconnect from vpp, even though commands are
     executed successfully.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Signed-off-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1214 (master)

(cherry picked from commit 7b7e97604b)
Change-Id: Ie22ea24f7e81017089b899111724d338eeb81113
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1287
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-03-19 08:09:53 +00:00
Tomasz Kulasek
1dfd2cf594 sock/vpp: fix compilation with gcc9
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1084 (master)

(cherry picked from commit b61e2479f5)
Change-Id: Ia48a59807047ea2ab5103638fb49bfea9446f854
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1285
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Tomasz Zawadzki
3fcc5a9ec1 lib/blob: queue up blob persists when one already is ongoing
It is possible for multiple blob persists to affect one another,
either via blob->state changes or via the blob mutable data.
A safe way to prevent that is to queue up the persists.

The next persist will be executed only after the previous one completes.
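
The queuing, sketched with illustrative names; the completion path pops the head and starts the next entry:

    TAILQ_INSERT_TAIL(&blob->pending_persists, ctx, link);
    if (TAILQ_FIRST(&blob->pending_persists) != ctx) {
            /* another persist is in flight; this one runs when it completes */
            return;
    }
    blob_persist_start(ctx);   /* hypothetical entry point for the real work */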

Fixes #1170
Fixes #960

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/776 (master)

(cherry picked from commit 030be573f3)
Change-Id: Iaf95d9238510100b629050bc0d5c2c96c982a60c
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1308
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-03-19 08:09:53 +00:00
Tomasz Zawadzki
f7730adbf0 lib/blob: move starting persist to separate function
The _spdk_blob_persist_check_dirty() function will be
called, in a subsequent patch, at the end of persist
in _spdk_blob_persist_complete() to proceed
with any queued-up persists.
Please see the following patch for this.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/872 (master)

(cherry picked from commit dd80edb2b4)
Change-Id: Ieeb334e23cde329743647f728e70dd60333c224a
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1307
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-03-19 08:09:53 +00:00
Seth Howell
88b5d6d183 test/nvmf: add verify_backlog to fio SGL tests.
On newer versions of FIO, there is an issue with heavy verify workloads
where one of the headers (rand_seed) gets incorrectly generated by fio
during verify. This can be circumvented by using the verify_backlog
flag.

This is needed because it will enable testing this workload on the tcp
transport using fio in the SPDK test pool.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/988 (master)

(cherry picked from commit 2b43f6353f)
Change-Id: I028be3fdb72a76733b4226a37b6332cd45d0f774
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1294
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Tomasz Kulasek
fad91fb911 test/nvme: use correct controller names in nvme-cli cuse test
Fixes issue #1223

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/981 (master)

(cherry picked from commit 03842fd950)
Change-Id: I16bf739d9be54249600e135a07fdeb554c77f4cf
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1324
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Karol Latecki
afc6fb5e1a autorun_post: skip confirming executed tests
Allow skipping confirmPerPatchTests if needed.

Signed-off-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483016 (master)
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>

(cherry picked from commit ac26fec9c6)
Change-Id: I8741d80de5cac9954e3429b951a71dc065c40bb5
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1317
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-03-19 08:09:53 +00:00
Darek Stojaczyk
02dda9731a test/rocksdb: fix db_bench build with gcc9
GCC9 complains:
./db/version_edit.h:134:71: error: implicitly-declared "constexpr
rocksdb::FileDescriptor::FileDescriptor(const
rocksdb::FileDescriptor&)" is deprecated [-Werror=deprecated-copy]

From what I see this can be fixed by explicitly
defining some constructors and assignment operators,
even setting them to `= default;`. I didn't dig into
this further; just ignore the warning for now.

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1082 (master)

(cherry picked from commit a5bcbbefcb)
Change-Id: Ia0ee0cc5fc1dce36f7098959d383b08855a825df
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1286
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2020-03-19 08:09:53 +00:00
Tomasz Zawadzki
cc02904e82 version: 20.01.1 pre
Change-Id: I703ff74a236b0a3c6254f332e57995311b2b082b
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483389
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-31 09:42:34 +00:00
Tomasz Zawadzki
5ffffe9d96 SPDK 20.01
Change-Id: I5ad326fcd246e3f2cf8231b53105a5fe70edc9c7
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483388
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2020-01-31 09:42:34 +00:00
Ziye Yang
6edcf515d6 sock/posix: Change the return type of function _sock_check_zcopy
Purpose: The function spdk_sock_request_put may
return an error code and close the socket, so we should change the
return type of _sock_check_zcopy.

If the return value of _sock_check_zcopy is not zero,
we should not handle the EPOLLIN event.
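
The changed call site, roughly (zero-copy completions surface as EPOLLERR on the socket's error queue):

    if (events[i].events & EPOLLERR) {
            rc = _sock_check_zcopy(sock);   /* may close the socket */
            if (rc != 0) {
                    continue;   /* do not touch EPOLLIN on a dead socket */
            }
    }
    if (events[i].events & EPOLLIN) {
            /* ... normal receive path ... */
    }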

Fixes #1169

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483311 (master)

(cherry picked from commit 9587017902)
Change-Id: Ie6fbd7ebff54749da8fa48836cc631eea09c4ab8
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483411
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-31 09:42:34 +00:00
Tomasz Zawadzki
a0f982bb0a lib/blob: add invalid flag for extent table
With the recent changes to the extent on-disk metadata format,
the new format (Extent Pages) is not backwards compatible.
Meanwhile the old format (Extent RLE) is backwards
compatible with older SPDK applications.

Summing up:
A blobstore created pre SPDK 20.01 can only use Extent RLE.
A blobstore created starting with SPDK 20.01 can use both
Extent Pages and Extent RLE, as specified by the use_extent_table opts.

When use_extent_table is set to true, the invalid flag for it is set.
SPDK applications pre 20.01 will not load such a blob.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483220 (master)

(cherry picked from commit dc24539c40)
Change-Id: If14ebd03f19eb581d71dcb46191e099336655189
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483395
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-31 09:42:34 +00:00
Seth Howell
a1e730b460 test/nvmf: disable bdevperf tests on soft-roce
Github issue 1165 details some issues we have with soft-roce and these
tests. Right now we are disabling them for build stability.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483300 (master)
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>

(cherry picked from commit e0f63b969d)
Change-Id: I3a9e28ff3cc1c6ac7d9aa91d93541e295514bb7b
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483407
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-31 09:42:34 +00:00
Tomasz Zawadzki
2ad2b27149 lib/blob: document use_extent_table
This patch adds documentation and a CHANGELOG update
for the newly added Extent Table/Page path.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483247 (master)
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>

(cherry picked from commit 353252b1b4)
Change-Id: I86f6c5680084a92d50bd9ca39b68d68a9908ecf8
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483381
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-31 09:42:34 +00:00
Ben Walker
7c06ec7247 sock/posix: Block recursive calls to spdk_sock_flush
Don't allow calling spdk_sock_flush while the socket is
closed.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483148 (master)
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>

(cherry picked from commit d0f4a51fdc)
Change-Id: I9020a49ab8906b0f343e3f48f8b96bd38308ab17
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483380
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-31 09:42:34 +00:00
Seth Howell
06e7d22c06 CHANGELOG: Alphabetize the 20.01 changelog sections
We should probably be consistent about this going forward.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/482911 (master)
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>

(cherry picked from commit 967fa2d707)
Change-Id: I6893ac991a0e506edad737db72986d82d6f1734e
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483258
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-30 16:57:01 +00:00
Seth Howell
54714eae1a CHANGELOG: update changelog for the 20.01 release.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/482448 (master)
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>

(cherry picked from commit 64021521f7)
Change-Id: Ie1760d1d65d8f8266c80327c853720f4299594ce
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483257
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-30 16:57:01 +00:00
Seth Howell
cc9c0e6922 env_dpdk: keep a memmap refcount of physical addresses
This allows us to avoid trying to map the same physical address to the
IOMMU in physical mode while still making sure that we don't
accidentally unmap that physical address before we are done referencing
it.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483133 (master)
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>

(cherry picked from commit f4a63bb8b3)
Change-Id: I947408411538b921bdc5a89ce8d5e40fd826e971
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483256
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-30 16:57:01 +00:00
Seth Howell
9cfd844f5f lib/nvmf: properly validate fuse command fields.
The fuse command value is a two-bit value, but we were only checking to
see if the fuse value was equal to SPDK_NVME_CMD_FUSE_FIRST or
SPDK_NVME_CMD_FUSE_SECOND in spdk_nvmf_ctrlr_process_io_fused_cmd. If a
haywire initiator sent a command with a fused value equal to
SPDK_NVME_CMD_FUSE_MASK, that would result in us skipping all checks and
dereferencing a null pointer in
spdk_nvmf_bdev_ctrlr_compare_and_write_cmd.

To fix this, add an extra condition to validate the fuse field.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483123 (master)
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>

(cherry picked from commit f0ca01e102)
Change-Id: I1ec4169ff5637562effd694f7046c6e3389627f1
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483255
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
2020-01-30 16:57:01 +00:00
59 changed files with 1244 additions and 447 deletions

View File

@ -1,6 +1,74 @@
# Changelog
## v20.01: (Upcoming Release)
## v20.01.3: (Upcoming Release)
## v20.01.2:
### dpdk
Updated DPDK submodule to DPDK 19.11.2, which includes fixes for DPDK vulnerabilities:
CVE-2020-10722, CVE-2020-10723, CVE-2020-10724, CVE-2020-10725, and CVE-2020-10726.
### env_dpdk
A new function, `spdk_mem_reserve`, has been added to reserve a memory region in SPDK's
memory maps. It pre-allocates data structures to hold memory address translations
without populating the region.
### rpc
A new RPC, `bdev_rbd_resize` has been added to resize the Ceph RBD bdev.
## v20.01.1:
## v20.01:
### bdev
A new function, `spdk_bdev_set_timeout`, has been added to set per descriptor I/O timeouts.
A new class of functions, `spdk_bdev_compare*`, has been added to allow native bdev support
of block comparisons and compare-and-write.
A new class of bdev events, `SPDK_BDEV_EVENT_MEDIA_MANAGEMENT`, has been added to allow bdevs
which expose raw media to alert all I/O channels of pending media management events.
A new API was added `spdk_bdev_io_get_aux_buf` allowing the caller to request
an auxiliary buffer for its own private use. The API is used in the same manner that
`spdk_bdev_io_get_buf` is used and the length of the buffer is always the same as the
bdev_io primary buffer. `spdk_bdev_io_put_aux_buf` frees the allocated auxiliary
buffer.
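For illustration, a minimal sketch of the aux buffer flow, assuming the v20.01 callback signature (`on_aux_buf` and the surrounding wiring are hypothetical):
~~~
#include "spdk/bdev.h"

/* Completion for the aux buffer request; aux_buf has the same length as
 * the bdev_io primary buffer. */
static void
on_aux_buf(struct spdk_io_channel *ch, struct spdk_bdev_io *bdev_io, void *aux_buf)
{
	/* ... use aux_buf as private scratch space ... */

	/* Return the buffer once done with it. */
	spdk_bdev_io_put_aux_buf(bdev_io, aux_buf);
}

static void
request_aux_buf(struct spdk_bdev_io *bdev_io)
{
	spdk_bdev_io_get_aux_buf(bdev_io, on_aux_buf);
}
~~~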
### blobfs
Added boolean return value for function `spdk_fs_set_cache_size` to indicate its operation result.
Added `blobfs_set_cache_size` RPC method to set cache size for blobstore filesystem.
### blobstore
Added new `use_extent_table` option to `spdk_blob_opts` for creating blobs with Extent Table descriptor.
Using this metadata format dramatically decreases the number of writes required to persist each cluster allocation
for thin provisioned blobs. The Extent Table descriptor is enabled by default.
See the [Blobstore Programmer's Guide](https://spdk.io/doc/blob.html#blob_pg_cluster_layout) for more details.
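A minimal sketch of opting into the new format, assuming the v20.01 blobstore API (`create_done` is a hypothetical completion callback):
~~~
#include "spdk/blob.h"

static void
create_done(void *cb_arg, spdk_blob_id blobid, int bserrno)
{
	/* bserrno is 0 on success; blobid identifies the new blob. */
}

static void
create_blob_with_extent_table(struct spdk_blob_store *bs)
{
	struct spdk_blob_opts opts;

	spdk_blob_opts_init(&opts);
	opts.use_extent_table = true;	/* the default in 20.01 */
	spdk_bs_create_blob_ext(bs, &opts, create_done, NULL);
}
~~~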
### dpdk
Updated DPDK submodule to DPDK 19.11.
### env_dpdk
`spdk_env_dpdk_post_init` now takes a boolean, `legacy_mem`, as an argument.
A new function, `spdk_env_dpdk_dump_mem_stats`, prints information about the memory consumed by DPDK to a file specified by
the user. A new utility, `scripts/dpdk_mem_info.py`, wraps this function and prints the output in an easy-to-read way.
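A minimal usage sketch, assuming the v20.01 signature (the output path is arbitrary):
~~~
#include <stdio.h>
#include "spdk/env_dpdk.h"

static int
dump_dpdk_mem_stats(void)
{
	/* The caller owns the FILE; SPDK only writes to it. */
	FILE *f = fopen("/tmp/spdk_mem_stats.txt", "w");

	if (f == NULL) {
		return -1;
	}
	spdk_env_dpdk_dump_mem_stats(f);
	fclose(f);
	return 0;
}
~~~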
### event
The functions `spdk_reactor_enable_framework_monitor_context_switch()` and
`spdk_reactor_framework_monitor_context_switch_enabled()` have been changed to
`spdk_framework_enable_context_switch_monitor()` and
`spdk_framework_context_switch_monitor_enabled()`, respectively.
### ftl
@ -18,43 +86,6 @@ parameter.
`spdk_ftl_punit_range` and `ftl_module_init_opts` structures were removed.
### nvmf
Added support for custom NVMe admin command handlers and admin command passthru
in the NVMF subsystem.
It is now possible to set a custom handler for a specific NVMe admin command.
For example, vendor specific admin commands can now be intercepted by implementing
a function handling the command.
Further NVMe admin commands can be forwarded straight to an underlying NVMe bdev.
The functions `spdk_nvmf_set_custom_admin_cmd_hdlr` and `spdk_nvmf_set_passthru_admin_cmd`
in `spdk_internal/nvmf.h` expose this functionality. There is an example custom admin handler
for the NVMe IDENTIFY CTRLR in `lib/nvmf/custom_cmd_hdlr.c`. This handler gets the SN, MN, FR, IEEE, and FGUID
attributes from the first NVMe drive in the NVMF subsystem and returns them to the NVMF initiator (sn and mn attributes
specified during the NVMF subsystem creation RPC will be overwritten).
This handler can be enabled via the `nvmf_set_config` RPC.
Note: In a future version of SPDK, this handler will be enabled by default.
### bdev
A new API was added `spdk_bdev_io_get_aux_buf` allowing the caller to request
an auxiliary buffer for its own private use. The API is used in the same manner that
`spdk_bdev_io_get_buf` is used and the length of the buffer is always the same as the
bdev_io primary buffer. `spdk_bdev_io_put_aux_buf` frees the allocated auxiliary
buffer.
### sock
Added `spdk_sock_writev_async` for performing asynchronous writes to sockets. This call will
never return EAGAIN; instead it queues internally until all the data has been sent. This can
simplify many code flows that create pollers to continue attempting to flush writes
on sockets.
Added an `impl_name` parameter to the `spdk_sock_listen` and `spdk_sock_connect` functions. Users may now
specify the sock layer implementation they'd prefer to use. Valid implementations are currently
"vpp", "posix", and NULL, where NULL results in the previous behavior of the functions.
### isa-l
Updated ISA-L submodule to commit f3993f5c0b6911 which includes implementation and
@ -62,16 +93,30 @@ optimization for aarch64.
Enabled ISA-L on aarch64 by default in addition to x86.
### thread
### nvme
`spdk_thread_send_msg` now returns an int indicating whether the message was successfully
sent.
`delayed_pcie_doorbell` parameter in `spdk_nvme_io_qpair_opts` was renamed to `delay_cmd_submit`
to allow reuse in other transports.
### blobfs
Added RDMA WR batching to the NVMf RDMA initiator. Send and receive WRs are chained together
and posted with a single call to ibv_post_send(receive) in the next call to the qpair completion
processing function. Batching is controlled by the `delay_cmd_submit` qpair option.
Added boolean return value for function `spdk_fs_set_cache_size` to indicate its operation result.
The NVMe-oF initiator now supports plugging out of tree NVMe-oF transports. In order
to facilitate this feature, several small API changes have been made:
Added `blobfs_set_cache_size` RPC method to set cache size for blobstore filesystem.
The `spdk_nvme_transport_id` struct now contains a trstring member used to identify the transport.
A new function, `spdk_nvme_transport_available_by_name`, has been added.
A function table, `spdk_nvme_transport_ops`, and macro, `SPDK_NVME_TRANSPORT_REGISTER`, have been added which
enable registering out of tree transports.
A new function, `spdk_nvme_ns_supports_compare`, allows a user to check whether a given namespace supports the compare
operation.
A new family of functions, `spdk_nvme_ns_compare*`, give the user access to submitting compare commands to NVMe namespaces.
A new function, `spdk_nvme_ctrlr_cmd_get_log_page_ext`, gives users more granular control over the command dwords sent in
log page requests.
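For illustration, a hedged sketch combining two of the new calls, assuming the v20.01 signatures (`buf`, `lba`, `lba_count`, and the completion callback are supplied by the caller):
~~~
#include <errno.h>
#include "spdk/nvme.h"

/* Submit a COMPARE only when the namespace advertises support for it. */
static int
submit_compare(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
	       void *buf, uint64_t lba, uint32_t lba_count,
	       spdk_nvme_cmd_cb cb_fn, void *cb_arg)
{
	if (!spdk_nvme_ns_supports_compare(ns)) {
		return -ENOTSUP;
	}
	return spdk_nvme_ns_cmd_compare(ns, qpair, buf, lba, lba_count,
					cb_fn, cb_arg, 0 /* io_flags */);
}
~~~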
### nvmf
@ -91,45 +136,71 @@ Add `spdk_nvmf_tgt_stop_listen()` that can be used to stop listening for
incoming connections for the specified target and trid. The listener is not stopped
implicitly upon destruction of a subsystem any more.
A custom NVMe admin command handler has been added which allows the user to use the real drive
attributes from one of the target NVMe drives when reporting drive attributes to the initiator.
This handler can be enabled via the `nvmf_set_config` RPC.
Note: In a future version of SPDK, this handler will be enabled by default.
The SPDK target and initiator both now include compare-and-write functionality with one caveat. If using the RDMA transport,
the target expects the initiator to send the compare command and write command either both with, or both without inline data. The
SPDK initiator currently respects this requirement, but this note is included as a flag for other initiators attempting
compatibility with this version of SPDK.
### rpc
A new RPC, `bdev_zone_block_create`, enables creating an emulated zoned bdev on top of a standard block device.
A new RPC, `bdev_ocssd_create`, enables creating an emulated zoned bdev on top of an Open Channel SSD.
A new RPC, `blobfs_set_cache_size`, enables managing blobfs cache size.
A new RPC, `env_dpdk_get_mem_stats`, has been added to facilitate reading DPDK related memory
consumption stats. Please see the env_dpdk section above for more details.
A new RPC, `framework_get_reactors`, has been added to retrieve a list of all reactors.
`bdev_ftl_create` now takes a `base_bdev` argument in lieu of `trtype`, `traddr`, and `punits`.
`bdev_nvme_set_options` now allows users to disable I/O submission batching with the `-d` flag.
`bdev_nvme_cuse_register` now accepts a `name` parameter.
`bdev_uring_create` now takes arguments for `bdev_name` and `block_size`.
`nvmf_set_config` now takes an argument to enable passthru of identify commands to base NVMe devices.
Please see the nvmf section above for more details.
### scsi
`spdk_scsi_lun_get_dif_ctx` now takes an additional argument of type `spdk_scsi_task`.
### sock
Added `spdk_sock_writev_async` for performing asynchronous writes to sockets. This call will
never return EAGAIN; instead it queues internally until all the data has been sent. This can
simplify many code flows that create pollers to continue attempting to flush writes
on sockets.
Added an `impl_name` parameter to the `spdk_sock_listen` and `spdk_sock_connect` functions. Users may now
specify the sock layer implementation they'd prefer to use. Valid implementations are currently
"vpp", "posix", and NULL, where NULL results in the previous behavior of the functions.
### thread
`spdk_thread_send_msg` now returns an int indicating whether the message was successfully
sent.
A new function, `spdk_thread_send_critical_msg`, has been added to support sending a single message from
a context that may be interrupted, e.g. a signal handler.
Two new functions, `spdk_poller_pause` and `spdk_poller_resume`, have been added to give greater control
of pollers to the application owner.
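A minimal sketch of pausing and resuming a poller, assuming the v20.01 thread API (`noop_poll` is a hypothetical poller function):
~~~
#include "spdk/thread.h"

static int
noop_poll(void *ctx)
{
	return 0;	/* nothing to do in this sketch */
}

static void
poller_pause_resume_example(void)
{
	struct spdk_poller *poller = spdk_poller_register(noop_poll, NULL, 1000);

	spdk_poller_pause(poller);	/* noop_poll stops being called */
	/* ... e.g. while the underlying device is quiesced ... */
	spdk_poller_resume(poller);	/* noop_poll runs again */
	spdk_poller_unregister(&poller);
}
~~~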
### util
`spdk_pipe`, a new utility for buffering data from sockets or files for parsing,
has been added. The public API is available at `include/spdk/pipe.h`.
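A hedged sketch of the write side of the pipe, under my reading of the v20.01 `include/spdk/pipe.h` API (buffer sizes and byte counts are illustrative):
~~~
#include <sys/uio.h>
#include "spdk/pipe.h"

static void
pipe_write_example(void)
{
	static uint8_t storage[4096];
	struct spdk_pipe *pipe = spdk_pipe_create(storage, sizeof(storage));
	struct iovec iov[2];
	int rc;

	/* Obtain a window of writable space described by up to two iovecs. */
	rc = spdk_pipe_writer_get_buffer(pipe, 1024, iov);
	if (rc > 0) {
		/* ... fill the iovecs (e.g. from a socket read), then commit
		 * the number of bytes actually written ... */
		spdk_pipe_writer_advance(pipe, 512);
	}

	/* The reader side mirrors this with spdk_pipe_reader_get_buffer()
	 * and spdk_pipe_reader_advance(). */
	spdk_pipe_destroy(pipe);
}
~~~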
### nvme
`delayed_pcie_doorbell` parameter in `spdk_nvme_io_qpair_opts` was renamed to `delay_cmd_submit`
to allow reuse in other transports.
Added RDMA WR batching to the NVMf RDMA initiator. Send and receive WRs are chained together
and posted with a single call to ibv_post_send(receive) in the next call to the qpair completion
processing function. Batching is controlled by the `delay_cmd_submit` qpair option.
The NVMe-oF initiator now supports plugging out of tree NVMe-oF transports. In order
to facilitate this feature, several small API changes have been made:
The `spdk_nvme_transport_id` struct now contains a trstring member used to identify the transport.
A new function, `spdk_nvme_transport_available_by_name`, has been added.
A function table, `spdk_nvme_transport_ops`, and macro, `SPDK_NVME_TRANSPORT_REGISTER`, have been added which
enable registering out of tree transports.
### rpc
Added an optional `delay_cmd_submit` parameter to the `bdev_nvme_set_options` RPC method.
A new RPC, `framework_get_reactors`, has been added to retrieve a list of all reactors.
### dpdk
Updated DPDK submodule to DPDK 19.11.
### event
The functions `spdk_reactor_enable_framework_monitor_context_switch()` and
`spdk_reactor_framework_monitor_context_switch_enabled()` have been changed to
`spdk_framework_enable_context_switch_monitor()` and
`spdk_framework_context_switch_monitor_enabled()`, respectively.
### bdev
Added the `spdk_bdev_io_get_nvme_fused_status` function for translating bdev_io status to NVMe status.

View File

@ -144,7 +144,7 @@ def confirmPerPatchTests(test_list, skiplist):
exit(1)
def aggregateCompletedTests(output_dir, repo_dir):
def aggregateCompletedTests(output_dir, repo_dir, skip_confirm=False):
test_list = {}
test_completion_table = []
@ -172,14 +172,15 @@ def aggregateCompletedTests(output_dir, repo_dir):
printListInformation("Tests", test_list)
generateTestCompletionTables(output_dir, test_completion_table)
skipped_tests = getSkippedTests(repo_dir)
if not skip_confirm:
confirmPerPatchTests(test_list, skipped_tests)
def main(output_dir, repo_dir):
def main(output_dir, repo_dir, skip_confirm=False):
generateCoverageReport(output_dir, repo_dir)
collectOne(output_dir, 'doc')
collectOne(output_dir, 'ut_coverage')
aggregateCompletedTests(output_dir, repo_dir)
aggregateCompletedTests(output_dir, repo_dir, skip_confirm)
if __name__ == "__main__":
@ -188,5 +189,7 @@ if __name__ == "__main__":
help="The location of your build's output directory")
parser.add_argument("-r", "--repo_directory", type=str, required=True,
help="The location of your spdk repository")
parser.add_argument("-s", "--skip_confirm", required=False, action="store_true",
help="Do not check if all autotest.sh tests were executed.")
args = parser.parse_args()
main(args.directory_location, args.repo_directory)
main(args.directory_location, args.repo_directory, args.skip_confirm)

View File

@ -119,6 +119,12 @@ To remove a block device representation use the bdev_rbd_delete command.
`rpc.py bdev_rbd_delete Rbd0`
To resize a bdev use the bdev_rbd_resize command.
`rpc.py bdev_rbd_resize Rbd0 4096`
This command will resize the Rbd0 bdev to 4096 MiB.
# Compression Virtual Bdev Module {#bdev_config_compress}
The compression bdev module can be configured to provide compression/decompression

View File

@ -318,6 +318,24 @@ form a linked list. The first page in the list will be written in place on updat
be written to fresh locations. This requires the backing device to support an atomic write size greater than
or equal to the page size to guarantee that the operation is atomic. See the section on atomicity for details.
### Blob cluster layout {#blob_pg_cluster_layout}
Each blob is an ordered list of clusters, where the starting LBA of a cluster is called an extent. A blob can be
thin provisioned, resulting in no extent for some of its clusters. When the first write operation occurs
to an unallocated cluster, a new extent is chosen. This information is stored in RAM and on disk.
There are two on-disk extent representations, depending on the `use_extent_table` (default: true) option used
when creating a blob.
* **use_extent_table=true**: The EXTENT_PAGE descriptor is not part of the linked list of pages. It contains extents
that are not run-length encoded. Each extent page is referenced by an EXTENT_TABLE descriptor, which is serialized
as part of the linked list of pages. The extent table run-length encodes all unallocated extent pages.
Every new cluster allocation updates a single extent page when that extent page was previously allocated;
otherwise it additionally incurs serializing the whole linked list of pages for the blob.
* **use_extent_table=false**: The EXTENT_RLE descriptor is serialized as part of the linked list of pages.
Extents pointing to contiguous LBAs are run-length encoded, including unallocated extents represented by 0.
Every new cluster allocation incurs serializing the whole linked list of pages for the blob.
### Sequences and Batches
Internally Blobstore uses the concepts of sequences and batches to submit IO to the underlying device in either

View File

@ -1853,6 +1853,49 @@ Example response:
}
~~~
## bdev_rbd_resize {#rpc_bdev_rbd_resize}
Resize @ref bdev_config_rbd bdev
This method is available only if SPDK was built with Ceph RBD support.
### Result
`true` if the bdev with the provided name was resized, `false` otherwise.
### Parameters
Name | Optional | Type | Description
----------------------- | -------- | ----------- | -----------
name | Required | string | Bdev name
new_size | Required | int | New bdev size for resize operation in MiB
### Example
Example request:
~~~
{
"params": {
"name": "Rbd0"
"new_size": "4096"
},
"jsonrpc": "2.0",
"method": "bdev_rbd_resize",
"id": 1
}
~~~
Example response:
~~~
{
"jsonrpc": "2.0",
"id": 1,
"result": true
}
~~~
## bdev_delay_create {#rpc_bdev_delay_create}
Create delay bdev. This bdev type redirects all IO to its base bdev and inserts a delay on the completion

2
dpdk

@ -1 +1 @@
Subproject commit fdb511332624e28631f553a226abb1dc0b35b28a
Subproject commit ef71bfaface10cc19b75e45d3158ab71a788e3a9

View File

@ -140,6 +140,18 @@ endif
# Allow users to specify EXTRA_DPDK_CFLAGS if they want to build DPDK using unsupported compiler versions
DPDK_CFLAGS += $(EXTRA_DPDK_CFLAGS)
ifeq ($(CC_TYPE),gcc)
GCC_MAJOR = $(shell echo __GNUC__ | $(CC) -E -x c - | tail -n 1)
ifeq ($(shell test $(GCC_MAJOR) -ge 10 && echo 1), 1)
#1. gcc 10 complains about operations with zero size arrays in rte_cryptodev.c, so
#disable this warning
#2. gcc 10 disables fcommon by default and complains about multiple definitions of
#the aesni_mb_logtype_driver symbol, which is defined in a header file and present in several
#translation units
DPDK_CFLAGS += -Wno-stringop-overflow -fcommon
endif
endif
$(SPDK_ROOT_DIR)/dpdk/build: $(SPDK_ROOT_DIR)/mk/cc.mk $(SPDK_ROOT_DIR)/include/spdk/config.h
$(Q)rm -rf $(SPDK_ROOT_DIR)/dpdk/build
$(Q)$(MAKE) -C $(SPDK_ROOT_DIR)/dpdk config T=$(DPDK_CONFIG) $(DPDK_OPTS)

View File

@ -11,6 +11,7 @@ iodepth=128
rw=randrw
bs=16k
verify=md5
verify_backlog=32
[test]
numjobs=1

View File

@ -1241,6 +1241,20 @@ int spdk_mem_register(void *vaddr, size_t len);
*/
int spdk_mem_unregister(void *vaddr, size_t len);
/**
* Reserve the address space specified in all memory maps.
*
* This pre-allocates the necessary space in the memory maps such that
* future calls to spdk_mem_register() on that region require no
* internal memory allocations.
*
* \param vaddr Virtual address to reserve
* \param len Length in bytes of vaddr
*
* \return 0 on success, negated errno on failure.
*/
int spdk_mem_reserve(void *vaddr, size_t len);
#ifdef __cplusplus
}
#endif
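A minimal usage sketch (not part of the header): reserve a 2 MiB aligned region up front, then register it later; `addr` and `len` are assumed to satisfy the alignment checks the function performs.
~~~
#include "spdk/env.h"

static int
reserve_then_register(void *addr, size_t len)
{
	int rc;

	/* Pre-allocate translation entries; no valid translation yet. */
	rc = spdk_mem_reserve(addr, len);
	if (rc != 0) {
		return rc;	/* e.g. -EBUSY if part of the range is registered */
	}

	/* ... later, once the memory is actually backed ... */
	return spdk_mem_register(addr, len);
}
~~~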

View File

@ -2781,6 +2781,14 @@ struct spdk_nvme_rdma_hooks {
* \return Infiniband remote key (rkey) for this buf
*/
uint64_t (*get_rkey)(struct ibv_pd *pd, void *buf, size_t size);
/**
* \brief Put back keys got from get_rkey.
*
* \param key The Infiniband remote key (rkey) got from get_rkey
*
*/
void (*put_rkey)(uint64_t key);
};
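For illustration, a hedged sketch of supplying the new hook, assuming hooks are installed via `spdk_nvme_rdma_init_hooks` as in earlier releases (`my_lookup_rkey` and `my_release_rkey` are hypothetical application helpers):
~~~
#include "spdk/nvme.h"

/* Hypothetical helpers owned by the application's memory registrar. */
extern uint64_t my_lookup_rkey(struct ibv_pd *pd, void *buf, size_t size);
extern void my_release_rkey(uint64_t key);

static struct spdk_nvme_rdma_hooks g_my_hooks = {
	/* A complete implementation would also set .get_ibv_pd. */
	.get_rkey = my_lookup_rkey,
	.put_rkey = my_release_rkey,	/* return keys handed out by get_rkey */
};

static void
install_rdma_hooks(void)
{
	spdk_nvme_rdma_init_hooks(&g_my_hooks);
}
~~~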
/**

View File

@ -1156,7 +1156,7 @@ enum spdk_nvme_generic_command_status_code {
enum spdk_nvme_command_specific_status_code {
SPDK_NVME_SC_COMPLETION_QUEUE_INVALID = 0x00,
SPDK_NVME_SC_INVALID_QUEUE_IDENTIFIER = 0x01,
SPDK_NVME_SC_MAXIMUM_QUEUE_SIZE_EXCEEDED = 0x02,
SPDK_NVME_SC_INVALID_QUEUE_SIZE = 0x02,
SPDK_NVME_SC_ABORT_COMMAND_LIMIT_EXCEEDED = 0x03,
/* 0x04 - reserved */
SPDK_NVME_SC_ASYNC_EVENT_REQUEST_LIMIT_EXCEEDED = 0x05,

View File

@ -54,7 +54,7 @@
* Patch level is incremented on maintenance branch releases and reset to 0 for each
* new major.minor release.
*/
#define SPDK_VERSION_PATCH 0
#define SPDK_VERSION_PATCH 3
/**
* Version string suffix.

View File

@ -77,6 +77,12 @@ struct spdk_sock_group_impl {
struct spdk_net_impl *net_impl;
TAILQ_HEAD(, spdk_sock) socks;
STAILQ_ENTRY(spdk_sock_group_impl) link;
/* Number of removed sockets, refreshed each time we poll the sock group. */
int num_removed_socks;
/* Unfortunately, we can't just keep a tailq of the sockets in case they are freed
* or added to another poll group later.
*/
uintptr_t removed_socks[MAX_EVENTS_PER_POLL];
};
struct spdk_net_impl {

View File

@ -165,7 +165,7 @@ spdk_scsi_nvme_translate(const struct spdk_bdev_io *bdev_io, int *sc, int *sk,
*ascq = SPDK_SCSI_ASCQ_CAUSE_NOT_REPORTABLE;
break;
case SPDK_NVME_SC_INVALID_QUEUE_IDENTIFIER:
case SPDK_NVME_SC_MAXIMUM_QUEUE_SIZE_EXCEEDED:
case SPDK_NVME_SC_INVALID_QUEUE_SIZE:
case SPDK_NVME_SC_ASYNC_EVENT_REQUEST_LIMIT_EXCEEDED:
case SPDK_NVME_SC_INVALID_FIRMWARE_SLOT:
case SPDK_NVME_SC_INVALID_FIRMWARE_IMAGE:

View File

@ -247,6 +247,7 @@ _spdk_blob_alloc(struct spdk_blob_store *bs, spdk_blob_id id)
TAILQ_INIT(&blob->xattrs);
TAILQ_INIT(&blob->xattrs_internal);
TAILQ_INIT(&blob->pending_persists);
return blob;
}
@ -268,6 +269,7 @@ static void
_spdk_blob_free(struct spdk_blob *blob)
{
assert(blob != NULL);
assert(TAILQ_EMPTY(&blob->pending_persists));
free(blob->active.extent_pages);
free(blob->clean.extent_pages);
@ -1520,6 +1522,7 @@ struct spdk_blob_persist_ctx {
spdk_bs_sequence_t *seq;
spdk_bs_sequence_cpl cb_fn;
void *cb_arg;
TAILQ_ENTRY(spdk_blob_persist_ctx) link;
};
static void
@ -1540,22 +1543,34 @@ spdk_bs_batch_clear_dev(struct spdk_blob_persist_ctx *ctx, spdk_bs_batch_t *batc
}
}
static void _spdk_blob_persist_check_dirty(struct spdk_blob_persist_ctx *ctx);
static void
_spdk_blob_persist_complete(spdk_bs_sequence_t *seq, void *cb_arg, int bserrno)
{
struct spdk_blob_persist_ctx *ctx = cb_arg;
struct spdk_blob_persist_ctx *next_persist;
struct spdk_blob *blob = ctx->blob;
if (bserrno == 0) {
_spdk_blob_mark_clean(blob);
}
assert(ctx == TAILQ_FIRST(&blob->pending_persists));
TAILQ_REMOVE(&blob->pending_persists, ctx, link);
next_persist = TAILQ_FIRST(&blob->pending_persists);
/* Call user callback */
ctx->cb_fn(seq, ctx->cb_arg, bserrno);
/* Free the memory */
spdk_free(ctx->pages);
free(ctx);
if (next_persist != NULL) {
_spdk_blob_persist_check_dirty(next_persist);
}
}
static void
@ -2060,6 +2075,25 @@ _spdk_blob_persist_dirty(spdk_bs_sequence_t *seq, void *cb_arg, int bserrno)
_spdk_bs_write_super(seq, ctx->blob->bs, ctx->super, _spdk_blob_persist_dirty_cpl, ctx);
}
static void
_spdk_blob_persist_check_dirty(struct spdk_blob_persist_ctx *ctx)
{
if (ctx->blob->bs->clean) {
ctx->super = spdk_zmalloc(sizeof(*ctx->super), 0x1000, NULL,
SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);
if (!ctx->super) {
ctx->cb_fn(ctx->seq, ctx->cb_arg, -ENOMEM);
free(ctx);
return;
}
spdk_bs_sequence_read_dev(ctx->seq, ctx->super, _spdk_bs_page_to_lba(ctx->blob->bs, 0),
_spdk_bs_byte_to_lba(ctx->blob->bs, sizeof(*ctx->super)),
_spdk_blob_persist_dirty, ctx);
} else {
_spdk_blob_persist_start(ctx);
}
}
/* Write a blob to disk */
static void
@ -2070,7 +2104,7 @@ _spdk_blob_persist(spdk_bs_sequence_t *seq, struct spdk_blob *blob,
_spdk_blob_verify_md_op(blob);
if (blob->state == SPDK_BLOB_STATE_CLEAN) {
if (blob->state == SPDK_BLOB_STATE_CLEAN && TAILQ_EMPTY(&blob->pending_persists)) {
cb_fn(seq, cb_arg, 0);
return;
}
@ -2086,21 +2120,15 @@ _spdk_blob_persist(spdk_bs_sequence_t *seq, struct spdk_blob *blob,
ctx->cb_arg = cb_arg;
ctx->next_extent_page = 0;
if (blob->bs->clean) {
ctx->super = spdk_zmalloc(sizeof(*ctx->super), 0x1000, NULL,
SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);
if (!ctx->super) {
cb_fn(seq, cb_arg, -ENOMEM);
free(ctx);
/* Multiple blob persists can affect one another, via blob->state or
* blob mutable data changes. To prevent it, queue up the persists. */
if (!TAILQ_EMPTY(&blob->pending_persists)) {
TAILQ_INSERT_TAIL(&blob->pending_persists, ctx, link);
return;
}
TAILQ_INSERT_HEAD(&blob->pending_persists, ctx, link);
spdk_bs_sequence_read_dev(seq, ctx->super, _spdk_bs_page_to_lba(blob->bs, 0),
_spdk_bs_byte_to_lba(blob->bs, sizeof(*ctx->super)),
_spdk_blob_persist_dirty, ctx);
} else {
_spdk_blob_persist_start(ctx);
}
_spdk_blob_persist_check_dirty(ctx);
}
struct spdk_blob_copy_cluster_ctx {
@ -5129,6 +5157,9 @@ _spdk_bs_create_blob(struct spdk_blob_store *bs,
}
blob->use_extent_table = opts->use_extent_table;
if (blob->use_extent_table) {
blob->invalid_flags |= SPDK_BLOB_EXTENT_TABLE;
}
if (!internal_xattrs) {
_spdk_blob_xattrs_init(&internal_xattrs_default);
@ -6179,7 +6210,14 @@ _spdk_delete_snapshot_sync_clone_cpl(void *cb_arg, int bserrno)
ctx->snapshot->active.clusters[i] = 0;
}
}
for (i = 0; i < ctx->snapshot->active.num_extent_pages &&
i < ctx->clone->active.num_extent_pages; i++) {
if (ctx->clone->active.extent_pages[i] == ctx->snapshot->active.extent_pages[i]) {
ctx->snapshot->active.extent_pages[i] = 0;
}
}
_spdk_blob_set_thin_provision(ctx->snapshot);
ctx->snapshot->state = SPDK_BLOB_STATE_DIRTY;
if (ctx->parent_snapshot_entry != NULL) {
@ -6212,6 +6250,12 @@ _spdk_delete_snapshot_sync_snapshot_xattr_cpl(void *cb_arg, int bserrno)
ctx->clone->active.clusters[i] = ctx->snapshot->active.clusters[i];
}
}
for (i = 0; i < ctx->snapshot->active.num_extent_pages &&
i < ctx->clone->active.num_extent_pages; i++) {
if (ctx->clone->active.extent_pages[i] == 0) {
ctx->clone->active.extent_pages[i] = ctx->snapshot->active.extent_pages[i];
}
}
/* Delete old backing bs_dev from clone (related to snapshot that will be removed) */
ctx->clone->back_bs_dev->destroy(ctx->clone->back_bs_dev);

View File

@ -166,6 +166,9 @@ struct spdk_blob {
bool extent_table_found;
bool use_extent_table;
/* A list of pending metadata persists */
TAILQ_HEAD(, spdk_blob_persist_ctx) pending_persists;
/* Number of data clusters retrieved from the extent table;
* that many have to be read from extent pages. */
uint64_t remaining_clusters_in_et;
@ -331,7 +334,8 @@ struct spdk_blob_md_descriptor_extent_page {
#define SPDK_BLOB_THIN_PROV (1ULL << 0)
#define SPDK_BLOB_INTERNAL_XATTR (1ULL << 1)
#define SPDK_BLOB_INVALID_FLAGS_MASK (SPDK_BLOB_THIN_PROV | SPDK_BLOB_INTERNAL_XATTR)
#define SPDK_BLOB_EXTENT_TABLE (1ULL << 2)
#define SPDK_BLOB_INVALID_FLAGS_MASK (SPDK_BLOB_THIN_PROV | SPDK_BLOB_INTERNAL_XATTR | SPDK_BLOB_EXTENT_TABLE)
#define SPDK_BLOB_READ_ONLY (1ULL << 0)
#define SPDK_BLOB_DATA_RO_FLAGS_MASK SPDK_BLOB_READ_ONLY

View File

@ -34,6 +34,10 @@
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
SO_VER := 2
SO_MINOR := 1
SO_SUFFIX := $(SO_VER).$(SO_MINOR)
CFLAGS += $(ENV_CFLAGS)
C_SRCS = env.c memory.c pci.c init.c threads.c
C_SRCS += pci_nvme.c pci_ioat.c pci_virtio.c pci_vmd.c

View File

@ -78,6 +78,11 @@ ifneq (, $(wildcard $(DPDK_ABS_DIR)/lib/librte_bus_pci.*))
DPDK_LIB_LIST += rte_bus_pci
endif
# DPDK 20.05 eal dependency
ifneq (, $(wildcard $(DPDK_ABS_DIR)/lib/librte_telemetry.*))
DPDK_LIB_LIST += rte_telemetry
endif
# There are some complex dependencies when using crypto, reduce or both so
# here we add the feature specific ones and set a flag to add the common
# ones after that.

View File

@ -36,7 +36,6 @@
#include "env_internal.h"
#include <rte_config.h>
#include <rte_malloc.h>
#include <rte_memory.h>
#include <rte_eal_memconfig.h>
@ -343,12 +342,8 @@ spdk_mem_map_free(struct spdk_mem_map **pmap)
}
for (i = 0; i < sizeof(map->map_256tb.map) / sizeof(map->map_256tb.map[0]); i++) {
if (g_legacy_mem) {
rte_free(map->map_256tb.map[i]);
} else {
free(map->map_256tb.map[i]);
}
}
pthread_mutex_destroy(&map->mutex);
@ -508,6 +503,57 @@ spdk_mem_unregister(void *vaddr, size_t len)
return 0;
}
int
spdk_mem_reserve(void *vaddr, size_t len)
{
struct spdk_mem_map *map;
void *seg_vaddr;
size_t seg_len;
uint64_t reg;
if ((uintptr_t)vaddr & ~MASK_256TB) {
DEBUG_PRINT("invalid usermode virtual address %p\n", vaddr);
return -EINVAL;
}
if (((uintptr_t)vaddr & MASK_2MB) || (len & MASK_2MB)) {
DEBUG_PRINT("invalid %s parameters, vaddr=%p len=%ju\n",
__func__, vaddr, len);
return -EINVAL;
}
if (len == 0) {
return 0;
}
pthread_mutex_lock(&g_spdk_mem_map_mutex);
/* Check if any part of this range is already registered */
seg_vaddr = vaddr;
seg_len = len;
while (seg_len > 0) {
reg = spdk_mem_map_translate(g_mem_reg_map, (uint64_t)seg_vaddr, NULL);
if (reg & REG_MAP_REGISTERED) {
pthread_mutex_unlock(&g_spdk_mem_map_mutex);
return -EBUSY;
}
seg_vaddr += VALUE_2MB;
seg_len -= VALUE_2MB;
}
/* Simply set the translation to the memory map's default. This allocates the space in the
* map but does not provide a valid translation. */
spdk_mem_map_set_translation(g_mem_reg_map, (uint64_t)vaddr, len,
g_mem_reg_map->default_translation);
TAILQ_FOREACH(map, &g_spdk_mem_maps, tailq) {
spdk_mem_map_set_translation(map, (uint64_t)vaddr, len, map->default_translation);
}
pthread_mutex_unlock(&g_spdk_mem_map_mutex);
return 0;
}
static struct map_1gb *
spdk_mem_map_get_map_1gb(struct spdk_mem_map *map, uint64_t vfn_2mb)
{
@ -527,23 +573,7 @@ spdk_mem_map_get_map_1gb(struct spdk_mem_map *map, uint64_t vfn_2mb)
/* Recheck to make sure nobody else got the mutex first. */
map_1gb = map->map_256tb.map[idx_256tb];
if (!map_1gb) {
/* Some of the existing apps use TCMalloc hugepage
* allocator and register this tcmalloc allocated
* hugepage memory with SPDK in the mmap hook. Since
* this function is called in the spdk_mem_register
* code path we can't do a malloc here otherwise that
* would cause a livelock. So we use the dpdk provided
* allocator instead, which avoids this cyclic
* dependency. Note this is only guaranteed to work when
* DPDK dynamic memory allocation is disabled (--legacy-mem),
* which then is a requirement for anyone using TCMalloc in
* this way.
*/
if (g_legacy_mem) {
map_1gb = rte_malloc(NULL, sizeof(struct map_1gb), 0);
} else {
map_1gb = malloc(sizeof(struct map_1gb));
}
if (map_1gb) {
/* initialize all entries to default translation */
for (i = 0; i < SPDK_COUNTOF(map_1gb->map); i++) {
@ -778,14 +808,23 @@ static TAILQ_HEAD(, spdk_vtophys_pci_device) g_vtophys_pci_devices =
TAILQ_HEAD_INITIALIZER(g_vtophys_pci_devices);
static struct spdk_mem_map *g_vtophys_map;
static struct spdk_mem_map *g_phys_ref_map;
#if SPDK_VFIO_ENABLED
static int
vtophys_iommu_map_dma(uint64_t vaddr, uint64_t iova, uint64_t size)
{
struct spdk_vfio_dma_map *dma_map;
uint64_t refcount;
int ret;
refcount = spdk_mem_map_translate(g_phys_ref_map, iova, NULL);
assert(refcount < UINT64_MAX);
if (refcount > 0) {
spdk_mem_map_set_translation(g_phys_ref_map, iova, size, refcount + 1);
return 0;
}
dma_map = calloc(1, sizeof(*dma_map));
if (dma_map == NULL) {
return -ENOMEM;
@ -832,6 +871,7 @@ vtophys_iommu_map_dma(uint64_t vaddr, uint64_t iova, uint64_t size)
out_insert:
TAILQ_INSERT_TAIL(&g_vfio.maps, dma_map, tailq);
pthread_mutex_unlock(&g_vfio.mutex);
spdk_mem_map_set_translation(g_phys_ref_map, iova, size, refcount + 1);
return 0;
}
@ -839,6 +879,7 @@ static int
vtophys_iommu_unmap_dma(uint64_t iova, uint64_t size)
{
struct spdk_vfio_dma_map *dma_map;
uint64_t refcount;
int ret;
pthread_mutex_lock(&g_vfio.mutex);
@ -854,6 +895,18 @@ vtophys_iommu_unmap_dma(uint64_t iova, uint64_t size)
return -ENXIO;
}
refcount = spdk_mem_map_translate(g_phys_ref_map, iova, NULL);
assert(refcount < UINT64_MAX);
if (refcount > 0) {
spdk_mem_map_set_translation(g_phys_ref_map, iova, size, refcount - 1);
}
/* We still have outstanding references, don't clear it. */
if (refcount > 1) {
pthread_mutex_unlock(&g_vfio.mutex);
return 0;
}
/** don't support partial or multiple-page unmap for now */
assert(dma_map->map.size == size);
@ -1383,10 +1436,21 @@ spdk_vtophys_init(void)
.are_contiguous = vtophys_check_contiguous_entries,
};
const struct spdk_mem_map_ops phys_ref_map_ops = {
.notify_cb = NULL,
.are_contiguous = NULL,
};
#if SPDK_VFIO_ENABLED
spdk_vtophys_iommu_init();
#endif
g_phys_ref_map = spdk_mem_map_alloc(0, &phys_ref_map_ops, NULL);
if (g_phys_ref_map == NULL) {
DEBUG_PRINT("phys_ref map allocation failed.\n");
return -ENOMEM;
}
g_vtophys_map = spdk_mem_map_alloc(SPDK_VTOPHYS_ERROR, &vtophys_map_ops, NULL);
if (g_vtophys_map == NULL) {
DEBUG_PRINT("vtophys map allocation failed\n");

View File

@ -62,15 +62,7 @@ spdk_map_bar_rte(struct spdk_pci_device *device, uint32_t bar,
struct rte_pci_device *dev = device->dev_handle;
*mapped_addr = dev->mem_resource[bar].addr;
if (*mapped_addr == NULL) {
return -1;
}
*phys_addr = (uint64_t)dev->mem_resource[bar].phys_addr;
if (*phys_addr == 0) {
return -1;
}
*size = (uint64_t)dev->mem_resource[bar].len;
return 0;
@ -141,8 +133,8 @@ spdk_detach_rte(struct spdk_pci_device *dev)
dev->internal.pending_removal = true;
if (spdk_process_is_primary() && !pthread_equal(g_dpdk_tid, pthread_self())) {
rte_eal_alarm_set(1, spdk_detach_rte_cb, rte_dev);
/* wait up to 20ms for the cb to start executing */
for (i = 20; i > 0; i--) {
/* wait up to 2s for the cb to finish executing */
for (i = 2000; i > 0; i--) {
spdk_delay_us(1000);
pthread_mutex_lock(&g_pci_mutex);
@ -157,7 +149,7 @@ spdk_detach_rte(struct spdk_pci_device *dev)
/* besides checking the removed flag, we also need to wait
* for the dpdk detach function to unwind, as it's doing some
* operations even after calling our detach callback. Simply
* cancell the alarm - if it started executing already, this
* cancel the alarm - if it started executing already, this
* call will block and wait for it to finish.
*/
rte_eal_alarm_cancel(spdk_detach_rte_cb, rte_dev);
@ -171,6 +163,8 @@ spdk_detach_rte(struct spdk_pci_device *dev)
if (!removed) {
fprintf(stderr, "Timeout waiting for DPDK to remove PCI device %s.\n",
rte_dev->name);
/* If we reach this state, then the device couldn't be removed and most likely
a subsequent hot add of a device in the same BDF will fail */
}
} else {
spdk_detach_rte_cb(rte_dev);

View File

@ -1138,6 +1138,11 @@ iscsi_conn_login_pdu_success_complete(void *arg)
{
struct spdk_iscsi_conn *conn = arg;
if (conn->state >= ISCSI_CONN_STATE_EXITING) {
/* Connection is being exited before this callback is executed. */
SPDK_DEBUGLOG(SPDK_LOG_ISCSI, "Connection is already exited.\n");
return;
}
if (conn->full_feature) {
if (iscsi_conn_params_update(conn) != 0) {
return;

View File

@ -34,6 +34,7 @@
#include "spdk/nvmf_spec.h"
#include "nvme_internal.h"
#include "nvme_io_msg.h"
#include "nvme_uevent.h"
#define SPDK_NVME_DRIVER_NAME "spdk_nvme_driver"
@ -350,10 +351,19 @@ nvme_robust_mutex_init_shared(pthread_mutex_t *mtx)
int
nvme_driver_init(void)
{
static pthread_mutex_t g_init_mutex = PTHREAD_MUTEX_INITIALIZER;
int ret = 0;
/* Any socket ID */
int socket_id = -1;
/* Use a special process-private mutex to ensure the global
* nvme driver object (g_spdk_nvme_driver) gets initialized by
* only one thread. Once that object is established and its
* mutex is initialized, we can unlock this mutex and use that
* one instead.
*/
pthread_mutex_lock(&g_init_mutex);
/* Each process needs its own pid. */
g_spdk_nvme_pid = getpid();
@ -366,6 +376,7 @@ nvme_driver_init(void)
if (spdk_process_is_primary()) {
/* The unique named memzone already reserved. */
if (g_spdk_nvme_driver != NULL) {
pthread_mutex_unlock(&g_init_mutex);
return 0;
} else {
g_spdk_nvme_driver = spdk_memzone_reserve(SPDK_NVME_DRIVER_NAME,
@ -375,7 +386,7 @@ nvme_driver_init(void)
if (g_spdk_nvme_driver == NULL) {
SPDK_ERRLOG("primary process failed to reserve memory\n");
pthread_mutex_unlock(&g_init_mutex);
return -1;
}
} else {
@ -393,15 +404,16 @@ nvme_driver_init(void)
}
if (g_spdk_nvme_driver->initialized == false) {
SPDK_ERRLOG("timeout waiting for primary process to init\n");
pthread_mutex_unlock(&g_init_mutex);
return -1;
}
} else {
SPDK_ERRLOG("primary process is not started yet\n");
pthread_mutex_unlock(&g_init_mutex);
return -1;
}
pthread_mutex_unlock(&g_init_mutex);
return 0;
}
@ -415,12 +427,21 @@ nvme_driver_init(void)
if (ret != 0) {
SPDK_ERRLOG("failed to initialize mutex\n");
spdk_memzone_free(SPDK_NVME_DRIVER_NAME);
pthread_mutex_unlock(&g_init_mutex);
return ret;
}
/* The lock in the shared g_spdk_nvme_driver object is now ready to
* be used - so we can unlock the g_init_mutex here.
*/
pthread_mutex_unlock(&g_init_mutex);
nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
g_spdk_nvme_driver->initialized = false;
g_spdk_nvme_driver->hotplug_fd = spdk_uevent_connect();
if (g_spdk_nvme_driver->hotplug_fd < 0) {
SPDK_DEBUGLOG(SPDK_LOG_NVME, "Failed to open uevent netlink socket\n");
}
TAILQ_INIT(&g_spdk_nvme_driver->shared_attached_ctrlrs);
@ -594,6 +615,7 @@ spdk_nvme_probe_internal(struct spdk_nvme_probe_ctx *probe_ctx,
int rc;
struct spdk_nvme_ctrlr *ctrlr, *ctrlr_tmp;
spdk_nvme_trid_populate_transport(&probe_ctx->trid, probe_ctx->trid.trtype);
if (!spdk_nvme_transport_available_by_name(probe_ctx->trid.trstring)) {
SPDK_ERRLOG("NVMe trtype %u not available\n", probe_ctx->trid.trtype);
return -1;
@ -741,7 +763,7 @@ void
spdk_nvme_trid_populate_transport(struct spdk_nvme_transport_id *trid,
enum spdk_nvme_transport_type trtype)
{
const char *trstring;
const char *trstring = "";
trid->trtype = trtype;
switch (trtype) {
@ -760,7 +782,8 @@ spdk_nvme_trid_populate_transport(struct spdk_nvme_transport_id *trid,
case SPDK_NVME_TRANSPORT_CUSTOM:
default:
SPDK_ERRLOG("don't use this for custom transports\n");
break;
assert(0);
return;
}
snprintf(trid->trstring, SPDK_NVMF_TRSTRING_MAX_LEN, "%s", trstring);
}

View File

@ -2618,6 +2618,7 @@ nvme_ctrlr_destruct(struct spdk_nvme_ctrlr *ctrlr)
SPDK_DEBUGLOG(SPDK_LOG_NVME, "Prepare to destruct SSD: %s\n", ctrlr->trid.traddr);
spdk_nvme_qpair_process_completions(ctrlr->adminq, 0);
nvme_transport_admin_qpair_abort_aers(ctrlr->adminq);
TAILQ_FOREACH_SAFE(qpair, &ctrlr->active_io_qpairs, tailq, tmp) {

View File

@ -60,6 +60,7 @@ struct cuse_device {
TAILQ_ENTRY(cuse_device) tailq;
};
static pthread_mutex_t g_cuse_mtx = PTHREAD_MUTEX_INITIALIZER;
static TAILQ_HEAD(, cuse_device) g_ctrlr_ctx_head = TAILQ_HEAD_INITIALIZER(g_ctrlr_ctx_head);
static struct spdk_bit_array *g_ctrlr_started;
@ -700,13 +701,14 @@ cuse_nvme_ns_start(struct cuse_device *ctrlr_device, uint32_t nsid, const char *
if (rv < 0) {
SPDK_ERRLOG("Device name too long.\n");
free(ns_device);
return -1;
return -ENAMETOOLONG;
}
if (pthread_create(&ns_device->tid, NULL, cuse_thread, ns_device)) {
rv = pthread_create(&ns_device->tid, NULL, cuse_thread, ns_device);
if (rv != 0) {
SPDK_ERRLOG("pthread_create failed\n");
free(ns_device);
return -1;
return -rv;
}
TAILQ_INSERT_TAIL(&ctrlr_device->ns_devices, ns_device, tailq);
@ -811,7 +813,7 @@ nvme_cuse_start(struct spdk_nvme_ctrlr *ctrlr)
g_ctrlr_started = spdk_bit_array_create(128);
if (g_ctrlr_started == NULL) {
SPDK_ERRLOG("Cannot create bit array\n");
return -1;
return -ENOMEM;
}
}
@ -843,9 +845,10 @@ nvme_cuse_start(struct spdk_nvme_ctrlr *ctrlr)
snprintf(ctrlr_device->dev_name, sizeof(ctrlr_device->dev_name), "spdk/nvme%d",
ctrlr_device->index);
if (pthread_create(&ctrlr_device->tid, NULL, cuse_thread, ctrlr_device)) {
rv = pthread_create(&ctrlr_device->tid, NULL, cuse_thread, ctrlr_device);
if (rv != 0) {
SPDK_ERRLOG("pthread_create failed\n");
rv = -1;
rv = -rv;
goto err3;
}
TAILQ_INSERT_TAIL(&g_ctrlr_ctx_head, ctrlr_device, tailq);
@ -857,10 +860,10 @@ nvme_cuse_start(struct spdk_nvme_ctrlr *ctrlr)
continue;
}
if (cuse_nvme_ns_start(ctrlr_device, nsid, ctrlr_device->dev_name) < 0) {
rv = cuse_nvme_ns_start(ctrlr_device, nsid, ctrlr_device->dev_name);
if (rv < 0) {
SPDK_ERRLOG("Cannot start CUSE namespace device.");
cuse_nvme_ctrlr_stop(ctrlr_device);
rv = -1;
goto err3;
}
}
@ -877,10 +880,10 @@ err2:
return rv;
}
static void
nvme_cuse_stop(struct spdk_nvme_ctrlr *ctrlr)
static struct cuse_device *
nvme_cuse_get_cuse_ctrlr_device(struct spdk_nvme_ctrlr *ctrlr)
{
struct cuse_device *ctrlr_device;
struct cuse_device *ctrlr_device = NULL;
TAILQ_FOREACH(ctrlr_device, &g_ctrlr_ctx_head, tailq) {
if (ctrlr_device->ctrlr == ctrlr) {
@ -888,12 +891,46 @@ nvme_cuse_stop(struct spdk_nvme_ctrlr *ctrlr)
}
}
return ctrlr_device;
}
static struct cuse_device *
nvme_cuse_get_cuse_ns_device(struct spdk_nvme_ctrlr *ctrlr, uint32_t nsid)
{
struct cuse_device *ctrlr_device = NULL;
struct cuse_device *ns_device = NULL;
ctrlr_device = nvme_cuse_get_cuse_ctrlr_device(ctrlr);
if (!ctrlr_device) {
return NULL;
}
TAILQ_FOREACH(ns_device, &ctrlr_device->ns_devices, tailq) {
if (ns_device->nsid == nsid) {
break;
}
}
return ns_device;
}
static void
nvme_cuse_stop(struct spdk_nvme_ctrlr *ctrlr)
{
struct cuse_device *ctrlr_device;
pthread_mutex_lock(&g_cuse_mtx);
ctrlr_device = nvme_cuse_get_cuse_ctrlr_device(ctrlr);
if (!ctrlr_device) {
SPDK_ERRLOG("Cannot find associated CUSE device\n");
pthread_mutex_unlock(&g_cuse_mtx);
return;
}
cuse_nvme_ctrlr_stop(ctrlr_device);
pthread_mutex_unlock(&g_cuse_mtx);
}
static struct nvme_io_msg_producer cuse_nvme_io_msg_producer = {
@ -911,18 +948,35 @@ spdk_nvme_cuse_register(struct spdk_nvme_ctrlr *ctrlr)
return rc;
}
pthread_mutex_lock(&g_cuse_mtx);
rc = nvme_cuse_start(ctrlr);
if (rc) {
nvme_io_msg_ctrlr_unregister(ctrlr, &cuse_nvme_io_msg_producer);
}
pthread_mutex_unlock(&g_cuse_mtx);
return rc;
}
void
spdk_nvme_cuse_unregister(struct spdk_nvme_ctrlr *ctrlr)
{
nvme_cuse_stop(ctrlr);
struct cuse_device *ctrlr_device;
pthread_mutex_lock(&g_cuse_mtx);
ctrlr_device = nvme_cuse_get_cuse_ctrlr_device(ctrlr);
if (!ctrlr_device) {
SPDK_ERRLOG("Cannot find associated CUSE device\n");
pthread_mutex_unlock(&g_cuse_mtx);
return;
}
cuse_nvme_ctrlr_stop(ctrlr_device);
pthread_mutex_unlock(&g_cuse_mtx);
nvme_io_msg_ctrlr_unregister(ctrlr, &cuse_nvme_io_msg_producer);
}
@ -932,20 +986,15 @@ spdk_nvme_cuse_get_ctrlr_name(struct spdk_nvme_ctrlr *ctrlr)
{
struct cuse_device *ctrlr_device;
if (TAILQ_EMPTY(&g_ctrlr_ctx_head)) {
return NULL;
}
TAILQ_FOREACH(ctrlr_device, &g_ctrlr_ctx_head, tailq) {
if (ctrlr_device->ctrlr == ctrlr) {
break;
}
}
pthread_mutex_lock(&g_cuse_mtx);
ctrlr_device = nvme_cuse_get_cuse_ctrlr_device(ctrlr);
if (!ctrlr_device) {
pthread_mutex_unlock(&g_cuse_mtx);
return NULL;
}
pthread_mutex_unlock(&g_cuse_mtx);
return ctrlr_device->dev_name;
}
@ -953,31 +1002,15 @@ char *
spdk_nvme_cuse_get_ns_name(struct spdk_nvme_ctrlr *ctrlr, uint32_t nsid)
{
struct cuse_device *ns_device;
struct cuse_device *ctrlr_device;
if (TAILQ_EMPTY(&g_ctrlr_ctx_head)) {
return NULL;
}
TAILQ_FOREACH(ctrlr_device, &g_ctrlr_ctx_head, tailq) {
if (ctrlr_device->ctrlr == ctrlr) {
break;
}
}
if (!ctrlr_device) {
return NULL;
}
TAILQ_FOREACH(ns_device, &ctrlr_device->ns_devices, tailq) {
if (ns_device->nsid == nsid) {
break;
}
}
pthread_mutex_lock(&g_cuse_mtx);
ns_device = nvme_cuse_get_cuse_ns_device(ctrlr, nsid);
if (!ns_device) {
pthread_mutex_unlock(&g_cuse_mtx);
return NULL;
}
pthread_mutex_unlock(&g_cuse_mtx);
return ns_device->dev_name;
}

View File

@ -767,6 +767,9 @@ struct nvme_driver {
bool initialized;
struct spdk_uuid default_extended_host_id;
/** netlink socket fd for hotplug messages */
int hotplug_fd;
};
extern struct nvme_driver *g_spdk_nvme_driver;

View File

@ -64,6 +64,7 @@ nvme_io_msg_send(struct spdk_nvme_ctrlr *ctrlr, uint32_t nsid, spdk_nvme_io_msg_
rc = spdk_ring_enqueue(ctrlr->external_io_msgs, (void **)&io, 1, NULL);
if (rc != 1) {
assert(false);
free(io);
pthread_mutex_unlock(&ctrlr->external_io_msgs_lock);
return -ENOMEM;
}
@ -106,6 +107,20 @@ spdk_nvme_io_msg_process(struct spdk_nvme_ctrlr *ctrlr)
return count;
}
static bool
nvme_io_msg_is_producer_registered(struct spdk_nvme_ctrlr *ctrlr,
struct nvme_io_msg_producer *io_msg_producer)
{
struct nvme_io_msg_producer *tmp;
STAILQ_FOREACH(tmp, &ctrlr->io_producers, link) {
if (tmp == io_msg_producer) {
return true;
}
}
return false;
}
int
nvme_io_msg_ctrlr_register(struct spdk_nvme_ctrlr *ctrlr,
struct nvme_io_msg_producer *io_msg_producer)
@ -115,6 +130,10 @@ nvme_io_msg_ctrlr_register(struct spdk_nvme_ctrlr *ctrlr,
return -EINVAL;
}
if (nvme_io_msg_is_producer_registered(ctrlr, io_msg_producer)) {
return -EEXIST;
}
if (!STAILQ_EMPTY(&ctrlr->io_producers) || ctrlr->is_resetting) {
/* There are registered producers - IO messaging already started */
STAILQ_INSERT_TAIL(&ctrlr->io_producers, io_msg_producer, link);
@ -136,7 +155,8 @@ nvme_io_msg_ctrlr_register(struct spdk_nvme_ctrlr *ctrlr,
if (ctrlr->external_io_msgs_qpair == NULL) {
SPDK_ERRLOG("spdk_nvme_ctrlr_alloc_io_qpair() failed\n");
spdk_ring_free(ctrlr->external_io_msgs);
return -1;
ctrlr->external_io_msgs = NULL;
return -ENOMEM;
}
STAILQ_INSERT_TAIL(&ctrlr->io_producers, io_msg_producer, link);
@ -157,6 +177,7 @@ nvme_io_msg_ctrlr_detach(struct spdk_nvme_ctrlr *ctrlr)
if (ctrlr->external_io_msgs) {
spdk_ring_free(ctrlr->external_io_msgs);
ctrlr->external_io_msgs = NULL;
}
if (ctrlr->external_io_msgs_qpair) {
@ -173,6 +194,10 @@ nvme_io_msg_ctrlr_unregister(struct spdk_nvme_ctrlr *ctrlr,
{
assert(io_msg_producer != NULL);
if (!nvme_io_msg_is_producer_registered(ctrlr, io_msg_producer)) {
return;
}
STAILQ_REMOVE(&ctrlr->io_producers, io_msg_producer, nvme_io_msg_producer, link);
if (STAILQ_EMPTY(&ctrlr->io_producers)) {
nvme_io_msg_ctrlr_detach(ctrlr);

View File

@ -212,7 +212,6 @@ static int nvme_pcie_qpair_destroy(struct spdk_nvme_qpair *qpair);
__thread struct nvme_pcie_ctrlr *g_thread_mmio_ctrlr = NULL;
static uint16_t g_signal_lock;
static bool g_sigset = false;
static int g_hotplug_fd = -1;
static void
nvme_sigbus_fault_sighandler(int signum, siginfo_t *info, void *ctx)
@ -271,7 +270,11 @@ _nvme_pcie_hotplug_monitor(struct spdk_nvme_probe_ctx *probe_ctx)
union spdk_nvme_csts_register csts;
struct spdk_nvme_ctrlr_process *proc;
while (spdk_get_uevent(g_hotplug_fd, &event) > 0) {
if (g_spdk_nvme_driver->hotplug_fd < 0) {
return 0;
}
while (spdk_get_uevent(g_spdk_nvme_driver->hotplug_fd, &event) > 0) {
if (event.subsystem == SPDK_NVME_UEVENT_SUBSYSTEM_UIO ||
event.subsystem == SPDK_NVME_UEVENT_SUBSYSTEM_VFIO) {
if (event.action == SPDK_NVME_UEVENT_ADD) {
@ -768,15 +771,8 @@ nvme_pcie_ctrlr_scan(struct spdk_nvme_probe_ctx *probe_ctx,
/* Only the primary process can monitor hotplug. */
if (spdk_process_is_primary()) {
if (g_hotplug_fd < 0) {
g_hotplug_fd = spdk_uevent_connect();
if (g_hotplug_fd < 0) {
SPDK_DEBUGLOG(SPDK_LOG_NVME, "Failed to open uevent netlink socket\n");
}
} else {
_nvme_pcie_hotplug_monitor(probe_ctx);
}
}
if (enum_ctx.has_pci_addr == false) {
return spdk_pci_enumerate(spdk_pci_nvme_get_driver(),
@ -828,7 +824,6 @@ struct spdk_nvme_ctrlr *nvme_pcie_ctrlr_construct(const struct spdk_nvme_transpo
pctrlr->is_remapped = false;
pctrlr->ctrlr.is_removed = false;
spdk_nvme_trid_populate_transport(&pctrlr->ctrlr.trid, SPDK_NVME_TRANSPORT_PCIE);
pctrlr->devhandle = devhandle;
pctrlr->ctrlr.opts = *opts;
memcpy(&pctrlr->ctrlr.trid, trid, sizeof(pctrlr->ctrlr.trid));
@ -997,7 +992,8 @@ nvme_pcie_qpair_construct(struct spdk_nvme_qpair *qpair,
volatile uint32_t *doorbell_base;
uint64_t offset;
uint16_t num_trackers;
size_t page_align = VALUE_2MB;
size_t page_align = sysconf(_SC_PAGESIZE);
size_t queue_align, queue_len;
uint32_t flags = SPDK_MALLOC_DMA;
uint64_t sq_paddr = 0;
uint64_t cq_paddr = 0;
@ -1035,7 +1031,7 @@ nvme_pcie_qpair_construct(struct spdk_nvme_qpair *qpair,
/* cmd and cpl rings must be aligned on page size boundaries. */
if (ctrlr->opts.use_cmb_sqs) {
if (nvme_pcie_ctrlr_alloc_cmb(ctrlr, pqpair->num_entries * sizeof(struct spdk_nvme_cmd),
sysconf(_SC_PAGESIZE), &offset) == 0) {
page_align, &offset) == 0) {
pqpair->cmd = pctrlr->cmb_bar_virt_addr + offset;
pqpair->cmd_bus_addr = pctrlr->cmb_bar_phys_addr + offset;
pqpair->sq_in_cmb = true;
@ -1049,9 +1045,9 @@ nvme_pcie_qpair_construct(struct spdk_nvme_qpair *qpair,
/* To ensure physical address contiguity we make each ring occupy
* a single hugepage only. See MAX_IO_QUEUE_ENTRIES.
*/
pqpair->cmd = spdk_zmalloc(pqpair->num_entries * sizeof(struct spdk_nvme_cmd),
page_align, NULL,
SPDK_ENV_SOCKET_ID_ANY, flags);
queue_len = pqpair->num_entries * sizeof(struct spdk_nvme_cmd);
queue_align = spdk_max(spdk_align32pow2(queue_len), page_align);
pqpair->cmd = spdk_zmalloc(queue_len, queue_align, NULL, SPDK_ENV_SOCKET_ID_ANY, flags);
if (pqpair->cmd == NULL) {
SPDK_ERRLOG("alloc qpair_cmd failed\n");
return -ENOMEM;
@ -1072,9 +1068,9 @@ nvme_pcie_qpair_construct(struct spdk_nvme_qpair *qpair,
if (pqpair->cq_vaddr) {
pqpair->cpl = pqpair->cq_vaddr;
} else {
pqpair->cpl = spdk_zmalloc(pqpair->num_entries * sizeof(struct spdk_nvme_cpl),
page_align, NULL,
SPDK_ENV_SOCKET_ID_ANY, flags);
queue_len = pqpair->num_entries * sizeof(struct spdk_nvme_cpl);
queue_align = spdk_max(spdk_align32pow2(queue_len), page_align);
pqpair->cpl = spdk_zmalloc(queue_len, queue_align, NULL, SPDK_ENV_SOCKET_ID_ANY, flags);
if (pqpair->cpl == NULL) {
SPDK_ERRLOG("alloc qpair_cpl failed\n");
return -ENOMEM;

View File

@ -207,7 +207,7 @@ static const struct nvme_string generic_status[] = {
static const struct nvme_string command_specific_status[] = {
{ SPDK_NVME_SC_COMPLETION_QUEUE_INVALID, "INVALID COMPLETION QUEUE" },
{ SPDK_NVME_SC_INVALID_QUEUE_IDENTIFIER, "INVALID QUEUE IDENTIFIER" },
{ SPDK_NVME_SC_MAXIMUM_QUEUE_SIZE_EXCEEDED, "MAX QUEUE SIZE EXCEEDED" },
{ SPDK_NVME_SC_INVALID_QUEUE_SIZE, "INVALID QUEUE SIZE" },
{ SPDK_NVME_SC_ABORT_COMMAND_LIMIT_EXCEEDED, "ABORT CMD LIMIT EXCEEDED" },
{ SPDK_NVME_SC_ASYNC_EVENT_REQUEST_LIMIT_EXCEEDED, "ASYNC LIMIT EXCEEDED" },
{ SPDK_NVME_SC_INVALID_FIRMWARE_SLOT, "INVALID FIRMWARE SLOT" },
@ -575,6 +575,7 @@ nvme_qpair_deinit(struct spdk_nvme_qpair *qpair)
{
struct nvme_error_cmd *cmd, *entry;
nvme_qpair_abort_queued_reqs(qpair, 1);
nvme_qpair_complete_error_reqs(qpair);
TAILQ_FOREACH_SAFE(cmd, &qpair->err_cmd_head, link, entry) {

View File

@ -127,6 +127,12 @@ struct spdk_nvme_recv_wr_list {
struct ibv_recv_wr *last;
};
/* Memory regions */
union nvme_rdma_mr {
struct ibv_mr *mr;
uint64_t key;
};
/* NVMe RDMA qpair extensions for spdk_nvme_qpair */
struct nvme_rdma_qpair {
struct spdk_nvme_qpair qpair;
@ -143,18 +149,19 @@ struct nvme_rdma_qpair {
uint16_t num_entries;
bool delay_cmd_submit;
/* Parallel arrays of response buffers + response SGLs of size num_entries */
struct ibv_sge *rsp_sgls;
struct spdk_nvme_cpl *rsps;
struct ibv_recv_wr *rsp_recv_wrs;
bool delay_cmd_submit;
struct spdk_nvme_send_wr_list sends_to_post;
struct spdk_nvme_recv_wr_list recvs_to_post;
/* Memory region describing all rsps for this qpair */
struct ibv_mr *rsp_mr;
union nvme_rdma_mr rsp_mr;
/*
* Array of num_entries NVMe commands registered as RDMA message buffers.
@ -163,7 +170,7 @@ struct nvme_rdma_qpair {
struct spdk_nvmf_cmd *cmds;
/* Memory region describing all cmds for this qpair */
struct ibv_mr *cmd_mr;
union nvme_rdma_mr cmd_mr;
struct spdk_nvme_rdma_mr_map *mr_map;
@ -174,8 +181,19 @@ struct nvme_rdma_qpair {
struct rdma_cm_event *evt;
};
enum NVME_RDMA_COMPLETION_FLAGS {
NVME_RDMA_SEND_COMPLETED = 1u << 0,
NVME_RDMA_RECV_COMPLETED = 1u << 1,
};
struct spdk_nvme_rdma_req {
int id;
uint16_t id;
uint16_t completion_flags: 2;
uint16_t reserved: 14;
/* if completion of RDMA_RECV is received before RDMA_SEND, we will complete the nvme
* request during processing of RDMA_SEND. To complete the request we must know the index
* of the nvme_cpl received in RDMA_RECV, so store it in this field */
uint16_t rsp_idx;
struct ibv_send_wr send_wr;
@ -184,8 +202,6 @@ struct spdk_nvme_rdma_req {
struct ibv_sge send_sgl[NVME_RDMA_DEFAULT_TX_SGE];
TAILQ_ENTRY(spdk_nvme_rdma_req) link;
bool request_ready_to_put;
};
static const char *rdma_cm_event_str[] = {
@ -210,6 +226,26 @@ static const char *rdma_cm_event_str[] = {
static LIST_HEAD(, spdk_nvme_rdma_mr_map) g_rdma_mr_maps = LIST_HEAD_INITIALIZER(&g_rdma_mr_maps);
static pthread_mutex_t g_rdma_mr_maps_mutex = PTHREAD_MUTEX_INITIALIZER;
static inline void *
nvme_rdma_calloc(size_t nmemb, size_t size)
{
if (!g_nvme_hooks.get_rkey) {
return calloc(nmemb, size);
} else {
return spdk_zmalloc(nmemb * size, 0, NULL, SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);
}
}
static inline void
nvme_rdma_free(void *buf)
{
if (!g_nvme_hooks.get_rkey) {
free(buf);
} else {
spdk_free(buf);
}
}
int nvme_rdma_ctrlr_delete_io_qpair(struct spdk_nvme_ctrlr *ctrlr,
struct spdk_nvme_qpair *qpair);
@ -244,7 +280,8 @@ nvme_rdma_req_get(struct nvme_rdma_qpair *rqpair)
static void
nvme_rdma_req_put(struct nvme_rdma_qpair *rqpair, struct spdk_nvme_rdma_req *rdma_req)
{
rdma_req->request_ready_to_put = false;
rdma_req->completion_flags = 0;
rdma_req->req = NULL;
TAILQ_REMOVE(&rqpair->outstanding_reqs, rdma_req, link);
TAILQ_INSERT_HEAD(&rqpair->free_reqs, rdma_req, link);
}
@ -614,23 +651,66 @@ nvme_rdma_post_recv(struct nvme_rdma_qpair *rqpair, uint16_t rsp_idx)
return nvme_rdma_qpair_queue_recv_wr(rqpair, wr);
}
static int
nvme_rdma_reg_mr(struct rdma_cm_id *cm_id, union nvme_rdma_mr *mr, void *mem, size_t length)
{
if (!g_nvme_hooks.get_rkey) {
mr->mr = rdma_reg_msgs(cm_id, mem, length);
if (mr->mr == NULL) {
SPDK_ERRLOG("Unable to register mr: %s (%d)\n",
spdk_strerror(errno), errno);
return -1;
}
} else {
mr->key = g_nvme_hooks.get_rkey(cm_id->pd, mem, length);
}
return 0;
}
static void
nvme_rdma_dereg_mr(union nvme_rdma_mr *mr)
{
if (!g_nvme_hooks.get_rkey) {
if (mr->mr && rdma_dereg_mr(mr->mr)) {
SPDK_ERRLOG("Unable to de-register mr\n");
}
} else {
if (mr->key) {
g_nvme_hooks.put_rkey(mr->key);
}
}
memset(mr, 0, sizeof(*mr));
}
static uint32_t
nvme_rdma_mr_get_lkey(union nvme_rdma_mr *mr)
{
uint32_t lkey;
if (!g_nvme_hooks.get_rkey) {
lkey = mr->mr->lkey;
} else {
lkey = *((uint64_t *) mr->key);
}
return lkey;
}
static void
nvme_rdma_unregister_rsps(struct nvme_rdma_qpair *rqpair)
{
if (rqpair->rsp_mr && rdma_dereg_mr(rqpair->rsp_mr)) {
SPDK_ERRLOG("Unable to de-register rsp_mr\n");
}
rqpair->rsp_mr = NULL;
nvme_rdma_dereg_mr(&rqpair->rsp_mr);
}
static void
nvme_rdma_free_rsps(struct nvme_rdma_qpair *rqpair)
{
free(rqpair->rsps);
nvme_rdma_free(rqpair->rsps);
rqpair->rsps = NULL;
free(rqpair->rsp_sgls);
nvme_rdma_free(rqpair->rsp_sgls);
rqpair->rsp_sgls = NULL;
free(rqpair->rsp_recv_wrs);
nvme_rdma_free(rqpair->rsp_recv_wrs);
rqpair->rsp_recv_wrs = NULL;
}
@ -640,20 +720,19 @@ nvme_rdma_alloc_rsps(struct nvme_rdma_qpair *rqpair)
rqpair->rsps = NULL;
rqpair->rsp_recv_wrs = NULL;
rqpair->rsp_sgls = calloc(rqpair->num_entries, sizeof(*rqpair->rsp_sgls));
rqpair->rsp_sgls = nvme_rdma_calloc(rqpair->num_entries, sizeof(*rqpair->rsp_sgls));
if (!rqpair->rsp_sgls) {
SPDK_ERRLOG("Failed to allocate rsp_sgls\n");
goto fail;
}
rqpair->rsp_recv_wrs = calloc(rqpair->num_entries,
sizeof(*rqpair->rsp_recv_wrs));
rqpair->rsp_recv_wrs = nvme_rdma_calloc(rqpair->num_entries, sizeof(*rqpair->rsp_recv_wrs));
if (!rqpair->rsp_recv_wrs) {
SPDK_ERRLOG("Failed to allocate rsp_recv_wrs\n");
goto fail;
}
rqpair->rsps = calloc(rqpair->num_entries, sizeof(*rqpair->rsps));
rqpair->rsps = nvme_rdma_calloc(rqpair->num_entries, sizeof(*rqpair->rsps));
if (!rqpair->rsps) {
SPDK_ERRLOG("can not allocate rdma rsps\n");
goto fail;
@ -668,22 +747,25 @@ fail:
static int
nvme_rdma_register_rsps(struct nvme_rdma_qpair *rqpair)
{
int i, rc;
uint16_t i;
int rc;
uint32_t lkey;
rqpair->rsp_mr = rdma_reg_msgs(rqpair->cm_id, rqpair->rsps,
rqpair->num_entries * sizeof(*rqpair->rsps));
if (rqpair->rsp_mr == NULL) {
rc = -errno;
SPDK_ERRLOG("Unable to register rsp_mr: %s (%d)\n", spdk_strerror(errno), errno);
rc = nvme_rdma_reg_mr(rqpair->cm_id, &rqpair->rsp_mr,
rqpair->rsps, rqpair->num_entries * sizeof(*rqpair->rsps));
if (rc < 0) {
goto fail;
}
lkey = nvme_rdma_mr_get_lkey(&rqpair->rsp_mr);
for (i = 0; i < rqpair->num_entries; i++) {
struct ibv_sge *rsp_sgl = &rqpair->rsp_sgls[i];
rsp_sgl->addr = (uint64_t)&rqpair->rsps[i];
rsp_sgl->length = sizeof(rqpair->rsps[i]);
rsp_sgl->lkey = rqpair->rsp_mr->lkey;
rsp_sgl->lkey = lkey;
rqpair->rsp_recv_wrs[i].wr_id = i;
rqpair->rsp_recv_wrs[i].next = NULL;
@ -711,10 +793,7 @@ fail:
static void
nvme_rdma_unregister_reqs(struct nvme_rdma_qpair *rqpair)
{
if (rqpair->cmd_mr && rdma_dereg_mr(rqpair->cmd_mr)) {
SPDK_ERRLOG("Unable to de-register cmd_mr\n");
}
rqpair->cmd_mr = NULL;
nvme_rdma_dereg_mr(&rqpair->cmd_mr);
}
static void
@ -724,25 +803,25 @@ nvme_rdma_free_reqs(struct nvme_rdma_qpair *rqpair)
return;
}
free(rqpair->cmds);
nvme_rdma_free(rqpair->cmds);
rqpair->cmds = NULL;
free(rqpair->rdma_reqs);
nvme_rdma_free(rqpair->rdma_reqs);
rqpair->rdma_reqs = NULL;
}
static int
nvme_rdma_alloc_reqs(struct nvme_rdma_qpair *rqpair)
{
int i;
uint16_t i;
rqpair->rdma_reqs = calloc(rqpair->num_entries, sizeof(struct spdk_nvme_rdma_req));
rqpair->rdma_reqs = nvme_rdma_calloc(rqpair->num_entries, sizeof(struct spdk_nvme_rdma_req));
if (rqpair->rdma_reqs == NULL) {
SPDK_ERRLOG("Failed to allocate rdma_reqs\n");
goto fail;
}
rqpair->cmds = calloc(rqpair->num_entries, sizeof(*rqpair->cmds));
rqpair->cmds = nvme_rdma_calloc(rqpair->num_entries, sizeof(*rqpair->cmds));
if (!rqpair->cmds) {
SPDK_ERRLOG("Failed to allocate RDMA cmds\n");
goto fail;
@ -785,16 +864,20 @@ static int
nvme_rdma_register_reqs(struct nvme_rdma_qpair *rqpair)
{
int i;
int rc;
uint32_t lkey;
rqpair->cmd_mr = rdma_reg_msgs(rqpair->cm_id, rqpair->cmds,
rqpair->num_entries * sizeof(*rqpair->cmds));
if (!rqpair->cmd_mr) {
SPDK_ERRLOG("Unable to register cmd_mr\n");
rc = nvme_rdma_reg_mr(rqpair->cm_id, &rqpair->cmd_mr,
rqpair->cmds, rqpair->num_entries * sizeof(*rqpair->cmds));
if (rc < 0) {
goto fail;
}
lkey = nvme_rdma_mr_get_lkey(&rqpair->cmd_mr);
for (i = 0; i < rqpair->num_entries; i++) {
rqpair->rdma_reqs[i].send_sgl[0].lkey = rqpair->cmd_mr->lkey;
rqpair->rdma_reqs[i].send_sgl[0].lkey = lkey;
}
return 0;
@ -804,35 +887,6 @@ fail:
return -ENOMEM;
}
static int
nvme_rdma_recv(struct nvme_rdma_qpair *rqpair, uint64_t rsp_idx, int *reaped)
{
struct spdk_nvme_rdma_req *rdma_req;
struct spdk_nvme_cpl *rsp;
struct nvme_request *req;
assert(rsp_idx < rqpair->num_entries);
rsp = &rqpair->rsps[rsp_idx];
rdma_req = &rqpair->rdma_reqs[rsp->cid];
req = rdma_req->req;
nvme_rdma_req_complete(req, rsp);
if (rdma_req->request_ready_to_put) {
(*reaped)++;
nvme_rdma_req_put(rqpair, rdma_req);
} else {
rdma_req->request_ready_to_put = true;
}
if (nvme_rdma_post_recv(rqpair, rsp_idx)) {
SPDK_ERRLOG("Unable to re-post rx descriptor\n");
return -1;
}
return 0;
}
static int
nvme_rdma_resolve_addr(struct nvme_rdma_qpair *rqpair,
struct sockaddr *src_addr,
@ -1023,9 +1077,9 @@ nvme_rdma_register_mem(struct nvme_rdma_qpair *rqpair)
}
}
mr_map = calloc(1, sizeof(*mr_map));
mr_map = nvme_rdma_calloc(1, sizeof(*mr_map));
if (mr_map == NULL) {
SPDK_ERRLOG("calloc() failed\n");
SPDK_ERRLOG("Failed to allocate mr_map\n");
pthread_mutex_unlock(&g_rdma_mr_maps_mutex);
return -1;
}
@ -1035,7 +1089,8 @@ nvme_rdma_register_mem(struct nvme_rdma_qpair *rqpair)
mr_map->map = spdk_mem_map_alloc((uint64_t)NULL, &nvme_rdma_map_ops, pd);
if (mr_map->map == NULL) {
SPDK_ERRLOG("spdk_mem_map_alloc() failed\n");
free(mr_map);
nvme_rdma_free(mr_map);
pthread_mutex_unlock(&g_rdma_mr_maps_mutex);
return -1;
}
@ -1067,7 +1122,7 @@ nvme_rdma_unregister_mem(struct nvme_rdma_qpair *rqpair)
if (mr_map->ref == 0) {
LIST_REMOVE(mr_map, link);
spdk_mem_map_free(&mr_map->map);
free(mr_map);
nvme_rdma_free(mr_map);
}
pthread_mutex_unlock(&g_rdma_mr_maps_mutex);
@ -1517,6 +1572,7 @@ nvme_rdma_req_init(struct nvme_rdma_qpair *rqpair, struct nvme_request *req,
struct spdk_nvme_ctrlr *ctrlr = rqpair->qpair.ctrlr;
int rc;
assert(rdma_req->req == NULL);
rdma_req->req = req;
req->cmd.cid = rdma_req->id;
@ -1569,7 +1625,7 @@ nvme_rdma_ctrlr_create_qpair(struct spdk_nvme_ctrlr *ctrlr,
struct spdk_nvme_qpair *qpair;
int rc, retry_count = 0;
rqpair = calloc(1, sizeof(struct nvme_rdma_qpair));
rqpair = nvme_rdma_calloc(1, sizeof(struct nvme_rdma_qpair));
if (!rqpair) {
SPDK_ERRLOG("failed to get create rqpair\n");
return NULL;
@ -1587,6 +1643,7 @@ nvme_rdma_ctrlr_create_qpair(struct spdk_nvme_ctrlr *ctrlr,
SPDK_DEBUGLOG(SPDK_LOG_NVME, "rc =%d\n", rc);
if (rc) {
SPDK_ERRLOG("Unable to allocate rqpair RDMA requests\n");
nvme_rdma_free(rqpair);
return NULL;
}
SPDK_DEBUGLOG(SPDK_LOG_NVME, "RDMA requests allocated\n");
@ -1595,6 +1652,8 @@ nvme_rdma_ctrlr_create_qpair(struct spdk_nvme_ctrlr *ctrlr,
SPDK_DEBUGLOG(SPDK_LOG_NVME, "rc =%d\n", rc);
if (rc < 0) {
SPDK_ERRLOG("Unable to allocate rqpair RDMA responses\n");
nvme_rdma_free_reqs(rqpair);
nvme_rdma_free(rqpair);
return NULL;
}
SPDK_DEBUGLOG(SPDK_LOG_NVME, "RDMA responses allocated\n");
@ -1686,7 +1745,7 @@ nvme_rdma_ctrlr_delete_io_qpair(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_
nvme_rdma_free_reqs(rqpair);
nvme_rdma_free_rsps(rqpair);
free(rqpair);
nvme_rdma_free(rqpair);
return 0;
}
@ -1718,20 +1777,19 @@ struct spdk_nvme_ctrlr *nvme_rdma_ctrlr_construct(const struct spdk_nvme_transpo
struct ibv_device_attr dev_attr;
int i, flag, rc;
rctrlr = calloc(1, sizeof(struct nvme_rdma_ctrlr));
rctrlr = nvme_rdma_calloc(1, sizeof(struct nvme_rdma_ctrlr));
if (rctrlr == NULL) {
SPDK_ERRLOG("could not allocate ctrlr\n");
return NULL;
}
spdk_nvme_trid_populate_transport(&rctrlr->ctrlr.trid, SPDK_NVME_TRANSPORT_RDMA);
rctrlr->ctrlr.opts = *opts;
memcpy(&rctrlr->ctrlr.trid, trid, sizeof(rctrlr->ctrlr.trid));
contexts = rdma_get_devices(NULL);
if (contexts == NULL) {
SPDK_ERRLOG("rdma_get_devices() failed: %s (%d)\n", spdk_strerror(errno), errno);
free(rctrlr);
nvme_rdma_free(rctrlr);
return NULL;
}
@ -1743,7 +1801,7 @@ struct spdk_nvme_ctrlr *nvme_rdma_ctrlr_construct(const struct spdk_nvme_transpo
if (rc < 0) {
SPDK_ERRLOG("Failed to query RDMA device attributes.\n");
rdma_free_devices(contexts);
free(rctrlr);
nvme_rdma_free(rctrlr);
return NULL;
}
rctrlr->max_sge = spdk_min(rctrlr->max_sge, (uint16_t)dev_attr.max_sge);
@ -1754,13 +1812,13 @@ struct spdk_nvme_ctrlr *nvme_rdma_ctrlr_construct(const struct spdk_nvme_transpo
rc = nvme_ctrlr_construct(&rctrlr->ctrlr);
if (rc != 0) {
free(rctrlr);
nvme_rdma_free(rctrlr);
return NULL;
}
STAILQ_INIT(&rctrlr->pending_cm_events);
STAILQ_INIT(&rctrlr->free_cm_events);
rctrlr->cm_events = calloc(NVME_RDMA_NUM_CM_EVENTS, sizeof(*rctrlr->cm_events));
rctrlr->cm_events = nvme_rdma_calloc(NVME_RDMA_NUM_CM_EVENTS, sizeof(*rctrlr->cm_events));
if (rctrlr->cm_events == NULL) {
SPDK_ERRLOG("unable to allocat buffers to hold CM events.\n");
nvme_rdma_ctrlr_destruct(&rctrlr->ctrlr);
@ -1834,7 +1892,7 @@ nvme_rdma_ctrlr_destruct(struct spdk_nvme_ctrlr *ctrlr)
STAILQ_INIT(&rctrlr->free_cm_events);
STAILQ_INIT(&rctrlr->pending_cm_events);
free(rctrlr->cm_events);
nvme_rdma_free(rctrlr->cm_events);
if (rctrlr->cm_channel) {
rdma_destroy_event_channel(rctrlr->cm_channel);
@ -1843,7 +1901,7 @@ nvme_rdma_ctrlr_destruct(struct spdk_nvme_ctrlr *ctrlr)
nvme_ctrlr_destruct_finish(ctrlr);
free(rctrlr);
nvme_rdma_free(rctrlr);
return 0;
}
@ -1945,6 +2003,14 @@ nvme_rdma_qpair_check_timeout(struct spdk_nvme_qpair *qpair)
}
}
static inline int
nvme_rdma_request_ready(struct nvme_rdma_qpair *rqpair, struct spdk_nvme_rdma_req *rdma_req)
{
nvme_rdma_req_complete(rdma_req->req, &rqpair->rsps[rdma_req->rsp_idx]);
nvme_rdma_req_put(rqpair, rdma_req);
return nvme_rdma_post_recv(rqpair, rdma_req->rsp_idx);
}
#define MAX_COMPLETIONS_PER_POLL 128
int
@ -1954,10 +2020,12 @@ nvme_rdma_qpair_process_completions(struct spdk_nvme_qpair *qpair,
struct nvme_rdma_qpair *rqpair = nvme_rdma_qpair(qpair);
struct ibv_wc wc[MAX_COMPLETIONS_PER_POLL];
int i, rc = 0, batch_size;
uint32_t reaped;
uint32_t reaped = 0;
uint16_t rsp_idx;
struct ibv_cq *cq;
struct spdk_nvme_rdma_req *rdma_req;
struct nvme_rdma_ctrlr *rctrlr;
struct spdk_nvme_cpl *rsp;
if (spdk_unlikely(nvme_rdma_qpair_submit_sends(rqpair) ||
nvme_rdma_qpair_submit_recvs(rqpair))) {
@ -1982,7 +2050,6 @@ nvme_rdma_qpair_process_completions(struct spdk_nvme_qpair *qpair,
cq = rqpair->cq;
reaped = 0;
do {
batch_size = spdk_min((max_completions - reaped),
MAX_COMPLETIONS_PER_POLL);
@ -2012,20 +2079,32 @@ nvme_rdma_qpair_process_completions(struct spdk_nvme_qpair *qpair,
goto fail;
}
if (nvme_rdma_recv(rqpair, wc[i].wr_id, &reaped)) {
SPDK_ERRLOG("nvme_rdma_recv processing failure\n");
assert(wc[i].wr_id < rqpair->num_entries);
rsp_idx = (uint16_t)wc[i].wr_id;
rsp = &rqpair->rsps[rsp_idx];
rdma_req = &rqpair->rdma_reqs[rsp->cid];
rdma_req->completion_flags |= NVME_RDMA_RECV_COMPLETED;
rdma_req->rsp_idx = rsp_idx;
if ((rdma_req->completion_flags & NVME_RDMA_SEND_COMPLETED) != 0) {
if (spdk_unlikely(nvme_rdma_request_ready(rqpair, rdma_req))) {
SPDK_ERRLOG("Unable to re-post rx descriptor\n");
goto fail;
}
reaped++;
}
break;
case IBV_WC_SEND:
rdma_req = (struct spdk_nvme_rdma_req *)wc[i].wr_id;
rdma_req->completion_flags |= NVME_RDMA_SEND_COMPLETED;
if (rdma_req->request_ready_to_put) {
if ((rdma_req->completion_flags & NVME_RDMA_RECV_COMPLETED) != 0) {
if (spdk_unlikely(nvme_rdma_request_ready(rqpair, rdma_req))) {
SPDK_ERRLOG("Unable to re-post rx descriptor\n");
goto fail;
}
reaped++;
nvme_rdma_req_put(rqpair, rdma_req);
} else {
rdma_req->request_ready_to_put = true;
}
break;

View File

@ -236,6 +236,11 @@ nvme_tcp_ctrlr_disconnect_qpair(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_
struct nvme_tcp_qpair *tqpair = nvme_tcp_qpair(qpair);
struct nvme_tcp_pdu *pdu;
if (nvme_qpair_get_state(qpair) == NVME_QPAIR_DISABLED) {
/* Already disconnecting */
return;
}
nvme_qpair_set_state(qpair, NVME_QPAIR_DISABLED);
spdk_sock_close(&tqpair->sock);
@ -1620,7 +1625,6 @@ struct spdk_nvme_ctrlr *nvme_tcp_ctrlr_construct(const struct spdk_nvme_transpor
tctrlr->ctrlr.opts = *opts;
tctrlr->ctrlr.trid = *trid;
spdk_nvme_trid_populate_transport(&tctrlr->ctrlr.trid, SPDK_NVME_TRANSPORT_TCP);
rc = nvme_ctrlr_construct(&tctrlr->ctrlr);
if (rc != 0) {

View File

@ -2496,6 +2496,11 @@ spdk_nvmf_ctrlr_process_io_fused_cmd(struct spdk_nvmf_request *req, struct spdk_
/* save request of first command to generate response later */
req->first_fused_req = first_fused_req;
req->qpair->first_fused_req = NULL;
} else {
SPDK_ERRLOG("Invalid fused command fuse field.\n");
rsp->status.sct = SPDK_NVME_SCT_GENERIC;
rsp->status.sc = SPDK_NVME_SC_INVALID_FIELD;
return SPDK_NVMF_REQUEST_EXEC_STATUS_COMPLETE;
}
rc = spdk_nvmf_bdev_ctrlr_compare_and_write_cmd(bdev, desc, ch, req->first_fused_req, req);

View File

@ -2,7 +2,7 @@
* BSD LICENSE
*
* Copyright (c) Intel Corporation. All rights reserved.
* Copyright (c) 2018-2019 Mellanox Technologies LTD. All rights reserved.
* Copyright (c) 2018-2020 Mellanox Technologies LTD. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -359,11 +359,17 @@ spdk_rpc_nvmf_subsystem_started(struct spdk_nvmf_subsystem *subsystem,
void *cb_arg, int status)
{
struct spdk_jsonrpc_request *request = cb_arg;
struct spdk_json_write_ctx *w;
w = spdk_jsonrpc_begin_result(request);
if (!status) {
struct spdk_json_write_ctx *w = spdk_jsonrpc_begin_result(request);
spdk_json_write_bool(w, true);
spdk_jsonrpc_end_result(request, w);
} else {
spdk_jsonrpc_send_error_response_fmt(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR,
"Subsystem %s start failed",
subsystem->subnqn);
spdk_nvmf_subsystem_destroy(subsystem);
}
}
static void
@ -371,72 +377,77 @@ spdk_rpc_nvmf_create_subsystem(struct spdk_jsonrpc_request *request,
const struct spdk_json_val *params)
{
struct rpc_subsystem_create *req;
struct spdk_nvmf_subsystem *subsystem;
struct spdk_nvmf_subsystem *subsystem = NULL;
struct spdk_nvmf_tgt *tgt;
int rc = -1;
req = calloc(1, sizeof(*req));
if (!req) {
goto invalid;
SPDK_ERRLOG("Memory allocation failed\n");
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR,
"Memory allocation failed");
return;
}
if (spdk_json_decode_object(params, rpc_subsystem_create_decoders,
SPDK_COUNTOF(rpc_subsystem_create_decoders),
req)) {
SPDK_ERRLOG("spdk_json_decode_object failed\n");
goto invalid;
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "Invalid parameters");
goto cleanup;
}
tgt = spdk_nvmf_get_tgt(req->tgt_name);
if (!tgt) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR,
"Unable to find a target.");
goto invalid_custom_response;
SPDK_ERRLOG("Unable to find target %s\n", req->tgt_name);
spdk_jsonrpc_send_error_response_fmt(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR,
"Unable to find target %s", req->tgt_name);
goto cleanup;
}
subsystem = spdk_nvmf_subsystem_create(tgt, req->nqn, SPDK_NVMF_SUBTYPE_NVME,
req->max_namespaces);
if (!subsystem) {
goto invalid;
SPDK_ERRLOG("Unable to create subsystem %s\n", req->nqn);
spdk_jsonrpc_send_error_response_fmt(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR,
"Unable to create subsystem %s", req->nqn);
goto cleanup;
}
if (req->serial_number) {
if (spdk_nvmf_subsystem_set_sn(subsystem, req->serial_number)) {
SPDK_ERRLOG("Subsystem %s: invalid serial number '%s'\n", req->nqn, req->serial_number);
goto invalid;
spdk_jsonrpc_send_error_response_fmt(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS,
"Invalid SN %s", req->serial_number);
goto cleanup;
}
}
if (req->model_number) {
if (spdk_nvmf_subsystem_set_mn(subsystem, req->model_number)) {
SPDK_ERRLOG("Subsystem %s: invalid model number '%s'\n", req->nqn, req->model_number);
goto invalid;
spdk_jsonrpc_send_error_response_fmt(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS,
"Invalid MN %s", req->model_number);
goto cleanup;
}
}
spdk_nvmf_subsystem_set_allow_any_host(subsystem, req->allow_any_host);
free(req->nqn);
free(req->tgt_name);
free(req->serial_number);
free(req->model_number);
free(req);
spdk_nvmf_subsystem_start(subsystem,
rc = spdk_nvmf_subsystem_start(subsystem,
spdk_rpc_nvmf_subsystem_started,
request);
return;
invalid:
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "Invalid parameters");
invalid_custom_response:
if (req) {
cleanup:
free(req->nqn);
free(req->tgt_name);
free(req->serial_number);
free(req->model_number);
}
free(req);
if (rc && subsystem) {
spdk_nvmf_subsystem_destroy(subsystem);
}
}
SPDK_RPC_REGISTER("nvmf_create_subsystem", spdk_rpc_nvmf_create_subsystem, SPDK_RPC_RUNTIME)
SPDK_RPC_REGISTER_ALIAS_DEPRECATED(nvmf_create_subsystem, nvmf_subsystem_create)

View File

@ -2,7 +2,7 @@
* BSD LICENSE
*
* Copyright (c) Intel Corporation. All rights reserved.
* Copyright (c) 2019 Mellanox Technologies LTD. All rights reserved.
* Copyright (c) 2019, 2020 Mellanox Technologies LTD. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@ -2961,14 +2961,30 @@ static const char *CM_EVENT_STR[] = {
};
#endif /* DEBUG */
static void
nvmf_rdma_disconnect_qpairs_on_port(struct spdk_nvmf_rdma_transport *rtransport,
struct spdk_nvmf_rdma_port *port)
{
struct spdk_nvmf_rdma_poll_group *rgroup;
struct spdk_nvmf_rdma_poller *rpoller;
struct spdk_nvmf_rdma_qpair *rqpair;
TAILQ_FOREACH(rgroup, &rtransport->poll_groups, link) {
TAILQ_FOREACH(rpoller, &rgroup->pollers, link) {
TAILQ_FOREACH(rqpair, &rpoller->qpairs, link) {
if (rqpair->listen_id == port->id) {
spdk_nvmf_rdma_start_disconnect(rqpair);
}
}
}
}
}
static bool
nvmf_rdma_handle_cm_event_addr_change(struct spdk_nvmf_transport *transport,
struct rdma_cm_event *event)
{
struct spdk_nvme_transport_id trid;
struct spdk_nvmf_rdma_qpair *rqpair;
struct spdk_nvmf_rdma_poll_group *rgroup;
struct spdk_nvmf_rdma_poller *rpoller;
struct spdk_nvmf_rdma_port *port;
struct spdk_nvmf_rdma_transport *rtransport;
uint32_t ref, i;
@ -2986,27 +3002,41 @@ nvmf_rdma_handle_cm_event_addr_change(struct spdk_nvmf_transport *transport,
}
}
if (event_acked) {
TAILQ_FOREACH(rgroup, &rtransport->poll_groups, link) {
TAILQ_FOREACH(rpoller, &rgroup->pollers, link) {
TAILQ_FOREACH(rqpair, &rpoller->qpairs, link) {
if (rqpair->listen_id == port->id) {
spdk_nvmf_rdma_start_disconnect(rqpair);
}
}
}
}
nvmf_rdma_disconnect_qpairs_on_port(rtransport, port);
for (i = 0; i < ref; i++) {
spdk_nvmf_rdma_stop_listen(transport, &trid);
}
while (ref > 0) {
for (i = 0; i < ref; i++) {
spdk_nvmf_rdma_listen(transport, &trid, NULL, NULL);
ref--;
}
}
return event_acked;
}
static void
nvmf_rdma_handle_cm_event_port_removal(struct spdk_nvmf_transport *transport,
struct rdma_cm_event *event)
{
struct spdk_nvmf_rdma_port *port;
struct spdk_nvmf_rdma_transport *rtransport;
uint32_t ref, i;
port = event->id->context;
rtransport = SPDK_CONTAINEROF(transport, struct spdk_nvmf_rdma_transport, transport);
ref = port->ref;
SPDK_NOTICELOG("Port %s:%s is being removed\n", port->trid.traddr, port->trid.trsvcid);
nvmf_rdma_disconnect_qpairs_on_port(rtransport, port);
rdma_ack_cm_event(event);
for (i = 0; i < ref; i++) {
spdk_nvmf_rdma_stop_listen(transport, &port->trid);
}
}
static void
spdk_nvmf_process_cm_event(struct spdk_nvmf_transport *transport, new_qpair_fn cb_fn, void *cb_arg)
{
@ -3024,7 +3054,13 @@ spdk_nvmf_process_cm_event(struct spdk_nvmf_transport *transport, new_qpair_fn c
while (1) {
event_acked = false;
rc = rdma_get_cm_event(rtransport->event_channel, &event);
if (rc == 0) {
if (rc) {
if (errno != EAGAIN && errno != EWOULDBLOCK) {
SPDK_ERRLOG("Acceptor Event Error: %s\n", spdk_strerror(errno));
}
break;
}
SPDK_DEBUGLOG(SPDK_LOG_RDMA, "Acceptor Event: %s\n", CM_EVENT_STR[event->event]);
spdk_trace_record(TRACE_RDMA_CM_ASYNC_EVENT, 0, 0, 0, event->event);
@ -3057,13 +3093,32 @@ spdk_nvmf_process_cm_event(struct spdk_nvmf_transport *transport, new_qpair_fn c
/* TODO: Should we be waiting for this event anywhere? */
break;
case RDMA_CM_EVENT_DISCONNECTED:
case RDMA_CM_EVENT_DEVICE_REMOVAL:
rc = nvmf_rdma_disconnect(event);
if (rc < 0) {
SPDK_ERRLOG("Unable to process disconnect event. rc: %d\n", rc);
break;
}
break;
case RDMA_CM_EVENT_DEVICE_REMOVAL:
/* In case of device removal, the kernel IB stack triggers IBV_EVENT_DEVICE_FATAL,
* which triggers RDMA_CM_EVENT_DEVICE_REMOVAL on all cma_ids.
* Once these events are delivered to SPDK, we must release all IB resources and
* must not attempt to call any ibv_query/modify/create functions. We may only call
* ibv_destroy* functions to release the user space memory allocated by IB. All kernel
* resources have already been cleaned up. */
if (event->id->qp) {
/* If rdma_cm event has a valid `qp` pointer then the event refers to the
* corresponding qpair. Otherwise the event refers to a listening device */
rc = nvmf_rdma_disconnect(event);
if (rc < 0) {
SPDK_ERRLOG("Unable to process disconnect event. rc: %d\n", rc);
break;
}
} else {
nvmf_rdma_handle_cm_event_port_removal(transport, event);
event_acked = true;
}
break;
case RDMA_CM_EVENT_MULTICAST_JOIN:
case RDMA_CM_EVENT_MULTICAST_ERROR:
/* Multicast is not used */
@ -3081,12 +3136,6 @@ spdk_nvmf_process_cm_event(struct spdk_nvmf_transport *transport, new_qpair_fn c
if (!event_acked) {
rdma_ack_cm_event(event);
}
} else {
if (errno != EAGAIN && errno != EWOULDBLOCK) {
SPDK_ERRLOG("Acceptor Event Error: %s\n", spdk_strerror(errno));
}
break;
}
}
}
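The DEVICE_REMOVAL dispatch above hinges on one convention: a cm event whose id carries a queue pair belongs to a connection, while a NULL qp means the id is a listener. A compact restatement of that rule, as a sketch:

#include <rdma/rdma_cma.h>
#include <stdbool.h>

/* RDMA_CM_EVENT_DEVICE_REMOVAL arrives on every cma_id; route it by whether
 * the id carries a queue pair (connection) or not (listening device). */
static bool
device_removal_is_for_qpair(const struct rdma_cm_event *event)
{
	return event->id->qp != NULL;
}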
@ -3450,7 +3499,9 @@ spdk_nvmf_rdma_poll_group_destroy(struct spdk_nvmf_transport_poll_group *group)
}
if (poller->srq) {
if (poller->resources) {
nvmf_rdma_resources_destroy(poller->resources);
}
ibv_destroy_srq(poller->srq);
SPDK_DEBUGLOG(SPDK_LOG_RDMA, "Destroyed RDMA shared queue %p\n", poller->srq);
}

View File

@ -332,6 +332,14 @@ spdk_sock_writev_async(struct spdk_sock *sock, struct spdk_sock_request *req)
int
spdk_sock_flush(struct spdk_sock *sock)
{
if (sock == NULL) {
return -EBADF;
}
if (sock->flags.closed) {
return -EBADF;
}
return sock->net_impl->flush(sock);
}
@ -396,6 +404,7 @@ spdk_sock_group_create(void *ctx)
if (group_impl != NULL) {
STAILQ_INSERT_TAIL(&group->group_impls, group_impl, link);
TAILQ_INIT(&group_impl->socks);
group_impl->num_removed_socks = 0;
group_impl->net_impl = impl;
}
}
@ -492,6 +501,9 @@ spdk_sock_group_remove_sock(struct spdk_sock_group *group, struct spdk_sock *soc
rc = group_impl->net_impl->group_impl_remove_sock(group_impl, sock);
if (rc == 0) {
TAILQ_REMOVE(&group_impl->socks, sock, link);
assert(group_impl->num_removed_socks < MAX_EVENTS_PER_POLL);
group_impl->removed_socks[group_impl->num_removed_socks] = (uintptr_t)sock;
group_impl->num_removed_socks++;
sock->group_impl = NULL;
sock->cb_fn = NULL;
sock->cb_arg = NULL;
@ -518,6 +530,9 @@ spdk_sock_group_impl_poll_count(struct spdk_sock_group_impl *group_impl,
return 0;
}
/* The number of removed sockets should be reset for each call to poll. */
group_impl->num_removed_socks = 0;
num_events = group_impl->net_impl->group_impl_poll(group_impl, max_events, socks);
if (num_events == -1) {
return -1;
@ -525,10 +540,21 @@ spdk_sock_group_impl_poll_count(struct spdk_sock_group_impl *group_impl,
for (i = 0; i < num_events; i++) {
struct spdk_sock *sock = socks[i];
int j;
bool valid = true;
for (j = 0; j < group_impl->num_removed_socks; j++) {
if ((uintptr_t)sock == group_impl->removed_socks[j]) {
valid = false;
break;
}
}
if (valid) {
assert(sock->cb_fn != NULL);
sock->cb_fn(sock->cb_arg, group, sock);
}
}
return num_events;
}
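The pattern above is a general defer-and-filter scheme: sockets removed by a callback earlier in the batch are recorded, and later events for them are skipped instead of dispatched to a stale cb_fn. A self-contained sketch of the same idea, with illustrative names (this is not the SPDK API):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED_EVENTS 32

/* Record handles removed during callback processing, then skip them for the
 * remainder of the batch; num_removed is reset at the start of every poll. */
struct removal_tracker {
	uintptr_t removed[MAX_TRACKED_EVENTS];
	size_t num_removed;
};

static bool
was_removed_this_poll(const struct removal_tracker *t, const void *handle)
{
	for (size_t i = 0; i < t->num_removed; i++) {
		if (t->removed[i] == (uintptr_t)handle) {
			return true;
		}
	}
	return false;
}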

View File

@ -919,6 +919,7 @@ vhost_blk_get_config(struct spdk_vhost_dev *vdev, uint8_t *config,
uint32_t blk_size;
uint64_t blkcnt;
memset(&blkcfg, 0, sizeof(blkcfg));
bvdev = to_blk_dev(vdev);
assert(bvdev != NULL);
bdev = bvdev->bdev;
@ -949,7 +950,6 @@ vhost_blk_get_config(struct spdk_vhost_dev *vdev, uint8_t *config,
}
}
memset(&blkcfg, 0, sizeof(blkcfg));
blkcfg.blk_size = blk_size;
/* minimum I/O size in blocks */
blkcfg.min_io_size = 1;

View File

@ -266,7 +266,7 @@ LINK_CXX=\
#
# Variables to use for versioning shared libs
#
SO_VER := 1
SO_VER := 2
SO_MINOR := 0
SO_SUFFIX_ALL := $(SO_VER).$(SO_MINOR)

View File

@ -37,7 +37,11 @@ include $(SPDK_ROOT_DIR)/mk/spdk.lib_deps.mk
SPDK_MAP_FILE = $(SPDK_ROOT_DIR)/shared_lib/spdk.map
LIB := $(call spdk_lib_list_to_static_libs,$(LIBNAME))
SHARED_LINKED_LIB := $(subst .a,.so,$(LIB))
ifdef SO_SUFFIX
SHARED_REALNAME_LIB := $(subst .so,.so.$(SO_SUFFIX),$(SHARED_LINKED_LIB))
else
SHARED_REALNAME_LIB := $(subst .so,.so.$(SO_SUFFIX_ALL),$(SHARED_LINKED_LIB))
endif
ifeq ($(CONFIG_SHARED),y)
DEP := $(SHARED_LINKED_LIB)

View File

@ -131,6 +131,7 @@ uint8_t g_number_of_claimed_volumes = 0;
/* Specific to AES_CBC. */
#define AES_CBC_IV_LENGTH 16
#define AES_CBC_KEY_LENGTH 16
#define AESNI_MB_NUM_QP 64
/* Common for suported devices. */
#define IV_OFFSET (sizeof(struct rte_crypto_op) + \
@ -368,6 +369,7 @@ vbdev_crypto_init_crypto_drivers(void)
struct device_qp *dev_qp;
unsigned int max_sess_size = 0, sess_size;
uint16_t num_lcores = rte_lcore_count();
char aesni_args[32];
/* Only the first call, via RPC or module init should init the crypto drivers. */
if (g_session_mp != NULL) {
@ -375,7 +377,8 @@ vbdev_crypto_init_crypto_drivers(void)
}
/* We always init AESNI_MB */
rc = rte_vdev_init(AESNI_MB, NULL);
snprintf(aesni_args, sizeof(aesni_args), "max_nb_queue_pairs=%d", AESNI_MB_NUM_QP);
rc = rte_vdev_init(AESNI_MB, aesni_args);
if (rc) {
SPDK_ERRLOG("error creating virtual PMD %s\n", AESNI_MB);
return -EINVAL;

View File

@ -328,7 +328,9 @@ _bdev_nvme_reset_complete(struct nvme_bdev_ctrlr *nvme_bdev_ctrlr, int rc)
SPDK_NOTICELOG("Resetting controller successful.\n");
}
__atomic_clear(&nvme_bdev_ctrlr->resetting, __ATOMIC_RELAXED);
pthread_mutex_lock(&g_bdev_nvme_mutex);
nvme_bdev_ctrlr->resetting = false;
pthread_mutex_unlock(&g_bdev_nvme_mutex);
/* Make sure we clear any pending resets before returning. */
spdk_for_each_channel(nvme_bdev_ctrlr,
_bdev_nvme_complete_pending_resets,
@ -425,7 +427,20 @@ bdev_nvme_reset(struct nvme_bdev_ctrlr *nvme_bdev_ctrlr, struct nvme_bdev_io *bi
struct spdk_io_channel *ch;
struct nvme_io_channel *nvme_ch;
if (__atomic_test_and_set(&nvme_bdev_ctrlr->resetting, __ATOMIC_RELAXED)) {
pthread_mutex_lock(&g_bdev_nvme_mutex);
if (nvme_bdev_ctrlr->destruct) {
/* Don't bother resetting if the controller is in the process of being destructed. */
if (bio) {
spdk_bdev_io_complete(spdk_bdev_io_from_ctx(bio), SPDK_BDEV_IO_STATUS_FAILED);
}
pthread_mutex_unlock(&g_bdev_nvme_mutex);
return 0;
}
if (!nvme_bdev_ctrlr->resetting) {
nvme_bdev_ctrlr->resetting = true;
} else {
pthread_mutex_unlock(&g_bdev_nvme_mutex);
SPDK_NOTICELOG("Unable to perform reset, already in progress.\n");
/*
* The internal reset calls won't be queued. This is on purpose so that we don't
@ -442,6 +457,7 @@ bdev_nvme_reset(struct nvme_bdev_ctrlr *nvme_bdev_ctrlr, struct nvme_bdev_io *bi
return 0;
}
pthread_mutex_unlock(&g_bdev_nvme_mutex);
/* First, delete all NVMe I/O queue pairs. */
spdk_for_each_channel(nvme_bdev_ctrlr,
_bdev_nvme_reset_destroy_qpair,

View File

@ -83,8 +83,8 @@ spdk_rpc_nvme_cuse_register(struct spdk_jsonrpc_request *request,
rc = spdk_nvme_cuse_register(bdev_ctrlr->ctrlr);
if (rc) {
SPDK_ERRLOG("Failed to register CUSE devices\n");
spdk_jsonrpc_send_error_response(request, -rc, spdk_strerror(rc));
SPDK_ERRLOG("Failed to register CUSE devices: %s\n", spdk_strerror(-rc));
spdk_jsonrpc_send_error_response(request, rc, spdk_strerror(-rc));
goto cleanup;
}

View File

@ -130,10 +130,20 @@ nvme_bdev_unregister_cb(void *io_device)
free(nvme_bdev_ctrlr);
}
void
int
nvme_bdev_ctrlr_destruct(struct nvme_bdev_ctrlr *nvme_bdev_ctrlr)
{
assert(nvme_bdev_ctrlr->destruct);
pthread_mutex_lock(&g_bdev_nvme_mutex);
if (nvme_bdev_ctrlr->resetting) {
nvme_bdev_ctrlr->destruct_poller =
spdk_poller_register((spdk_poller_fn)nvme_bdev_ctrlr_destruct, nvme_bdev_ctrlr, 1000);
pthread_mutex_unlock(&g_bdev_nvme_mutex);
return 1;
}
pthread_mutex_unlock(&g_bdev_nvme_mutex);
spdk_poller_unregister(&nvme_bdev_ctrlr->destruct_poller);
if (nvme_bdev_ctrlr->opal_dev) {
if (nvme_bdev_ctrlr->opal_poller != NULL) {
spdk_poller_unregister(&nvme_bdev_ctrlr->opal_poller);
@ -149,6 +159,7 @@ nvme_bdev_ctrlr_destruct(struct nvme_bdev_ctrlr *nvme_bdev_ctrlr)
}
spdk_io_device_unregister(nvme_bdev_ctrlr, nvme_bdev_unregister_cb);
return 1;
}
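The destruct path above uses a retry-via-poller idiom: when the controller is still resetting, a 1ms poller re-enters the destruct routine until it can proceed. A sketch of the idiom under assumed names; my_obj, busy, and finish_destruct() are illustrative, only the spdk_poller_* calls are real API.

#include <stdbool.h>
#include "spdk/thread.h"

struct my_obj {
	struct spdk_poller *destruct_poller;
	bool busy;
};

static void finish_destruct(struct my_obj *obj);

static int
try_destruct(void *ctx)
{
	struct my_obj *obj = ctx;

	if (obj->busy) {
		/* Still busy: arm (or leave armed) a 1ms retry poller. */
		if (obj->destruct_poller == NULL) {
			obj->destruct_poller = spdk_poller_register(try_destruct, obj, 1000);
		}
		return 0;
	}
	spdk_poller_unregister(&obj->destruct_poller);
	finish_destruct(obj);
	return 1;
}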
void

View File

@ -94,6 +94,7 @@ struct nvme_bdev_ctrlr {
struct spdk_poller *opal_poller;
struct spdk_poller *adminq_timer_poller;
struct spdk_poller *destruct_poller;
struct ocssd_bdev_ctrlr *ocssd_ctrlr;
@ -150,7 +151,7 @@ struct nvme_bdev_ctrlr *nvme_bdev_next_ctrlr(struct nvme_bdev_ctrlr *prev);
void nvme_bdev_dump_trid_json(struct spdk_nvme_transport_id *trid,
struct spdk_json_write_ctx *w);
void nvme_bdev_ctrlr_destruct(struct nvme_bdev_ctrlr *nvme_bdev_ctrlr);
int nvme_bdev_ctrlr_destruct(struct nvme_bdev_ctrlr *nvme_bdev_ctrlr);
void nvme_bdev_attach_bdev_to_ns(struct nvme_bdev_ns *nvme_ns, struct nvme_bdev *nvme_disk);
void nvme_bdev_detach_bdev_from_ns(struct nvme_bdev *nvme_disk);

View File

@ -328,7 +328,9 @@ bdev_rbd_flush(struct bdev_rbd *disk, struct spdk_io_channel *ch,
struct spdk_bdev_io *bdev_io, uint64_t offset, uint64_t nbytes)
{
struct bdev_rbd_io_channel *rbdio_ch = spdk_io_channel_get_ctx(ch);
struct bdev_rbd_io *rbd_io = (struct bdev_rbd_io *)bdev_io->driver_ctx;
rbd_io->num_segments++;
return bdev_rbd_start_aio(rbdio_ch->image, bdev_io, NULL, offset, nbytes);
}
@ -783,6 +785,44 @@ spdk_bdev_rbd_delete(struct spdk_bdev *bdev, spdk_delete_rbd_complete cb_fn, voi
spdk_bdev_unregister(bdev, cb_fn, cb_arg);
}
int
spdk_bdev_rbd_resize(struct spdk_bdev *bdev, const uint64_t new_size_in_mb)
{
struct spdk_io_channel *ch;
struct bdev_rbd_io_channel *rbd_io_ch;
int rc;
uint64_t new_size_in_byte;
uint64_t current_size_in_mb;
if (bdev->module != &rbd_if) {
return -EINVAL;
}
current_size_in_mb = bdev->blocklen * bdev->blockcnt / (1024 * 1024);
if (current_size_in_mb > new_size_in_mb) {
SPDK_ERRLOG("The new bdev size must be lager than current bdev size.\n");
return -EINVAL;
}
ch = bdev_rbd_get_io_channel(bdev);
rbd_io_ch = spdk_io_channel_get_ctx(ch);
new_size_in_byte = new_size_in_mb * 1024 * 1024;
rc = rbd_resize(rbd_io_ch->image, new_size_in_byte);
if (rc != 0) {
SPDK_ERRLOG("failed to resize the ceph bdev.\n");
return rc;
}
rc = spdk_bdev_notify_blockcnt_change(bdev, new_size_in_byte / bdev->blocklen);
if (rc != 0) {
SPDK_ERRLOG("failed to notify block cnt change.\n");
return rc;
}
return rc;
}
static int
bdev_rbd_library_init(void)
{

View File

@ -57,4 +57,12 @@ int spdk_bdev_rbd_create(struct spdk_bdev **bdev, const char *name, const char *
void spdk_bdev_rbd_delete(struct spdk_bdev *bdev, spdk_delete_rbd_complete cb_fn,
void *cb_arg);
/**
* Resize rbd bdev.
*
* \param bdev Pointer to rbd bdev.
* \param new_size_in_mb The new size in MiB for this bdev.
*/
int spdk_bdev_rbd_resize(struct spdk_bdev *bdev, const uint64_t new_size_in_mb);
#endif /* SPDK_BDEV_RBD_H */
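A usage sketch for this entry point, mirroring the RPC handler later in this change: look the bdev up by name and grow it. example_resize_rbd is illustrative; the two spdk_* calls are the real API.

#include <errno.h>
#include "spdk/bdev.h"
#include "bdev_rbd.h"

static int
example_resize_rbd(const char *name, uint64_t new_size_in_mb)
{
	struct spdk_bdev *bdev = spdk_bdev_get_by_name(name);

	if (bdev == NULL) {
		return -ENODEV;
	}
	/* Returns -EINVAL for non-rbd bdevs or when shrinking. */
	return spdk_bdev_rbd_resize(bdev, new_size_in_mb);
}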

View File

@ -197,3 +197,56 @@ cleanup:
}
SPDK_RPC_REGISTER("bdev_rbd_delete", spdk_rpc_bdev_rbd_delete, SPDK_RPC_RUNTIME)
SPDK_RPC_REGISTER_ALIAS_DEPRECATED(bdev_rbd_delete, delete_rbd_bdev)
struct rpc_bdev_rbd_resize {
char *name;
uint64_t new_size;
};
static const struct spdk_json_object_decoder rpc_bdev_rbd_resize_decoders[] = {
{"name", offsetof(struct rpc_bdev_rbd_resize, name), spdk_json_decode_string},
{"new_size", offsetof(struct rpc_bdev_rbd_resize, new_size), spdk_json_decode_uint64}
};
static void
free_rpc_bdev_rbd_resize(struct rpc_bdev_rbd_resize *req)
{
free(req->name);
}
static void
spdk_rpc_bdev_rbd_resize(struct spdk_jsonrpc_request *request,
const struct spdk_json_val *params)
{
struct rpc_bdev_rbd_resize req = {};
struct spdk_bdev *bdev;
struct spdk_json_write_ctx *w;
int rc;
if (spdk_json_decode_object(params, rpc_bdev_rbd_resize_decoders,
SPDK_COUNTOF(rpc_bdev_rbd_resize_decoders),
&req)) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR,
"spdk_json_decode_object failed");
goto cleanup;
}
bdev = spdk_bdev_get_by_name(req.name);
if (bdev == NULL) {
spdk_jsonrpc_send_error_response(request, -ENODEV, spdk_strerror(ENODEV));
goto cleanup;
}
rc = spdk_bdev_rbd_resize(bdev, req.new_size);
if (rc) {
spdk_jsonrpc_send_error_response(request, rc, spdk_strerror(-rc));
goto cleanup;
}
w = spdk_jsonrpc_begin_result(request);
spdk_json_write_bool(w, true);
spdk_jsonrpc_end_result(request, w);
cleanup:
free_rpc_bdev_rbd_resize(&req);
}
SPDK_RPC_REGISTER("bdev_rbd_resize", spdk_rpc_bdev_rbd_resize, SPDK_RPC_RUNTIME)

View File

@ -462,7 +462,7 @@ spdk_posix_sock_close(struct spdk_sock *_sock)
}
#ifdef SPDK_ZEROCOPY
static void
static int
_sock_check_zcopy(struct spdk_sock *sock)
{
struct spdk_posix_sock *psock = __posix_sock(sock);
@ -483,7 +483,7 @@ _sock_check_zcopy(struct spdk_sock *sock)
if (rc < 0) {
if (errno == EWOULDBLOCK || errno == EAGAIN) {
return;
return 0;
}
if (!TAILQ_EMPTY(&sock->pending_reqs)) {
@ -491,19 +491,19 @@ _sock_check_zcopy(struct spdk_sock *sock)
} else {
SPDK_WARNLOG("Recvmsg yielded an error!\n");
}
return;
return 0;
}
cm = CMSG_FIRSTHDR(&msgh);
if (cm->cmsg_level != SOL_IP || cm->cmsg_type != IP_RECVERR) {
SPDK_WARNLOG("Unexpected cmsg level or type!\n");
return;
return 0;
}
serr = (struct sock_extended_err *)CMSG_DATA(cm);
if (serr->ee_errno != 0 || serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
SPDK_WARNLOG("Unexpected extended error origin\n");
return;
return 0;
}
/* Most of the time, the pending_reqs array is in the exact
@ -521,7 +521,7 @@ _sock_check_zcopy(struct spdk_sock *sock)
rc = spdk_sock_request_put(sock, req, 0);
if (rc < 0) {
return;
return rc;
}
} else if (found) {
@ -531,6 +531,8 @@ _sock_check_zcopy(struct spdk_sock *sock)
}
}
return 0;
}
#endif
@ -959,14 +961,22 @@ spdk_posix_sock_group_impl_poll(struct spdk_sock_group_impl *_group, int max_eve
for (i = 0, j = 0; i < num_events; i++) {
#if defined(__linux__)
sock = events[i].data.ptr;
#ifdef SPDK_ZEROCOPY
if (events[i].events & EPOLLERR) {
_sock_check_zcopy(events[i].data.ptr);
rc = _sock_check_zcopy(sock);
/* If the socket was closed or removed from
* the group in response to a send ack, don't
* add it to the array here. */
if (rc || sock->cb_fn == NULL) {
continue;
}
}
#endif
if (events[i].events & EPOLLIN) {
socks[j++] = events[i].data.ptr;
socks[j++] = sock;
}
#elif defined(__FreeBSD__)

View File

@ -38,6 +38,13 @@ C_SRCS += vpp.c
CFLAGS += -Wno-sign-compare -Wno-error=old-style-definition
CFLAGS += -Wno-error=strict-prototypes -Wno-error=ignored-qualifiers
GCC_VERSION=$(shell $(CC) -dumpversion | cut -d. -f1)
# disable packed member unalign warnings
ifeq ($(shell test $(GCC_VERSION) -ge 9 && echo 1), 1)
CFLAGS += -Wno-error=address-of-packed-member
endif
LIBNAME = sock_vpp
include $(SPDK_ROOT_DIR)/mk/spdk.lib.mk

View File

@ -2,12 +2,12 @@
%bcond_with doc
Name: spdk
Version: master
Version: 20.01.x
Release: 0%{?dist}
Epoch: 0
URL: http://spdk.io
Source: https://github.com/spdk/spdk/archive/master.tar.gz
Source: https://github.com/spdk/spdk/archive/v20.01.x.tar.gz
Summary: Set of libraries and utilities for high performance user-mode storage
%define package_version %{epoch}:%{version}-%{release}

View File

@ -541,6 +541,20 @@ if __name__ == "__main__":
p.add_argument('name', help='rbd bdev name')
p.set_defaults(func=bdev_rbd_delete)
def bdev_rbd_resize(args):
print_json(rpc.bdev.bdev_rbd_resize(args.client,
name=args.name,
new_size=int(args.new_size)))
rpc.bdev.bdev_rbd_resize(args.client,
name=args.name,
new_size=int(args.new_size))
p = subparsers.add_parser('bdev_rbd_resize',
help='Resize an rbd bdev')
p.add_argument('name', help='rbd bdev name')
p.add_argument('new_size', help='new bdev size for resize operation. The unit is MiB')
p.set_defaults(func=bdev_rbd_resize)
def bdev_delay_create(args):
print_json(rpc.bdev.bdev_delay_create(args.client,
base_bdev_name=args.base_bdev_name,

View File

@ -585,6 +585,20 @@ def bdev_rbd_delete(client, name):
return client.call('bdev_rbd_delete', params)
def bdev_rbd_resize(client, name, new_size):
"""Resize rbd bdev in the system.
Args:
name: name of the rbd bdev to resize
new_size: new bdev size for the resize operation, in MiB
"""
params = {
'name': name,
'new_size': new_size,
}
return client.call('bdev_rbd_resize', params)
@deprecated_alias('construct_error_bdev')
def bdev_error_create(client, base_name):
"""Construct an error injection block device.

View File

@ -41,6 +41,27 @@
static char g_path[256];
static struct spdk_poller *g_poller;
struct ctrlr_entry {
struct spdk_nvme_ctrlr *ctrlr;
struct ctrlr_entry *next;
};
static struct ctrlr_entry *g_controllers = NULL;
static void
cleanup(void)
{
struct ctrlr_entry *ctrlr_entry = g_controllers;
while (ctrlr_entry) {
struct ctrlr_entry *next = ctrlr_entry->next;
spdk_nvme_detach(ctrlr_entry->ctrlr);
free(ctrlr_entry);
ctrlr_entry = next;
}
}
static void
usage(char *executable_name)
{
@ -70,6 +91,17 @@ static void
attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
struct ctrlr_entry *entry;
entry = malloc(sizeof(struct ctrlr_entry));
if (entry == NULL) {
fprintf(stderr, "Malloc error\n");
exit(1);
}
entry->ctrlr = ctrlr;
entry->next = g_controllers;
g_controllers = entry;
}
static int
@ -163,6 +195,8 @@ main(int argc, char **argv)
opts.shutdown_cb = stub_shutdown;
ch = spdk_app_start(&opts, stub_start, (void *)(intptr_t)opts.shm_id);
cleanup();
spdk_app_fini();
return ch;

View File

@ -45,7 +45,14 @@ pushd $DB_BENCH_DIR
if [ -z "$SKIP_GIT_CLEAN" ]; then
git clean -x -f -d
fi
$MAKE db_bench $MAKEFLAGS $MAKECONFIG DEBUG_LEVEL=0 SPDK_DIR=$rootdir
EXTRA_CXXFLAGS=""
GCC_VERSION=$(cc -dumpversion | cut -d. -f1)
if (( GCC_VERSION >= 9 )); then
EXTRA_CXXFLAGS+="-Wno-deprecated-copy -Wno-pessimizing-move"
fi
$MAKE db_bench $MAKEFLAGS $MAKECONFIG DEBUG_LEVEL=0 SPDK_DIR=$rootdir EXTRA_CXXFLAGS="$EXTRA_CXXFLAGS"
popd
timing_exit db_bench_build

View File

@ -52,20 +52,6 @@ DEFINE_STUB(rte_mem_event_callback_register, int,
(const char *name, rte_mem_event_callback_t clb, void *arg), 0);
DEFINE_STUB(rte_mem_virt2iova, rte_iova_t, (const void *virtaddr), 0);
void *
rte_malloc(const char *type, size_t size, unsigned align)
{
CU_ASSERT(type == NULL);
CU_ASSERT(align == 0);
return malloc(size);
}
void
rte_free(void *ptr)
{
free(ptr);
}
static int
test_mem_map_notify(void *cb_ctx, struct spdk_mem_map *map,
enum spdk_mem_map_notify_action action,

View File

@ -107,6 +107,7 @@ function start_vpp() {
# On the VPP side the maximal MTU for TCP is 1460 and the tests don't work
# reliably with larger packets
MTU=1460
MTU_W_HEADER=$((MTU+20))
ip link set dev $INITIATOR_INTERFACE mtu $MTU
ethtool -K $INITIATOR_INTERFACE tso off
ethtool -k $INITIATOR_INTERFACE
@ -131,7 +132,7 @@ function start_vpp() {
xtrace_disable
counter=40
while [ $counter -gt 0 ] ; do
vppctl show version &> /dev/null && break
vppctl show version | grep -E "vpp v[0-9]+\.[0-9]+" && break
counter=$(( counter - 1 ))
sleep 0.5
done
@ -140,37 +141,47 @@ function start_vpp() {
return 1
fi
# Setup host interface
vppctl create host-interface name $TARGET_INTERFACE
VPP_TGT_INT="host-$TARGET_INTERFACE"
vppctl set interface state $VPP_TGT_INT up
vppctl set interface ip address $VPP_TGT_INT $TARGET_IP/24
vppctl set interface mtu $MTU $VPP_TGT_INT
# The VPP commands below are masked with "|| true" so that the test can
# run in the CI system. For reasons unknown, when run via CI these
# commands exit with return code 141 (pipefail) despite producing
# valid output.
# Using "|| true" does not defeat the "-e" flag used in the test scripts,
# because vppctl cli commands always return 0, even if there was an
# error.
# As a result, grep checks on the command outputs must be used to
# verify the vpp configuration and connectivity.
vppctl show interface
# Setup host interface
vppctl create host-interface name $TARGET_INTERFACE || true
VPP_TGT_INT="host-$TARGET_INTERFACE"
vppctl set interface state $VPP_TGT_INT up || true
vppctl set interface ip address $VPP_TGT_INT $TARGET_IP/24 || true
vppctl set interface mtu $MTU $VPP_TGT_INT || true
vppctl show interface | tr -s " " | grep -E "host-$TARGET_INTERFACE [0-9]+ up $MTU/0/0/0"
# Disable session layer
# NOTE: VPP net framework should enable it itself.
vppctl session disable
vppctl session disable || true
# Verify connectivity
vppctl show int addr
vppctl show int addr | grep -E "$TARGET_IP/24"
ip addr show $INITIATOR_INTERFACE
ip netns exec $TARGET_NAMESPACE ip addr show $TARGET_INTERFACE
sleep 3
# SC1010: ping -M do - in this case do is an option not bash special word
# shellcheck disable=SC1010
ping -c 1 $TARGET_IP -s $(( MTU - 28 )) -M do
vppctl ping $INITIATOR_IP repeat 1 size $(( MTU - (28 + 8) )) verbose
vppctl ping $INITIATOR_IP repeat 1 size $(( MTU - (28 + 8) )) verbose | grep -E "$MTU_W_HEADER bytes from $INITIATOR_IP"
}
function kill_vpp() {
vppctl delete host-interface name $TARGET_INTERFACE
vppctl delete host-interface name $TARGET_INTERFACE || true
# Dump VPP configuration before kill
vppctl show api clients
vppctl show session
vppctl show errors
vppctl show api clients || true
vppctl show session || true
vppctl show errors || true
killprocess $vpp_pid
}

View File

@ -40,6 +40,15 @@ $rpc_py iscsi_create_portal_group $PORTAL_TAG $TARGET_IP:$ISCSI_PORT
$rpc_py iscsi_create_initiator_group $INITIATOR_TAG $INITIATOR_NAME $NETMASK
rbd_bdev="$($rpc_py bdev_rbd_create $RBD_POOL $RBD_NAME 4096)"
$rpc_py bdev_get_bdevs
$rpc_py bdev_rbd_resize $rbd_bdev 2000
num_block=$($rpc_py bdev_get_bdevs|grep num_blocks|sed 's/[^[:digit:]]//g')
# Get the bdev size in MiB.
total_size=$(( num_block * 4096 / 1048576 ))
if [ $total_size != 2000 ]; then
echo "resize failed."
exit 1
fi
# "Ceph0:0" ==> use Ceph0 blockdev for LUN0
# "1:2" ==> map PortalGroup1 to InitiatorGroup2
# "64" ==> iSCSI queue depth 64

View File

@ -93,6 +93,7 @@ set -e
for i in {1..10}; do
if [ -f "${KERNEL_OUT}.${i}" ] && [ -f "${CUSE_OUT}.${i}" ]; then
sed -i "s/${nvme_name}/nvme0/g" ${KERNEL_OUT}.${i}
diff --suppress-common-lines ${KERNEL_OUT}.${i} ${CUSE_OUT}.${i}
fi
done

View File

@ -22,6 +22,13 @@ function tgt_init()
}
nvmftestinit
# There is an intermittent error relating to this test and Soft-RoCE. for now, just
# skip this test if we are using rxe. TODO: get to the bottom of GitHub issue #1165
if [ $TEST_TRANSPORT == "rdma" ] && check_ip_is_soft_roce $NVMF_FIRST_TARGET_IP; then
echo "Using software RDMA, skipping the host bdevperf tests."
exit 0
fi
tgt_init

View File

@ -63,6 +63,7 @@ DEFINE_STUB(nvme_transport_ctrlr_construct, struct spdk_nvme_ctrlr *,
DEFINE_STUB_V(nvme_io_msg_ctrlr_detach, (struct spdk_nvme_ctrlr *ctrlr));
DEFINE_STUB(spdk_nvme_transport_available, bool,
(enum spdk_nvme_transport_type trtype), true);
DEFINE_STUB(spdk_uevent_connect, int, (void), 1);
static bool ut_destruct_called = false;

View File

@ -462,11 +462,11 @@ test_build_contig_hw_sgl_request(void)
CU_ASSERT(req.cmd.dptr.sgl1.address == tr.prp_sgl_bus_addr);
CU_ASSERT(req.cmd.dptr.sgl1.unkeyed.length == 2 * sizeof(struct spdk_nvme_sgl_descriptor));
CU_ASSERT(tr.u.sgl[0].unkeyed.type == SPDK_NVME_SGL_TYPE_DATA_BLOCK);
CU_ASSERT(tr.u.sgl[0].unkeyed.length = 60);
CU_ASSERT(tr.u.sgl[0].address = 0xDEADBEEF);
CU_ASSERT(tr.u.sgl[0].unkeyed.length == 60);
CU_ASSERT(tr.u.sgl[0].address == 0xDEADBEEF);
CU_ASSERT(tr.u.sgl[1].unkeyed.type == SPDK_NVME_SGL_TYPE_DATA_BLOCK);
CU_ASSERT(tr.u.sgl[1].unkeyed.length = 40);
CU_ASSERT(tr.u.sgl[1].address = 0xDEADBEEF);
CU_ASSERT(tr.u.sgl[1].unkeyed.length == 40);
CU_ASSERT(tr.u.sgl[1].address == 0xDEADBEEF);
MOCK_CLEAR(spdk_vtophys);
g_vtophys_size = 0;