This can be used for multipath validation.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iba62c5e90b22a9a85078fbc783d3a7273c029cde
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10137
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
As per the nvme specs,
If OPTPERF is set to ‘1’ indicates that the fields
NPWG, NPWA, NPDG, NPDA, and NOWS are defined for this namespace and
should be used by the host for I/O optimization
Setting NPWA, NPDG, NPDA same as NPWG and NOWS same as MDTS
Fixes#2197
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Change-Id: Ic769a21b6821fa731eeae83e7d30c380e8092e37
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10007
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
We generally shouldn't do ERRLOGs based on bad
inputs from the host, so change some of these to
DEBUGLOGs instead.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id42a8e51964e0238cd2841be8ac1d4ccaa1d150d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10037
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
When doing controller reset and shutdown, we may change the
CSTS.RDY and CSTS.SHN even there are pending IOs in the IO
queues, so here we add a timer in the reset and shutdown
callback, it will change the status when there are no
connected IO queues.
Fix#2199.
Change-Id: I3a54d30b257973661b269ad5e37637490f9390f4
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9908
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
SPDK nvmf target reports all listeners on all subsystems
in discovery pages, kernel target reports only subsystems
listening on a port where discovery command is received.
NVMEoF specification allows to specify any addresses/
transport types. Ch 5: The set of Discovery Log entries should
include all applicable addresses on the same fabric as the
Discovery Service and may include addresses on other fabrics.
To align SPDK and kernel targets behaviour, add filtering
rules to allow flexible configuration of what should be
listed in discovery log page entries.
Fixes#2082
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ie981edebb29206793d3310940034dcbb22c52441
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9185
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
This reverts commit 3b1f13ef29.
It seems like this particular commit is causing failures on the
CI side related to the following issue:
https://github.com/spdk/spdk/issues/2214
Reverting for now to make the CI stable.
Signed-off-by: Michal Berger <michalx.berger@intel.com>
Change-Id: I7f32c612b8549477de0b9079d114cec4ec9bda59
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9986
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
There is a race condition between controller destruction and
subsystem state change, e.g. admin qpair may already be freed
when a namespace is added or removed. As result in function
poll_group_update_subsystem we may get heap-use-after-free error
Another problem is that some qpair's live time may exceed controller's
life time. To avoid it, start controller destruction process when the last
qpair finished the disconnect process (previously controller started
the descruction process before the last qpair starts to disconnect
and it could lead to raise conditions)
Fixes#2055
Change-Id: I11612979eb914a5fdcc7b9e3c812bf1e450b6120
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8963
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Matt Dumm <matt.dumm@hpe.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
nvmf_ctrlr_cc_shn_done() and nvmf_ctrlr_cc_reset_done() are
almost same, so consolidate them together by adding a flag.
Change-Id: Ib7714d31a40f9d8d344ec2630c083c5d76dac8a1
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9907
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
We already support Set Features with Host Behavior, so here also
add the support in Get Features.
Change-Id: I27d973e81fd6be89cc67ad559439334fc1087c9e
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9818
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: John Levon <levon@movementarian.org>
We still don't support get log page with error
information LID.
Change-Id: I92db361dc956ea3ed4f6e7bdfdca763d0fea6886
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9583
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When removing a listener, there's a small window when the listener is
already freed, while the controllers that were created on that listener
are still active. It happens, because the listener is removed before
disconnecting its qpairs and in turn destroying the controllers.
The ctrlr->listener pointer is now cleared when a listener is freed and
its value is checked for NULL before each use.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Iace65a46eebc5972cc4ef0f1c1822ffc5d3e559e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9237
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When a subsystem is being deleted, we disconnect all qpairs
and when the last qpair for some controller is disconnected,
we start controller desctruction process. This process requires
to send a message to subsystem's thread to remove the controller
from the list in the subsystem and after that send a message to
controller's thread to release resources.
The problem is that the subsystem also destroys all attached
controllers. This order is unpredictable and we may get
heap-use-after-free or double free.
To fix this problem we can rely on the fact that the subsystem
can only be destroyed in incative state, that means that all
qpairs linked to the subsystem are already disconnected and
all controllers are already destroyed or in the process of
destruction.
spdk_nvmf_subsystem_destroy API is now can be asyncrhonous,
it accepts a callback with cb argument.
Change-Id: Ic72d69200bc8302dae2f8cd8ca44bc640c6a8116
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6660
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Refine ANA state from per subsystem listener to per subsystem listener
per ANA group.
Add an array of ANA state per ANA group to subsystem listener. The array is
indexed by ANA group ID - 1.
Then in I/O paths, we get ANA state by
ctrlr->listener->ana_state[ns->anagrpid - 1].
The NVMe specification indicates the existence of NVM subsystem specific
ANA state when FFFFFFFFh is specified as NSID for the Get Features
and the Set Features commands. For these, we return the optimized state.
Update the nvmf_subsystem_get_listeners RPC to return all ANA states
of the underlying ANA groups. The nvmf_subsystem_get_listeners RPC is
not matured and not used in the test code yet. Hence compatibility is
not high priority.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia2d4d5361ac01236f595c22765fd35e4c5fdee0e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9064
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is the first patch in the patch series to control ANA states not only
as a unit of subsystem listener but also as a unit of ANA group and create
user preferred mapping between namespaces and ANA groups within a single
subsystem.
This patch adds anagrpid to both spdk_nvmf_ns and spdk_nvmf_ns_opts, and adds
ana_group array to spdk_nvmf_subsystem to count number of namespaces per ANA
group within a single subsystem. The size of the ana_group array is equal
with the size of the namespaces.
For each subsystem, allocate ana_group array regardless of the value of
ana_reporting of the subsystem.
For each namespace, at its creation, initialize anagrpid explicitly to be equal
with nsid by default and increments the corresponding entry of the ana_group
array of the subsystem regardless of teh value of the ana_reporting of thee
subsystem.
Hence the contents of the created ANA log page is not changed even if the
algorithm to crete ANA log page is changed.
Additionally this patch adds a unit test case that one ANA group
has multiple namespaces.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I78539db4e7248c2953c6927ff8128cb5a7e34b96
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9102
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Only one active socket connection is supported in libvfio-user,
RESERVATION should not be supported in this case.
Change-Id: I36a746f479da767d857425e84dfed5f2bef30b17
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9124
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Windows will always sends a Set Feature Interrupt Coalescing even
SPDK reports we can't support it in Get Feature command. Here
we return Feature Not Changeable instead of Invalid Field which
is more meaningful.
Change-Id: Ie08086c3eba1e2d790a7ae4976653b6f9085028c
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8923
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The data buffer isn't available at the beginning.
Change-Id: Ieeb1a297ff52dfdc6cd999d04862a0cd96483650
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8932
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Implemented nvmf code to allow transports to use ZCOPY. Note ZCOPY
has to be enabled within the individual transport layer
Signed-off-by: matthewb <matthew.burbridge@hpe.com>
Change-Id: I273b3d4ab44d882c916ac39e821505e1f4211ded
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6817
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Since we are using NVMf fabric library to emulate a PCIe based SSD via
vfio-user target, so there maybe some commands that are related with
PCIe SSD only, such as set/get features with interrupt coalescing
and Interrupt Mask Set/Interrupt Mask Clear registers. Even the
NVMf library doesn't support that, it is not a fatal error to Host
NVMe driver, so here we use the info log instead of error log for
this case so that to avoid noise logs.
Fix#2036.
Change-Id: I8283bcde5779080835d6ab827dbd852b3816176f
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8766
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
The NVMf library will not implement interrupt coalescing and ignore them, but we can
report this via get_features.
Some OS may check the result from get_features so that it will not send set_features
for interrupt coalescing.
Change-Id: I7466bcbc0ea5b3b067751cdf1979b2e0681c0043
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8765
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Bit 1 in the CMIC of the Identify Controller Data Structure specifies
if the NVM subsystem may have multiple controllers or not.
However, multi_host indicated a particular use case such that the NVM
subsystem is used by multiple hosts.
multi_ctrlr will be more appropriate.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0246096a5cc44721aeff3ff6f96473a2abe11964
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8719
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The vfio-user target emulated NVMe device is treated as
PCIe NVMe SSD in the Guest VM, so when doing controller
reset or shutdown, we should abort the AERs which in the
NVMf library.
Users may switch kernel NVMe driver to SPDK NVMe driver
in the VM, without this fix, we will got "AERL exceeded"
response very frequently, because the AERs submitted by
previous driver will never be aborted in runtime.
Change-Id: I0222ed509629ccb0e98217414dd9043857105686
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8558
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When users remove kernel NVMe driver in the VM, after 120 seconds,
SPDK NVMf target will disconnect ADMIN queue pair due to association
timer timeout, and for vfio-user transport, the ADMIN queue pair
connection is associated with the socket connection, so when probing
the NVMe controller again, because there is no active ADMIN connection
for fabric register R/W commands, it will cause segment fault.
Here we set the association timeout value to 0 for vfio-user transport,
so that the ADMIN connection will not be disconnected when shutdown the
controller, the ADMIN queue pair will be disconnected when the socket
connection breaks.
Change-Id: I3613169229bae384405889653e50f581d30d7c07
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8557
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It is better to not fail connect commands when a subsystem
is not ready. The host will not be expecting that and will
typically treat it as a catastrophic failure (i.e. it won't
retry the connect).
So instead when this situation occurs, start a poller for
the connect request. We will continue to retry processing
it until the subsystem is ready to handle it.
Fixes issue #1985.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id8835df8f0edf1e889fdd7e754e261c2a880cbb6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8571
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When the property is 8 bytes but the host only requested
4, we need to mask and only return the bytes requested
by the host. Wait to do the DEBUGLOG until after
that has happened.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8f476a47e9fd07bf652fd64f3b1c17d650374167
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8506
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Commit b7cc4dd added support multiple AERs, but didn't
remove the code that hardcodes aerl=0 for non-discovery
controllers. So even though the target now supports
multiple AERs, we never indicate that for non-discovery
controllers.
The spec also recommends that implementations support
a minimum of 4 AERs - so the current behavior is not
recommended.
It seems that at least on Windows (when testing with
vfio-user transport) we see the limit get exceeded
which results in ERRLOGs. Let's keep the ERRLOG there
for now, assuming that once we report we support 4
AERs that Windows won't try to send more than that.
Fixes issue #2000.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I6a07a6f37aaa6e531ae2cf1e1c46da036b00785b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8488
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We cannot control what the host may send to the target.
For example, we have empirical evidence that Windows
will send vendor-specific IDs for features and log pages
(when testing with the vfio-user target transport).
So let's change the ERRLOGs in these cases to DEBUGLOGs.
Fixes issues #2004, #2007, #2008.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8d5b92fc5e33d698af246f2f1c34f7cf51e6488a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8487
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
nvmf_ctrlr_get_log_page used req->data to store the log page result.
While the req->data only contains the first iov, if req->iovcnt is
larger than 1, the req->data may not hold the complete log page; and
even worse, the log page result may be written to invalid address and
cause memory corruption.
Change-Id: Ie6415a6bd2327419fe4b32f21ac814fd827c9e95
Signed-off-by: Jiewei Ke <jiewei@smartx.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7970
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
POSIX defines %z for printing size_t values in a portable way.
Replace a reference to %ld to remove the assumption about
the type of size_t.
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Change-Id: I2186aa5e7072f565ea75de935e22c2c23acf1a1a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8341
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
The traces are tracking the lifecycle of a poll group: creating it,
adding and disconnecting qpairs, and finally destroying the group.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I075b7f24d14b8fbb42bb18ddd70a668a8bace118
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7158
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
vfio-user transport `req_complete` callback will zero the internal
NVMe command and response fields, the common NVMf library should
not use them after the callback, so here we use stack variables
to save them before the `req_complete` callback.
Fix issue #1965.
Change-Id: Iff2342b6095d9496cdf112d657a0a99ce1fb5d12
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8129
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
In struct spdk_nvmf_poll_group_stat, there are statistics of cumulative IO and
admin queue pair counts. But current qpair counts are not reflected. Use
this patch to add current admin and io qpair counts for a poll group.
Signed-off-by: Rui Chang <rui.chang@arm.com>
Change-Id: I7d40aed8b3fb09f9d34e5b5232380d162b97882b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7969
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Eugene Kochetov <evgeniik@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
nvmf_get_ana_log_page used req->data to store the log page result.
While the req->data only contains the first iov, if req->iovcnt is
larger than 1, the req->data may not hold the complete log page; and
even worse, the log page result may be written to invalid address and
cause memory corruption.
The following patch will fix the same issue for other commands in
nvmf_ctrlr_get_log_page.
Fix#1946
Signed-off-by: Jiewei Ke <jiewei@smartx.com>
Change-Id: I495f3be05c82be5cd53609772c655c8924b9179f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7923
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Mellanox Build Bot
Support ACRE (Advanced Command Retry Enable) feature. Currently set
all crdt to 0 and only select crdt[0] (crd=1) when the IO has any
error.
Signed-off-by: Peng Yu <yupeng0921@gmail.com>
Change-Id: If7bc30f91f5b2d0839002dead17188a4b3a52d5d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7885
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch fixes missing free of qpair_mask when a listener error
occurs in ctrlr_create.
Signed-off-by: Jonas Pfefferle <pepperjo@japf.ch>
Change-Id: I09162b86d8ac73bf9fc2006a08dcc0a955f222b3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7818
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Similar issue was fixed in
813869d823
nvmf: Fix possible race condition when adding IO qpair
This patch fixes the same issue which occurs a bit later,
when a message is delivered to another thread. This issue
occurred on CI, callstack is the following:
00:11:46.296 #6 0x00007f2705199f05 in __ubsan_handle_type_mismatch_v1 () from /lib64/libubsan.so.1
00:11:46.296 No symbol table info available.
00:11:46.296 #7 0x00007f27067ace6f in ctrlr_add_qpair_and_update_rsp (qpair=0x221edc0, ctrlr=0x1dc4ea0, rsp=0x2242918) at ctrlr.c:230
00:11:46.296 __PRETTY_FUNCTION__ = "ctrlr_add_qpair_and_update_rsp"
00:11:46.296 __func__ = "ctrlr_add_qpair_and_update_rsp"
00:11:46.296 #8 0x00007f27067b1d0b in nvmf_ctrlr_add_io_qpair (ctx=0x2242540) at ctrlr.c:534
00:11:46.296 req = 0x2242540
00:11:46.296 rsp = 0x2242918
00:11:46.296 qpair = 0x221edc0
00:11:46.296 ctrlr = 0x1dc4ea0
00:11:46.296 __func__ = "nvmf_ctrlr_add_io_qpair"
00:11:46.296 #9 0x00007f27062553ce in msg_queue_run_batch (thread=0x1cff540, max_msgs=8) at thread.c:553
where line 230 in ctrlr.c was
assert(ctrlr->admin_qpair->group->thread == spdk_get_thread());
That means that admin qpair was disconnected from the poll
group and controller is in the process of destruction
Change-Id: I818ba56adda5ed3488a8df78483c0b6839758192
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6364
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Aspects of bit fields are 'implementation defined'. On some platforms
alignment will occur if two adjacent fields are of different types. This
occurs in spdk_nvme_feat_async_event_configutation after the crit_warn
member which is effectively an int8_t, followed by an int16_t. There
isn't a generic way of changing the compiler's behaviour, so the best
options are:
- Change crit_warn to a uint32_t bit field and copy the value to/from
a spdk_nvme_critical_warning_state variable to use it. This requires
changes to code using the field.
- Adjust the structure definition to use smaller types to avoid the
problem. This preserves existing semantics, but the field order will
need to be reviewed if big-endian support is ever added (other places
in nvme_spec.h will need similar attention). A second reserved field
is required.
Use smaller types which seems the most straightforward option. Adjust
the use of the spdk_nvme_feat_async_event_configuration reserved fields
in lib/nvmf/ctrlr.c.
The new structure is binary compatible and the fields behave in the same
way, with the exception of an additional reserved field, so updating
CHANGELOG.md probably isn't necessary.
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Change-Id: I7d8163c84b4f410fc95a5b7064506ad7b4b62c6c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6340
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Additionally, the user can specify a namespace to also pause during the
operation.
This allows for the management of hosts, listeners, and the addition of
namespaces all while I/O to other namespaces is occurring. Pausing a
specific namespace also allows for the removal of that namespace without
impacting I/O to other namespaces in the subsystem.
Change-Id: I364336df16df92fe2069114674cb7a68076de6fb
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4997
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
There is a chance that admin qpair is being destroyed at
the moment when IO qpair is added to a controller due to e.g.
expired keep alive timer. Part of the qpair destruction process
is change of qpair's state to DEACTIVATING and removing it
from poll group. We can check admin qpair's state and poll
group pointer before sending a message to poll group's thread
and fail connect command.
Logs and backtrace from one CI build that hit this problem:
00:10:53.192 [2021-01-22 15:29:46.671869] ctrlr.c: 185:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
00:10:53.374 [2021-01-22 15:29:46.854223] ctrlr.c: 185:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2016-06.io.spdk:cnode2 due to keep alive timeout.
00:10:53.374 ctrlr.c:587:41: runtime error: member access within null pointer of type 'struct spdk_nvmf_poll_group'
00:10:53.486 #0 0x7f9307d3d3d8 in _nvmf_ctrlr_add_io_qpair /home/vagrant/spdk_repo/spdk/lib/nvmf/ctrlr.c:587
00:10:53.486 #1 0x7f93077ea3cd in msg_queue_run_batch /home/vagrant/spdk_repo/spdk/lib/thread/thread.c:553
00:10:53.486 #2 0x7f93077eb66f in thread_poll /home/vagrant/spdk_repo/spdk/lib/thread/thread.c:631
00:10:53.486 #3 0x7f93077ede54 in spdk_thread_poll /home/vagrant/spdk_repo/spdk/lib/thread/thread.c:740
00:10:53.486 #4 0x7f93078366c3 in _reactor_run /home/vagrant/spdk_repo/spdk/lib/event/reactor.c:677
00:10:53.486 #5 0x7f9307836ec8 in reactor_run /home/vagrant/spdk_repo/spdk/lib/event/reactor.c:721
00:10:53.486 #6 0x7f9307837dfb in spdk_reactors_start /home/vagrant/spdk_repo/spdk/lib/event/reactor.c:838
00:10:53.486 #7 0x7f930782f1c4 in spdk_app_start /home/vagrant/spdk_repo/spdk/lib/event/app.c:580
00:10:53.486 #8 0x4024fa in main /home/vagrant/spdk_repo/spdk/app/nvmf_tgt/nvmf_main.c:75
00:10:53.486 #9 0x7f930716d1a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
00:10:53.486 #10 0x40228d in _start (/home/vagrant/spdk_repo/spdk/build/bin/nvmf_tgt+0x40228d)
Change-Id: I0968eabd1bcd532b8d69434ad5503204c0a2d92b
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6071
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: <dongx.yi@intel.com>