PCIe transport IDs are a non-standard extension to the NVMe-oF transport
address, and they only use the transport type and address fields of the
structure. Add a special case so that the rest of the fields are
ignored for PCIe addresses. All other transport types are NVMe-oF
addresses and should compare all fields.
Change-Id: I45ed143ea1712d17c6de8082677deeefd395c8a2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/365916
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Sync the spec header with the major changes for NVMe 1.3.
Some of the added fields may have been added in previous
versions of the spec.
Change-Id: Ia50a52f5192cf450bb5cc2d18fcc1f92ebce7f77
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/362046
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I941a4cb5563cebb2e68b48d3a74b4b73af0e9657
Reviewed-on: https://review.gerrithub.io/365662
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Change-Id: I1e272c63a5d863c92f1aa8299a9d98dcb72d0b13
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/365082
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ziye Yang <optimistyzy@gmail.com>
These commands should be treated as aborted by spec,
so correctly deliver abort notifications when a
qpair is deleted.
Change-Id: I8af47a3f42f5695ef8e1a70813662e69102720b2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/364681
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Preivously, we can only get part of the log page
entries due to the limitation of buffer. With this
patch, we can get all.
Change-Id: I81b8afc73c108dac65d1ff5fe8d03bda41fa29a1
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/363323
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Also provide an option in perf tool let users to
disable it.
Change-Id: If4952513d77cecaa4f9403fbea811d86916ee87c
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/363311
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
- rename spdk_malloc_socket to spdk_dma_malloc_socket
- rename spdk_malloc to spdk_dma_malloc
- rename spdk_zmalloc to spdk_dma_zmalloc
- rename spdk_realloc to spdk_dma_realloc
- rename spdk_free to spdk_dma_free
Change-Id: I52a11b7a4243281f9c56f503e826fd7c4a1fd883
Signed-off-by: John Meneghini <johnm@netapp.com>
Reviewed-on: https://review.gerrithub.io/362604
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
An ECN to the NVMe 1.3 specification has clarified that the NQN may
contain 223 bytes before the null terminator. Make all of our NQN
length checks consistently enforce this behavior.
Change-Id: Iebfd57d11abea64964c7a6ad9d886e40efa243c3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Unit tests will be added as part of a separate patch updating all UT for
nvme.c. Global used for timeout value so it can be easily overwritten
by the upcoming unit tests for this function.
Change-Id: I7fc15aab91601ac57c94cae266b212c0998d2495
Signed-off-by: paul luse <paul.e.luse@intel.com>
The VirtualBox emulated NVMe device will intermittently
hang on the first read/write command after an I/O
qpair has been allocated. The frequency of the hang
diminishes if a delay is added after allocating the I/O
qpair - until it disappears completely with a 100us delay.
So add a quirk to insert this delay.
Note - the 100us delay was tested by running
the hello_world example app 50000 times.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I237e31b1b8a1a1e28262851ae0a21cd7345f0f1a
Fixes a scan-build warning about using qpairs after they have been
freed.
Change-Id: I263eabd6b784acf540c66136965f7705ef110a78
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Fix up the existing comment blocks misaligned in the first column.
Also add line numbers to the comment checks.
Change-Id: I9d28c365271df36e7013d74cbb02d0023ab4f581
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
A 100us is so small that applying the quirk to the specific
SSDs that require the delay is more trouble than it is worth.
So remove the quirk and always wait 100us before re-enabling
the NVMe SSD during initialization.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id6a8cc6e35d103fffdf135580301fc3e5b27e722
Also avoid an spdk_get_ticks() call in the default
case where a timeout_cb_fn is not defined.
On my Intel(R) Xeon(R) E5-2699 v3 system with an
Intel(R) P3700 SSD, these modifications reduce software
overhead per I/O by 3-5% (as measured by the SPDK
overhead tool).
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5bb5a87747b15d9e27655fabcd2bc1a40b0b990e
Queue aborts that would exceed the abort command limit
in software as a convenience for the user.
Change-Id: I8c1f0380984cc6c0cdb453db961939a7f571b336
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Instead, pass NULL when an ADMIN command times out.
We don't expose the admin queue to the user.
Change-Id: If0768d329a689f6f7c3734c9d419e680d7378ed1
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
For each command that times out, call the timeout
callback one time if the user registered one.
Change-Id: Iaad39a886468e89bef63fe292c5cad1dce97a57c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
It has been discovered that some devices require
a very small delay before writing CC.EN to 1 after
CSTS.RDY goes to 0.
Change-Id: I73d31726d17ebf5bbec7ee528e2f98fcd05234dd
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This isn't the indentation pattern I would have chosen, but
it's a complicated negotiation between what I want and what
astyle will let me get away with.
Change-Id: I4909587823931842ac3f227134e1d05e7d80da74
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Any Intel device reporting device ID 0x0953 needs this quirk.
Change-Id: I690b01ecf05105df00ec8cf6f2da7f7c0a601aa8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This prevents a deadlock if the user immediately
calls spdk_nvme_detach.
Change-Id: I79f28abe163cbbf184bea907692c44aa4e1c8893
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The message about the uevent socket is not a fatal error; it just means
that hotplug monitoring will not work.
Change-Id: I29f6a253e96a86420c0fde9e19135f9f1d229bb9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
remove the unnecessary rte_eal_pci_probe_one() in function
spdk_pci_device_detach(), this could cause error message when we
terminate the application, it will also not make sense try to probe one
device after we detach it, we could call spdk_pci_nvme_device_attach()
instead of spdk_pci_nvme_enumerate() when we have one given device address,
dpdk will try to scan the device and add it back to pci device list then.
Change-Id: I35f5bb412249bb20da57394f0531c10a49691906
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Register all spdk_malloc() memory regions as ibv_mr in a spdk_mem_map
so we can look up the RDMA key for the user's buffer and pass it in the SGL
directly, rather than copying through a pre-registered bounce buffer.
Change-Id: I7340bc2020b5256750c95dbd24ba67961404e5e7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The extended LBA format flag should be initialized after namespace
capability flag.
Change-Id: Iad479b454bb4e31120c17d40ae23937a099c6f8f
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
The send completions must be processed prior to the
recv completions. However, if the completion queues
are separate this leaves a small window where
a send+recv completion arrive between polling
the send_cq and the recv_cq, resulting in the code
seeing the recv completion prior to the send
completion.
By combining the completion queues, this eliminates
any potential gap. The send completion will always
be processed before the recv completion.
Change-Id: I06bfef6af48559d0b9e00524ebc10f1a102e7387
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The actual uses of intrinsics are already guarded by feature-specific
ifdefs in nvme_pcie_copy_command(), but the header itself should also
only be included when it will actually be needed.
Change-Id: Ife65d6432b8dfd9d9db80fe4e385ab76491874c0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These allocations need to be from memory registered with the SPDK env
library to allow future work on automatic ibverbs memory registration.
Change-Id: I6ec6999ecd6d6bf6ba4ab159630f7d01f3d46154
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This can be used for issuing an abort for the timed-out command.
Change-Id: I3c5727fdddc156cd7c8f99afbc3e6da8e73bba56
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move the ibv_recv_wr initialization in
nvme_rdma_alloc_rsps. Thus we can save some
CPU times
Change-Id: Id449b2684290431f8b3ba97ec4058171d34038bf
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
We do not need to set it for submission since the contents
are same
Change-Id: I345094e2e8a858b318be73d28f09393566587d95
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Move cb_arg to the first argument to match the other NVMe callback
function signatures.
Change-Id: I4e699c8071dcb7ba4ce3cdb82ee985600208204c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This has been reported for a number of different device
types. We suspect these devices are technically out of
spec, but they work with most other available NVMe
drivers on accident.
Change-Id: I529cfc03fc314cbab2a1cd40620bf1dd5b54182d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
For namespaces with end-to-end protection information, metadata size
of exactly 8 bytes, and extended LBA configured, the NVMe driver would
calculate the size of the data block incorrectly. The NVMe spec has a
special provision for this specific case (8-byte metadata only) and
PRACT = 1 that requires that the host does not send the metadata as part
of the host memory buffer.
To fix this, clean up the calculation of the per-block data transfer
size by adding a new extended_lba_size field in the namespace, which
represents the total size of data to be transferred per block based on
the namespace's configured metadata size and whether it transfers
metadata as part of the data buffer. Then add the special case for
PRACT = 1 and PI configured and extended LBA in the R/W helper
functions.
Change-Id: I0b383a58c773cac06e6c018858b57129064c6059
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These were repeated a few different places, so pull them into a common
header file.
Change-Id: Id807fa2cfec0de2e0363aeb081510fb801781985
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This removes one addition from the submission path (negligible, but a
nice side effect), but also opens up the possibility of reporting the
total time an I/O took - since we are always tracking the submission
time anyway, there is no extra cost to report it in the completion
callback.
Change-Id: I7129e7c09d20da8082042a7622d045846461dd9c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For infinite timeout states, instead of printing UINT64_MAX as a
decimal number, interpret it as "no timeout" instead.
Change-Id: I579f5857f96286734940ab5f493261e60354c4fe
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The PCIe transport initializes the quirks directly, so the generic hook
to get PCI ID is no longer necessary. This path was dead code.
Change-Id: I25bdaa598db53e4312a264d9d8356d1b416696e5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The logic to fail queue pairs when the controller is failed should be
handled in the generic code, not in the individual transports.
This also allows nvme_qpair_fail() to be private to nvme_qpair.c.
Change-Id: I6194576dceb35073b9af8847e59314900028637c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is more CPU efficient than only grabbing one
completion per call to ibv_poll_cq.
Change-Id: I0c70d33639f0f345482d9e7c810f9c6723937058
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
In our previous code, we did not ack the event in
exceptional cases when we get a event via rdma_get_cm_event.
Thus, the code may block with in this statement:
rdma_destroy_id(rqpair->cm_id);
in some exceptiaonal cases. And this patch will solve this
issue.
Change-Id: Iddb6fb5356a5ee0ed04e261a040ba53042fca302
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This make sure the qpair failure could be started from upper level application.
Change-Id: I7e04fe36929cc634ddf0078db96fbc40afb38f8c
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
The g_thread_mmio_ctrlr should be not NULL pointer when it enter the
handler function.
Change-Id: I45dba601c672b16e2c6feafd9059bafde0d8f1b4
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
If namespace is formatted with per lba metadata feature and also disable end-to-end protection
feature, host couldn't use per extended-lba metadata area.
Signed-off-by: Zhihao Zhang <thomas.zzh@alibaba-inc.com>
If the user asked for a specific PCI address in spdk_nvme_probe(), we
need to return 1, not 0, for the other PCI addresses that don't match
when enumerating. 0 means to attach the PCI driver, whereas 1 means to
continue enumerating.
With the previous behavior of returning 0, all NVMe devices would be
attached to the DPDK PCI driver, even if the user did not request for
them to be probed, and further calls to spdk_nvme_probe() would not find
any devices.
Change-Id: Ifbbcd7d1abe8ab535b6957855172e66a3e69fbe4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Currently we use the pci functions provided by DPDK,
it identifies the device by class id related
info but not by pci bdf info, so we can add the filering
by pci_addr in pcie_nvme_enum_cb function.
Change-Id: I5942e98853f00fc10fa6aae5c113517653d1b357
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Since nvme_ns_cmd.c now walks the SGL, some of the test code
needs to also be updated to initialize and return correct values
such as ctrlr->flags and sge_length.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I521213695def35d0897aabf57a0638a6c347632e
Preivously, we only supports probe the NVMf target
via discovery info, now we can support to directly
to connect it.
Change-Id: I08ce1d95de6744286357e68b48c97b773b902ac8
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
(1) Add nvme_rdma_build_sgl_request function
(2) Merge nvme_rdma_pre/post_copy_mem to nvme_rdma_copy_mem
Change-Id: I86abab821b32b4da0aa9489a6b9f7dc430333159
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Now that the hotplug code is isolated in nvme_pcie.c, it can call the
PCIe transport attach function directly.
Change-Id: I2df3b9168473b537cc9b13367e06d3d3b6fa22be
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The spdk_nvme_qpair::num_entries value is never used in the common code,
so move it to the individual transport qpairs to make it clear that it
is a transport-specific implementation detail.
Change-Id: I5c8f0de4fcd808912ba6d248cf5cee816079fd32
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The definition of SPDK_UNREACHABLE uses the build-time DEBUG definition,
which is not available in the public API.
Change-Id: I1862c99fa5c85ccd3483f94e9c35de531da57f3c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Instead of passing the work completion, just pass the
response index. This keeps the work completions localized
to the polling function.
Change-Id: I0e6a1d8564200b5ac3aa43dfd58ae152d439bbd8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This eliminates an if statement, since the two callers
of this function know the desired queue size.
Change-Id: I28fabac8613f7b8fc7d96cf95b085b6e4dcf985f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Just call the regular qpair create function instead.
Change-Id: Ic35b1eb6fcdf0d82733ea573a493f583dd63d5bd
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Use the num_entries value in the generic qpair instead. These
values had to match anyway.
Change-Id: Ia6400fbaba97df3ef6db4dc07a2ab95af1e5143f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Avoid an extra level of pointer chasing when we are filling out the NVMe
SGL.
Change-Id: I1a40af16fda80f7480c419524876bfb1a1902eb8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This will allow it to be better be reused for some future patches
enabling splitting of non-PRP-compliant SGL-based requests.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ica38fd6cf191f72baa524bcc4896b3c9939ab762
This intermediate function is no longer needed.
Change-Id: I3523cc6d8f3b290165a953d42cca8b76eda762c5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Sending the fabric connect command is part of establishing
a connection, so move it into the main connection-establishing
function.
Change-Id: I55e7ffdd16b576c81b51d7d3910203f9afc1f4c2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This function initializes the members of an existing
qpair struct. It doesn't construct one from scratch.
Change-Id: I0b9afac1ad25cfb217efd146702f693c74f5f697
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
No need to allocate all of the requests and responses until
we know a connection can be established.
Change-Id: I072a10aadfd7ced773634448f7d7e788622d0a4c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The code is clearer if this function is incorporated
into its only caller.
Change-Id: I33901cddf80ae27896b2acfd1b9e7d212f21f5f3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is resolving the address and route to the target, not
binding a socket to an address.
Change-Id: I80055481ed2e020410a1e186a4e7371b60faaee9
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
They were very close to the same already, so finish the job.
Change-Id: Ifba9e3b2d11a3e70cbfbe46f57a67552db2757ed
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
We should be sending the bounce buffer's remote key to the target so it
can put it into an RDMA SGE on the remote side.
Change-Id: Icded155ad2292c67baa722f001c9c07178bc2754
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There is no particular reason for this to be 127; make it 128 to at
least be consistent with the PCIe transport.
Change-Id: I60500e0044d3549ba6350e1f35f09d624848bd21
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This function was only called from one place and saved no
lines of code.
Change-Id: If5e653732df57c1f2c93e20cf4f286eac31df91c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This matches the behavior of nvme_pcie, which queues a request if no
tracker is available.
Change-Id: Idbf6c951c89451cfea22ec6bc553ff46f988f818
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>