1235 Commits

Author SHA1 Message Date
Shuhei Matsumoto
7ee58b90e1 nvmf/tcp: Set DIF context to PDU when processing in-capsule, C2H, or H2C data
Set DIF context of the corresponding request to PDU when
- processing in-capsule data of the command,
- processing data of C2H PDU, or
- processing data of H2C PDU.

Change-Id: I3a668a55be21dbe2ee6ecf26476290670bd7b4a8
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458929
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-07-11 05:30:28 +00:00
Shuhei Matsumoto
e3e023cfd3 nvmf/tcp: Increase in-capsule buffer size to fill DIF fields
When NVMe/TCP initiator transfers in-capsule data, NVMe/TCP has to
process it as in-capsule data. If DIF insert/strip is enabled,
in-capsule data size will be increased by NVMe/TCP target to insert
metadata. However size of in-capsule data buffer had not been
increased, and buffer overflow occurred when NVMe/TCP initiator
transfers in-capsule data to NVMe/TCP target with DIF insert/strip
being enabled.

This patch increases size of in-capsule data buffer size to store
metadata. 16 byte metadata per 512 byte data block is the current
maximum ratio of metadata per block.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I88b127efd7a945bde167a95df19a0b9175cb8cd0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461333
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-07-11 05:30:28 +00:00
Shuhei Matsumoto
9d4ee5f344 nvmf/tcp: Fix wrong data offset in nvmf_tcp_pdu_payload_insert_dif
We updated readv_offset before generating DIF to avoid adding
the temporary variable _rc in the previous patch, but that caused
write error when inserting DIF.

Fix the bug in this patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id0788280a83cbea2554c851db77751432fc00cba
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461116
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-11 05:30:28 +00:00
Shuhei Matsumoto
2c9b0af271 nvmf/tcp: Get DIF context when handling capsule command header
When handling the capsule command header, call spdk_nvmf_request_get_dif_ctx
by passing the NVMf request and the reference to the DIF context, and set
the flag dif_insert_or_strip of the NVMf/TCP request to true.

spdk_nvmf_request_get_dif_ctx returns false immediately when the
corresponding NVMf controller disables DIF insert/strip.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I16f6b322f2692d5f9653d011a490e7929ec37365
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458928
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-07-11 05:30:28 +00:00
Shuhei Matsumoto
1c7f92f075 nvmf: Hide DIF setting of the backend bdev if DIF insert/strip is enabled
When the NVMf controller's flag dif_insert_or_strip is enabled, DIF is
inserted for write I/O and stripped for read I/O, and the corresponding
NVMe-oF initiator should not be aware of the DIF setting of the
backend bdev.

Hence this patch hides the DIF setting of the backend bdev
when the flag dif_insert_or_strip is enabled.

Change-Id: I3c14880c2e94cba7f76b1bca78afb36bfe884e26
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456731
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-07-11 05:30:28 +00:00
Shuhei Matsumoto
4ff3665ce9 nvmf: Check DIF insert/strip setting of NVMf controller when getting DIF context
The first idea was that the caller of spdk_nvmf_request_get_dif_ctx()
should check if the current transport enables DIF insert/strip before
calling spdk_nvmf_request_get_dif_ctx().

But NVMf controller knows if DIF/insert/strip is enabled now by the
previous patch. Hence spdk_nvmf_request_get_dif_ctx() checks if the NVMf
controller enables DIF insert/strip at its head.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I78253d356b694800c3a9a9608514df58e0c631a6
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461314
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-07-11 05:30:28 +00:00
Shuhei Matsumoto
91da9aaafe nvmf: Add a flag dif_insert_or_strip to struct spdk_nvmf_ctrlr
Add a flag dif_insert_or_strip to struct spdk_nvmf_ctrlr that indicates
whether DIF insert/strip is done.

Copy the DIF insert/strip setting of the corresponding transport options
to the flag at NVMf controller creation.

The purpose of this patch is to make DIF insert/strip not per-transport
option but per-controller option because we may want to be able to
control DIF insert/strip per controller at some point. Besides this patch
will clean the implementation.

Besides align indent around the change.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I57f65960b430e55f4021ed514aacd85581ff9993
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461313
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-07-11 05:30:28 +00:00
Ziye Yang
750a4213ef nvmf: add spdk_nvmf_get_optimal_poll_group
This patch is used to do the following work:

1 It is optimized for NVMe/TCP transport. If the qpair's
socket has same NAPI_ID, then the qpair will be handled
by the same polling group.

2. We add a new connection scheduling strategy, named as
ConnectionScheduler in the configuration file. It will be
used to input different scheduler according to the customers'
input.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ifc9246eece0da69bdd39fd63bfdefff18be64132
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454550
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-10 02:30:41 +00:00
Ziye Yang
960460f0d1 nvmf: add spdk_nvmf_transport_get_optimal_poll_group
Add the optimal poll group get function.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ia9e57c6924a6563d79269cf535814883e83698cd
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454549
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-10 02:30:41 +00:00
Ben Walker
09ef0593d4 nvmf: Leverage bdev uuid to correctly detected remove+add ns while
paused

Change-Id: Idbf00956394f7ee7ff7e27f2627785cd7146b01f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459605
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
2019-07-10 01:59:05 +00:00
Ben Walker
85e9760161 nvmf: Capture ns_info onto stack in poll_group_update_subsystem
By capturing this pointer onto the stack, we inform the compiler
that we don't expect it to change. That allows the compiler to
generate more efficient code.

Change-Id: I0f3ff9373662198e915269c4498e4902a2cdb808
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459754
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
2019-07-10 01:59:05 +00:00
Ben Walker
ab3abc15aa nvmf: Capture channel variable to stack when updating poll groups
This signals to the compiler and analysis programs that this
won't change during iteration, so it may produce better code.

Change-Id: I478c0c9445d4ddf8a69ab1b3deaf628b82a0eaea
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459753
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
2019-07-10 01:59:05 +00:00
Changpeng Liu
7b74274fbf nvmf: add parameter check when loading reservation information from a JSON file
Change-Id: Id217212fd82e57a4cfb32f62f11798c72187879e
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460794
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-07-10 01:40:26 +00:00
Shuhei Matsumoto
aa322721cb nvmf: Add dif_insert_or_strip to transport options
This is a place holder and subsequent patches will use the option
dif_insert_or_strip and provide JSON RPCs to configure it.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7e3fbb1d49c47647a9a0a1a2149152801591b283
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456452
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-10 00:43:02 +00:00
Shuhei Matsumoto
ddb680ebab nvmf: Add helper function to get DIF context from NVMf request
Add a helper function to get DIF context when the passed NVMf request
is for I/O queue, NVMe read, write, or compare command, and its NSID
is valid.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I796c20607c7b64a8be85da5131c5ea95ffd9f8e4
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458713
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-07-10 00:43:02 +00:00
Shuhei Matsumoto
9b04e29173 nvmf: Add helper function to get DIF context from bdev and NVMe cmd
Add a helper function to get necessary DIF information and set
them into the passed DIF context and return. This function will
be called only when the specific requirement is satisfied and
the caller will be added in the next patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic435886ca936a211f34278b813f547ffa43b9000
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458712
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-07-10 00:43:02 +00:00
Shuhei Matsumoto
7bfbc388d7 nvmf/tcp: Pass extended LBA based length as I/O length to NVMf controller
When DIF is inserted or stripped,
- in the TCP transport layer, we can use LBA based length throughout, but
- in the NVMf controller layer and BDEV layer, extended LBA based
  length must be used, and NVMf controller gets the length from
  tcp_req->req.length.

Hence by adding and using two variables, elba_length and orig_length
to struct spdk_nvmf_tcp_req, set the extended LBA length to
tcp_req->req.length before calling spdk_nvmf_request_exec(), and then
restore the original LBA based length to tcp_req->req.length after
calling spdk_nvmf_tcp_req_complete().

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9309b8923c6386644c4fd8ef3ee83a19f5d21ce5
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458926
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-09 03:39:25 +00:00
Shuhei Matsumoto
51b643648c nvmf/tcp: Increase buffer to insert/strip DIF in spdk_nvmf_tcp_req_parse_sgl
If tcp_req->dif_insert_or_strip, increase the length from LBA based
to extended LBA based by using its own DIF context.

Change-Id: Ie9f5cf757328dda795b43a7b6c70a72259865115
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458925
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-09 03:39:25 +00:00
Shuhei Matsumoto
536bd70eb4 nvmf/tcp: Use cached length variable in spdk_nvmf_tcp_req_parse_sgl
The next patch will extend the length from LBA based to extended
LBA based and use it as buffer length to insert or strip DIF.

So cache sgl.unkeyed.length at the top of spdk_nvmf_tcp_req_parse_sgl
and use it throughout.

Besides, one unrelated change-the-line to improve the readability
is included.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2a1dc9379bb5671ec80b5b478504c9879a4f0fff
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458924
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-09 03:39:25 +00:00
Shuhei Matsumoto
975239c29d nvmf/tcp: Insert DIF to the newly read data to create extended LBA payload
Generate and insert DIF to each data block when reading more than a single
byte.

This update is very similar with the use case of spdk_dif_generate_stream
in iSCSI target.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I063919a32153ac0daf6d6eb1836c0d5995b65d33
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459092
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-09 03:39:25 +00:00
Changpeng Liu
1edc5f0040 nvmf: restore the loaded reservation information to NS
Load reservation information based on ptpl configuration file, and
restore the information to NS data structure.

Change-Id: I5f46d49a6d1e6e49aab93ca7cd654469a3a08659
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455912
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-07-08 08:21:03 +00:00
Shuhei Matsumoto
8448adaefa nvmf/tcp: Verify DIF before sending C2H data in spdk_nvmf_tcp_send_c2h_data
If DIF mode is local and C2H data is extended LBA payload, DIF should
be verified just before sending the payload.

Add a helper function nvmf_tcp_pdu_verify_dif and call it in
spdk_nvmf_tcp_send_c2h_data after completing nvme_tcp_pdu_set_data_buf.

When nvmf_tcp_pdu_verify_dif returns error, treat the error as fatal
transport error because the error is caused by the target itself.

Handle the fatal NVMe/TCP transport error by terminating the connection
as described in the NVMe specification.

On the other hand, data digest error is treated as a non-fatal transport
error because the error is caused outside the target. This is reasonable.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9680af2556c08f5888aeaf0a772097e4744182be
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458921
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-08 03:33:07 +00:00
Ziye Yang
57efada508 nvmf/tcp: reorg the structure of struct spdk_nvmf_tcp_req
I used pahole to see whether the alignment of the structure
is reasonable. After reorgnization, we can saved 16 bytes and 1
cacheline according to the information by pahole.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I1347e7c582fe2b00707e2841690b87d53cc61e33
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460572
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-05 04:18:41 +00:00
Shuhei Matsumoto
3ff1ff004e nvme/tcp: Minor cleanups for SGL operations
Using naming rules consistent with other related libraries is helpful
to ensure the quality as verified by this patch series.

This patch changes a few parts to use iov and iovcnt for SGL operations.
Besides, name of an array points to the head of the array and is
constant. So copying name of array to an another pointer is not
necessary and can be removed.

Change-Id: I2324f28126b3088098c1c767cf6c060f22c175c3
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455629
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-07-04 08:58:40 +00:00
Shuhei Matsumoto
127cfac020 nvmf/tcp: Use nvme_tcp_pdu_set_data_buf for incapsule data
Previously we had used nvme_tcp_pdu_set_data() for incapsule data.
This patch changes handling incapsule data to use
nvme_tcp_pdu_set_data_buf() as same as H2C and C2H.

This unification is necessary to support DIF insert and strip
in NVMe/TCP target later.

Change-Id: I02cae8db94e51cf79a354dd64ad45f0e491ec08e
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455920
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-07-04 08:58:40 +00:00
Shuhei Matsumoto
3184884f9d nvmf/tcp: Properly handle multiple iovecs in processing H2C and C2H
NVMe/TCP target had assumed the size of each iovec was io_unit_size.
Using nvme_tcp_pdu_set_data_buf() instead removes the assumption
and supports any alignment transparently.

Hence this patch moves nvme_tcp_pdu_set_data_buf() to
include/spdk_internal/nvme_tcp.h and replaces the current code to use it.

Besides, this patch simplifies spdk_nvmf_tcp_calc_c2h_data_pdu_num()
because sum of iov_len of iovecs is equal to the variable length now.

We cannot separate code movement (lib/nvme/nvme_tcp.c to include/
spdk_internal/nvme_tcp.h) and code replacement (lib/nvmf/tcp.c)
because moved functions are static and compiler give warning if
they are not referenced in lib/nvmf/tcp.c.

The next patch will add UT code.

Change-Id: Iaece5639c6d9a41bd35ee4eb2b75220682dcecd1
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455625
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-04 08:58:40 +00:00
Ziye Yang
b09bd95ad3 sock: update spdk_sock_group_add_sock
And also add spdk_sock_group_get_ctx function

Change-Id: I2a2a58b0588ff7d99d3538ea0a633a3b8c7a234b
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454538
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
2019-07-04 08:21:05 +00:00
Shuhei Matsumoto
12d6dce2aa nvmf: Use not malloc'ed but fixed size string for host NQN
Maximum size of NQN is already defined to be SPDK_NVMF_NQN_MAX_LEN,
and hence use fixed size string whose size is SPDK_NVMF_NQN_MAX_LEN
+ 1 for spdk_nvmf_vhost::nqn.

This change will reduce the potential malloc failure.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2b9c7cc21200b3e88b5485ebfdcd5040bc6e3589
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459742
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-04 00:30:22 +00:00
Changpeng Liu
af6ed1e94a nvmf: update the reservation information for ACQUIRE/RLEASE commands
Change-Id: Ibfebffa4d683da08ae8f9350cce144fafe6a5538
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455910
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-07-02 00:06:59 +00:00
Changpeng Liu
196d4f704a nvmf: enable ptpl feature with reservation register command
Add file based reservation information definition, the data structure
can be used to store all the reservation information to a json
based configuration file, and enable this feature with REGISTER
command.

Change-Id: Ic93cfc5934a4ad96f11b96ec77bacb877edf6c10
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455909
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-07-02 00:06:59 +00:00
Ziye Yang
cdc0170c1b nvmf/tcp: Add a maximal PDU loop number
In our previous code, we will handle all the PDU until there is
no incoming data from the network if we can continue the loop.
However this is not quite fair when we handling multiple connections
in a polling group.

And this change is setting a maximal NVME/TCP PDU we can handle
for each conneciton, it can improve the performance. After some
tuing, 32 should be a good loop number. Our iSCSI target uses
16.

The following shows some performance data:

Configuration:
1 Command used in the initiator side:
./examples/nvme/perf/perf -r 'trtype:TCP adrfam:IPv4 traddr:192.168.4.11 trsvcid:4420'
-q 128 -o 4096 -w randrw -M 50 -t 10

2 target side, export 4 malloc bdev in a same subsystem

Result:

Before patch:

Starting thread on core 0
========================================================
                                                                                                           Latency(us)
Device Information                                                    :       IOPS      MiB/s    Average        min        max
TCP  (addr:192.168.4.11 subnqn:nqn.2016-06.io.spdk:cnode1) from core 0:   51554.20     201.38    2483.07     462.31    4158.45
TCP  (addr:192.168.4.11 subnqn:nqn.2016-06.io.spdk:cnode1) from core 0:   51533.00     201.30    2484.12     508.06    4464.07
TCP  (addr:192.168.4.11 subnqn:nqn.2016-06.io.spdk:cnode1) from core 0:   51630.20     201.68    2479.30     481.19    4120.83
TCP  (addr:192.168.4.11 subnqn:nqn.2016-06.io.spdk:cnode1) from core 0:   51700.70     201.96    2475.85     442.61    4018.67
========================================================
Total                                                                 :  206418.10     806.32    2480.58     442.61    4464.07

After patch:
Starting thread on core 0
========================================================
                                                                                                           Latency(us)
Device Information                                                    :       IOPS      MiB/s    Average        min        max
TCP  (addr:192.168.4.11 subnqn:nqn.2016-06.io.spdk:cnode1) from core 0:   57445.30     224.40    2228.46     450.03    4231.23
TCP  (addr:192.168.4.11 subnqn:nqn.2016-06.io.spdk:cnode1) from core 0:   57529.50     224.72    2225.17     676.07    4251.76
TCP  (addr:192.168.4.11 subnqn:nqn.2016-06.io.spdk:cnode1) from core 0:   57524.80     224.71    2225.29     627.08    4193.28
TCP  (addr:192.168.4.11 subnqn:nqn.2016-06.io.spdk:cnode1) from core 0:   57476.50     224.52    2227.17     663.14    4205.12
========================================================
Total                                                                 :  229976.10     898.34    2226.52     450.03    4251.76

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I86b7af1b669169eee2225de2d28c2cc313e7d905
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459572
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-06-28 12:28:54 +00:00
Or Gerlitz
6629202cbd nvmf/tcp: Use the success optimization by default
By now (5.1 is released), the Linux kernel initiator supports the
success optimization and further, the version that doesn't support
it (5.0) was EOL-ed. As such, lets open it up @ spdk by default.

Doing so provides a notable performance improvement: running perf with
iodepth of 64, randread, two threads and block size of 512 bytes for 60s
("-q 64 -w randread -o 512 -c 0x5000 -t 60") over the VMA socket acceleration
library and null backing store, we got 730K IOPS with the success
optimization vs 550K without it.

IOPS           MiB/s    Average       min      max
549274.10     268.20     232.99      93.23 3256354.96
728117.57     355.53     175.76      85.93   14632.16

To allow for interop with older kernel initiators, we added
a config knob under which the success optimization can be
enabled or disabled.

Change-Id: Ia4c79f607f82c3563523ae3e07a67eac95b56dbb
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457644
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-06-26 06:24:03 +00:00
Changpeng Liu
cf5c4a8a2e nvmf: add ptpl activated flag to Namespace
If users set the persist through power loss configuation file,
that means the Namespace has the capability to support ptpl
feature, here we added a ptpl_activated flag to indicate that
the users enable the feature or not.  Users can use Set features
or Reservation Register commands to change the value.

Change-Id: Iae3fd44085c5be5bf9574e49efa567e8212dee20
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455906
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-06-26 01:54:10 +00:00
Hailiang Wang
73a171a07c rdma: assert ibv_send_wr is not NULL
Vhost testing crashed from Nightly testing, because a member
access within null pointer of type 'struct ibv_send_wr'.

Change-Id: If8f34f23864883ea73516d2d1fe3b30137c04316
Signed-off-by: Hailiang Wang <hailiangx.e.wang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458913
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-06-25 13:37:15 +00:00
Evgeniy Kochetov
9e3d841d3e nvmf: Fix connect command SQ size validation for IO queues
SQSIZE parameter validation in Connect command was broken because QID
field in qpair was used before intialization.

Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I8a0b359937d661df3b9888e6084e7d0b4a9056ea
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455667
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-06-18 11:39:29 +00:00
Shuhei Matsumoto
c758dc088a nvmf: Reject bdev with separate metadata to attach to subsystem
NVMe bdev module support separate metadata now but NVMf subsystem
cannot process bdev with separate metadata yet.

Hence reject any bdev with separate metadata to be attached
explicitly by this patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I793c6c5f61deb766d7bf427ff67ccc57a48974cf
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457167
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-06-13 00:48:11 +00:00
Changpeng Liu
3ec061800f nvmf: add a persist through power loss configuration file when constructing NS
For reservation feature in NVMoF, we can't support the persist through
power loss feature, now we will add the configuration file parameter
with Namespace, after users set the configuration file parameter with
one NS, then the PTPL feature can be enabled.

Change-Id: Id72699093f7e68318b9529f7bacc5c9804f7f86b
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455905
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-06-12 00:30:03 +00:00
Alexey Marchuk
53777de855 rdma: Unset IBV_SEND_SIGNALED flag for RDMA_WRITE operations
Unsetting this flag will decrease the number of WRs retrieved during CQ polling and will decrease
the oeverall processing time. Since RDMA_WRITE operations are always paired with RDMA_SEND (response),
it is possible to track the number of outstanding WRs relying on the completed response WR.
Completed WRs of type RDMA_WR_TYPE_DATA are now always RDMA_READ operations.

The patch shows %2 better peformance for read operations on x86 machine. The performance was measured using perf with the following parameters:
-q 16 -o 4096 -w read -t 300 -c 2
with nvme null device, each measurement was done 4 times

avg IOPS (with patch): 865861.71
avg IOPS (master): 847958.77

avg latency (with patch): 18.46 [us]
avg latency (master): 18.85 [us]

Change-Id: Ifd3329fbd0e45dd5f27213b36b9444308660fc8b
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456469
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-06-11 18:07:28 +00:00
JinYu
8fc9ac7b0e nvmf: complete all I/Os before changing sgroup to PAUSED
For the nvme device, I/Os are completed asynchronously. So we
need to check the outstanding I/Os before putting IO channel
when we hot remove the device. We should be sure that all the
I/Os have been completed when we change the sgroup->state to
PAUSED, so that we can update the subsystem.

Fix #615 #755

Change-Id: I0f727a7bd0734fa9be1193e1f574892ab3e68b55
Signed-off-by: JinYu <jin.yu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452038
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-06-11 01:51:56 +00:00
Ziye Yang
0bb626672b nvmf/tcp: Support single r2t usage
According to the TP 8000 spec in Page 26:
Maximum Number of Outstanding R2T (MAXR2T): Specifies the maximum
number of outstanding R2T PDUs for a command at any point in time
on the connection.

Note that by the spec, the target may only support single r2t
(which is the minimum possible), it doesn't have to use multiple r2ts
even if the initiator supports that. So remove the maxr2t and
pending_r2t variable in the tcp qpair structure.

In the original design, we think that maxr2t is the maximal active
r2t numbers for each connection. So if the initiator sends out maxr2t=16,
it means that all the commands of a qpair can use such number of R2T pdus.
So we need to wait for the available R2Ts for the request when the maxr2t
reaches the maximal value. But it is the wrong understanding of the spec.

In fact, each command has its own number of maximal r2t numbers, then we
do not need to use the wait method for R2T method anymore. So we remove
the state TCP_REQUEST_STATE_DATA_PENDING_FOR_R2T. Futhermore, we adjust
the related SPDK_TPOINT_ID definition.

In current patch, the target will support one active R2T for each
write NVMe command. Thus, we remove the function spdk_nvmf_tcp_handle_queued_r2t_req.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I7547b8facbc39139b4584637ccc51ba8b33ca285
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455763
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Or Gerlitz <gerlitz.or@gmail.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-06-05 16:46:55 +00:00
Jim Harris
f758598c44 nvmf: fix assert in spdk_nvmf_tcp_req_fill_iovs
It's OK for iovcnt to equal SPDK_NVMF_MAX_SGL_ENTRIES.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic95d04f5667858e7fbb025f469c027e2d47b8ba1

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456111
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-05-31 14:46:35 +00:00
Jim Harris
bf647c168a nvmf: increase default max num qps to 128
This matches the Linux kernel target.  Users can
still decrease this default when creating the
transport (i.e. -p option for nvmf_create_transport
in rpc.py).

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Icad59350a2cd35cfc4ad76d06399345191680c05

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454820
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-05-22 14:50:05 +00:00
Seth Howell
61948a1ca7 rdma: add check for allocating too many SRQ.
We could run into issues with this if we were using an arbitrarily large
amount of cores to run SPDK.

Change-Id: Ia7add027d7e6ef1ccb4a69ac328dbdf4f2751fd8
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452250
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-05-15 20:29:32 +00:00
Seth Howell
14777890a6 rdma: add an stailq for qpairs pending recv
This will help us not iterate through the whole list of connections when
only some of them have pending recvs.

Change-Id: I681bc98befbdda4e77ef333b7a086c08b2708eb3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449266
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-05-13 22:09:55 +00:00
Seth Howell
c3884f943c rdma: batch rdma recvs per poll.
This will help save MMIO overhead. Especially in the SRQ case.

Change-Id: I6fb70cf6de4763450f97961f41ccdce3acec2e63
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449265
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-13 22:09:55 +00:00
Seth Howell
b4dc10fbb7 rdma: create a list for qpairs pending send transfers
By creating a list of qpairs, we can avoid looping over every connected
qpair to process sends each time we poll.

Change-Id: If24bbc363176f52fbfb756d56719edd885a21a11
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449264
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-10 22:24:35 +00:00
Seth Howell
9d63933b7f rdma: batch rdma sends.
By batching ibv sends each time we poll, we can reduce the number of
MMIO writes that we do.

Change-Id: Ia5a07b0037365abfa8732629c34d34a9ed49ac70
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449253
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-10 22:24:35 +00:00
Ben Walker
fbbbd6ab50 nvmf: Print a message out when a host is disconnecting due to keep alive
It isn't obvious why hosts are being disconnected at the moment.

Change-Id: I5515ba40883ccb20921d0da013b27670212bf649
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453034
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-05-09 15:35:11 +00:00
Seth Howell
350e429a57 rdma: add a flag for disabling srq.
There are cases where srq can be a detriment. Add a flag to allow users
to disable srq even if they have a piece of hardware that supports it.

Change-Id: Ia3be8e8c8e8463964e6ff1c02b07afbf4c3cc8f7
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452271
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-05-06 18:11:13 +00:00
Jim Harris
a95fdad68f nvmf: remove unnecessary size checks when creating transport
The individual transports will adjust these sizes when
necessary.  In fact, we have to remove this check, since
RDMA transport may adjust the io_unit_size based on the
max number of SGEs - and can adjust it to a value that
will fail this check if we reload the configuration.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I2708c7f5aaa54a368ec932ec40dd6447f1a4fde0

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452474
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-05-02 14:44:57 +00:00