Commit Graph

836 Commits

Changpeng Liu
acb9849c05 nvme: add arbitration configuration options to NVMe driver
Weighted Round Robin arbitration can now be enabled by users, and users
can allocate I/O queues of different priorities for different
purposes.  For now we enable this feature in the
NVMe driver first; following patches will enable this
feature in the bdev layer.

Change-Id: I0f799236ca04eb85ef3c9f972ed63ff2718563ba
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466852
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-09-20 02:04:06 +00:00
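
A minimal sketch of what this enables for callers, assuming the public
spdk_nvme_io_qpair_opts.qprio field and SPDK_NVME_QPRIO_* values from
spdk/nvme.h; this is not code from the patch itself:

    /* Allocate an I/O qpair with an explicit priority class, which only
     * takes effect once weighted round robin arbitration is enabled. */
    #include "spdk/nvme.h"

    static struct spdk_nvme_qpair *
    alloc_prio_qpair(struct spdk_nvme_ctrlr *ctrlr, enum spdk_nvme_qprio prio)
    {
            struct spdk_nvme_io_qpair_opts opts;

            spdk_nvme_ctrlr_get_default_io_qpair_opts(ctrlr, &opts, sizeof(opts));
            opts.qprio = prio;      /* e.g. SPDK_NVME_QPRIO_HIGH or SPDK_NVME_QPRIO_LOW */
            return spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, &opts, sizeof(opts));
    }
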
Seth Howell
579d44b0ee nvme_rdma: make handling of cm_events more robust
By consolidating all cm_event handling into a single function, we create
a single point of contact for cm_events, whether we want to process them
synchronously or asynchronously.

Change-Id: I053a850358605115362f424de55e66806a769320
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467546
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-09-18 22:19:37 +00:00
Seth Howell
ad7a01bde3 nvme_rdma: make cm_event fd asynchronous.
This is paving the way for additional changes to enable polling for
cm_events in the initiator.

For now, just present the same blocking API on top of the now-polled
file descriptor. Later, we will change this API to be more useful.

Change-Id: I174dac028720f95c30100f6dc2ed49b5bb2a7e40
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467545
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-09-18 22:19:37 +00:00
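
For orientation, the general shape of polling a CM event channel rather
than blocking on it, using only standard librdmacm and POSIX calls; this
is an illustrative sketch, not the driver's internal code:

    #include <errno.h>
    #include <fcntl.h>
    #include <rdma/rdma_cma.h>

    /* Returns 1 if an event was handled, 0 if none pending, -1 on error. */
    static int
    poll_cm_event(struct rdma_event_channel *channel)
    {
            struct rdma_cm_event *event;
            int flags = fcntl(channel->fd, F_GETFL);

            /* A non-blocking fd turns rdma_get_cm_event() into a poll. */
            fcntl(channel->fd, F_SETFL, flags | O_NONBLOCK);

            if (rdma_get_cm_event(channel, &event) != 0) {
                    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
            }
            /* ...single dispatch point: switch on event->event here... */
            rdma_ack_cm_event(event);
            return 1;
    }
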
Darek Stojaczyk
c049304a95 env: add spdk_pci_device_unclaim()
spdk_pci_device_claim() could create a file on the
filesystem that couldn't be deleted programmatically.
It could only be overwritten - e.g. by another spdk
instance - but this didn't really work if that
other instance had fewer privileges and hence no
access to the previous file.

This is exactly the case we're seeing on our CI when
running SPDK as non-root. In general it's a good idea
not to leave any leftover files, so now we'll delete
the pci claim file when the spdk process exits.

spdk_pci_device_claim() used to return a file descriptor
that could be simply closed to "un-claim" the device.
It'll now return only a return code. The fd will be
stored inside spdk_pci_device and will be closed either
when user calls the newly introduced spdk_pci_device_unclaim(),
or when the device is detached.

We'll still need to clean up those files somewhere in
our test scripts (probably ./setup.sh cleanup) to
clean up after crashed processes - but we don't
necessarily want to run such scripts inside the autotest
whenever a non-root spdk is about to be started.

Change-Id: I797e079417bb56491013cc5b92f0f0d14f451d18
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467107
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-09-18 20:34:39 +00:00
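
Conceptually the claim/unclaim pair works like the POSIX sketch below.
The path and helper names are hypothetical, chosen only to illustrate the
lifecycle described above; this is not SPDK's actual implementation:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int
    claim_device(const char *bdf, char *path, size_t path_len)
    {
            /* hypothetical claim-file location keyed by the PCI address */
            snprintf(path, path_len, "/tmp/pci_claim_%s", bdf);
            /* the open fd is held for as long as the device stays claimed */
            return open(path, O_CREAT | O_RDWR, 0600);
    }

    static void
    unclaim_device(int fd, const char *path)
    {
            /* what "unclaim" boils down to: drop the fd and remove the
             * leftover file so a later (possibly unprivileged) process
             * can claim the device again */
            close(fd);
            unlink(path);
    }
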
Benjamin Saunders
7188bb994f nvme: fix missing memory barrier in shadow doorbell update
If the CPU reorders the eventidx read before the shadow doorbell
write, it is indeterminate whether the controller will read the
updated shadow doorbell without an MMIO write. See
https://lkml.org/lkml/2018/8/14/1031 for details.

Signed-off-by: Benjamin Saunders <bsaunders@google.com>
Change-Id: I5aa08fdd5b32c7b81e8048ca6efe546318d80b5c
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468188
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-17 19:44:20 +00:00
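
The fix amounts to a full memory barrier between the shadow doorbell store
and the eventidx load. A hedged sketch of the ordering requirement, with
illustrative names and a C11 fence standing in for the barrier the driver
uses (SPDK provides spdk_mb() in spdk/barrier.h for this purpose):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    static bool
    sq_needs_mmio_doorbell(volatile uint32_t *shadow_db, volatile uint32_t *eventidx,
                           uint32_t old_tail, uint32_t new_tail)
    {
            *shadow_db = new_tail;
            /* Without this fence the CPU may hoist the eventidx load above
             * the shadow doorbell store and skip a required MMIO write. */
            atomic_thread_fence(memory_order_seq_cst);
            uint32_t ev = *eventidx;
            /* An MMIO write is needed only if eventidx lies in (old_tail, new_tail]. */
            return (uint32_t)(new_tail - ev - 1) < (uint32_t)(new_tail - old_tail);
    }
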
Ben Walker
647afdec44 Revert "nvme: small code cleanup for nvme_transport_ctrlr_scan"
This reverts commit 6129e78d26.

When the initiator sends the discovery log page, if the log page
exceeds the size of its data buffer, it will break it up into
multiple log page commands with appropriate offsets. However,
supporting offsets in log pages is an optional feature in NVMe
and reported by the EDLP bit in the identify data.

This commit changed the discovery process to no longer send an
identify command prior to doing the discovery log page command,
so the values in the identify data are always 0. If the discovery
log page exceeds the size of the data buffer (4k), it will then
fail to send the second log page with an offset because it
believes the controller does not support the feature.

Revert this change to fix it. An identify should always be sent
as part of the discovery process. A test case is included in a
follow-up patch that demonstrates the bug.

Reported-by: Zahra Khatami <zahra.k.khatami@oracle.com>
Reported-by: Akshay Shah <akshay.shah@oracle.com>

Change-Id: Iefd512a7521e0fea90541b3eb547671cfa816ea6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466819
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-09-09 21:52:07 +00:00
Ziye Yang
24eb7a84b0 nvme/tcp: fix the iov vector count.
Since we use pdu->data_iovcnt to
build the iov in nvme_tcp_build_iovs, an
outgoing pdu has at most
2 + pdu->data_iovcnt iov entries,
so we change the comparison accordingly.

This makes sure that we can handle all the data
owned by one pdu.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I2b9258cc5716d706c0fa38af609726c439708768
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467207
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
2019-09-09 02:08:31 +00:00
Changpeng Liu
6ad44e8be6 nvme: add weighted round robin supported flags
Change-Id: I4b303e7096dfdd29ef5d39f30223d03c32d20ae1
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466679
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-09 01:55:18 +00:00
Changpeng Liu
2f9d2b811c nvme: move nvme_ctrlr_construct() before the PCI initialization
This is consistent with the TCP and RDMA transports. We will use
ctrlr->flags in nvme_ctrlr_init_cap() in the next patch; the flags
are cleared to 0 for now.

Change-Id: Ic360cd0c00d60c77452d19cdc1e7a32a5fc34df0
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466678
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-09 01:55:18 +00:00
Ziye Yang
ea5ad0b286 nvme/tcp: Change hdr in nvme_tcp_pdu to pointer
Purpose: Prepare for further optimization on the
target side when receiving pdu headers, where we expect
to use zero copy.

Change-Id: Iae7f9106844736d7160d39d0af1f5941084422ec
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465380
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
2019-08-28 15:38:02 +00:00
Jim Harris
32e22643ef nvme: add NVME_QUIRK_DELAY_BEFORE_INIT quirk
Currently we *always* wait 2 seconds before starting
controller initialization during attach.  This
works around an issue where some older Intel NVMe SSDs
could not handle MMIO writes too soon after a PCIe
FLR (which would be triggered when VFIO was enabled).

After further discussion with Intel experts, we know
the SSD models that exhibit this issue.  So we can
quirk this so that only the older SSDs incur the extra
delay.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ieb408c24f6afd5bd5147d1c87239aa20f2d13511

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466064
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
2019-08-26 17:35:06 +00:00
Chunyang Hui
0fae4f64c4 Opal: Add support for erase locking range
Change-Id: Ie40ea642bc266f84ad5a3dbad8012b9eac178360
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465244
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-08-20 20:38:54 +00:00
Jim Harris
0aa72ffb74 nvme: fix WRITE_TO_RO_RANGE status code
WRITE_TO_RO_PAGE was incorrect and misleading.  This
0x82 NVMe status code indicates a write to a read-only
range of LBAs.  So modify the constant name and
associated usages to use WRITE_TO_RO_RANGE instead.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I993dbebb5acc2e685a0e99aa14084942ef79d659
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465083
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-08-14 02:19:49 +00:00
Changpeng Liu
2226750a7c nvme: add an option 'no_shn_notification' to driver
spdk_nvme_detach() will do the normal shutdown notification for
most cases, and it will take some time, e.g. 2 seconds, to finish
the process for PCIe based controllers.  If a user's environment
has several drives, spdk_nvme_detach() is called for each drive one
by one, and the overall shutdown may take a very long time.

Since users know exactly what they would like to do for the next
step, we provide an option that they can enable to skip the shutdown
notification process, giving a much quicker shutdown; when starting
next time, the controller can be enabled again.

Change-Id: Ie7f87115d57776729fab4cdac489cae6dc13511b
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/463949
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-08-13 22:50:03 +00:00
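
A sketch of how a user might opt in, assuming the option lives in
spdk_nvme_ctrlr_opts under the name given in the commit title:

    #include "spdk/nvme.h"

    static struct spdk_nvme_ctrlr *
    connect_with_fast_shutdown(const struct spdk_nvme_transport_id *trid)
    {
            struct spdk_nvme_ctrlr_opts opts;

            spdk_nvme_ctrlr_get_default_ctrlr_opts(&opts, sizeof(opts));
            /* skip shutdown notification on spdk_nvme_detach() */
            opts.no_shn_notification = true;
            return spdk_nvme_connect(trid, &opts, sizeof(opts));
    }
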
Changpeng Liu
7cbe1ccd56 nvme: move SPDK_NVME_DEFAULT_RETRY_COUNT out from nvme.h
SPDK_NVME_DEFAULT_RETRY_COUNT is the default value for each controller, so
we can move it out of the public header file and change the value if users
provide a new one.

"NvmeRetryCount" has been deprecated for a long time, so we also removed
support for this configuration option.

Change-Id: I187251cc1e5342abb4fce96727d06631b7c16a01
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/464489
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-08-09 00:44:50 +00:00
Changpeng Liu
62bb65289d nvme: change retry count can be configured via bdev nvme driver
Also eliminate 'spdk_nvme_retry_count' finally.

Change-Id: I2f3e390e4b8a49208a11b54bb82c4891cf3e1845
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/464473
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-08-09 00:44:50 +00:00
Changpeng Liu
936d856219 nvme: eliminate global configuration 'spdk_nvme_retry_count' option with PCIe transport
We have defined the NVMe controller initialization option 'transport_retry_count', so
the global 'spdk_nvme_retry_count' can be removed. We remove the variable from the
PCIe transport first, and make the retry count configurable via RPC.

Change-Id: I4d54f78c8da2180d536635587e7291f44a57c4fb
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/464472
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-08-09 00:44:50 +00:00
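
In other words, the retry policy becomes a per-controller option. A
minimal sketch, assuming the 'transport_retry_count' field named above
sits in spdk_nvme_ctrlr_opts:

    #include "spdk/nvme.h"

    static void
    init_ctrlr_opts(struct spdk_nvme_ctrlr_opts *opts)
    {
            spdk_nvme_ctrlr_get_default_ctrlr_opts(opts, sizeof(*opts));
            /* replaces the old global spdk_nvme_retry_count */
            opts->transport_retry_count = 4;
    }
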
Chunyang Hui
a4516ad2ed opal: Fix get string for bigger length
Skip the token header length, which varies for short,
medium and long atoms.

Fix Issue #898

Change-Id: I2351193e5a43608495f3d816ff4e5932399a6312
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/464502
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-08-08 20:06:40 +00:00
Ziye Yang
73d9cef8c5 nvmf/tcp: add nvme_tcp_pdu_cal_psh function.
Purpose:

1. Do not calculate the psh_len every time.
2. Small fix: for ch_valid_bytes and psh_valid_bytes,
   we do not need to use uint32_t.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I9b643da4b0ebabdfe50f30e9e0a738fe95beb159
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/464253
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-08-07 01:46:54 +00:00
Tomasz Zawadzki
8df52a0f4a lib/nvme_tcp: assert tcp_req->req before it is dereferenced
The value of tcp_req->req was asserted after it was already
dereferenced. This patch fixes that.

Change-Id: I5eb01e88be09d41fb8e632c49d5a7ccf2315788f
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/462508
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-07-24 18:09:33 +00:00
Chunyang Hui
07f432641a opal: Fix memory leakage
Change-Id: I37f1468a41d568f7313143f0270f854f73bc4000
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461560
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: yidong0635 <dongx.yi@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-07-22 04:32:59 +00:00
Chunyang Hui
8522624d03 opal: Add multiuser support
The admin can enable a user and add the user to a locking range.
The user can then lock/unlock their range.

Change-Id: Ifc5a8cf5c6b5febeb59c86333981f0cf5b938500
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460891
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-07-22 04:32:59 +00:00
Changpeng Liu
e27421b344 nvme: fix req leaks
There are many req leaks when a controller failure
occurs while submitting I/O. All of the children must be
freed before the parent req is freed.

If some of the child reqs have been sent to the back end
and some of them fail, remove the failed reqs
from the parent req and retain the parent req,
freeing it after all of the submitted reqs return.

Change-Id: Ieb5423fd19c9bb0420f154b3cfc17918c2b80748
Signed-off-by: Huiming Xie <xiehuiming@huawei.com>
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461734
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-07-22 04:15:34 +00:00
Changpeng Liu
c4f7c1bc2a nvme: put child I/O helper functions in nvme_internal.h
The existing child-split functions defined in nvme_ns_cmd.c can
also be used in nvme_qpair.c to free child requests in error
paths.

Change-Id: I640b32884424709da67ee89ff780c2de45acc54c
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461372
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-22 04:15:13 +00:00
James Bergsten
5acf617c6e nvme: add functions to pretty-print commands and completions
This change attempts to address the Trello request to decode I/O errors in
NVMe hello_world example.

See https://trello.com/c/MzJJw7hM/2-decode-io-errors-in-nvme-helloworld-example

As part of this change, spdk_nvme_cpl_get_status_string was declared
in nvme.h, and spdk_nvme_qpair_print_command and
spdk_nvme_qpair_print_completion were renamed and added to nvme.h,
allowing all three to be used "externally."

To test the failing paths, two compile time defines were added to force a
write or read error (bad LBA) respectively.

As the example does a read after write, if the write fails, the example fails.

Signed-off-by: James Bergsten <jamesx.bergsten@intel.com>
Change-Id: Ib94b4a02495eb40966e3f49517a5bdf64485538a
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457076
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-15 07:47:03 +00:00
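
A sketch of an I/O completion callback using the now-public helpers; the
exact signatures are assumed from the description above (the cast is only
there because the print helper is assumed to take a non-const pointer):

    #include <stdio.h>
    #include "spdk/nvme.h"

    static void
    io_complete(void *cb_arg, const struct spdk_nvme_cpl *cpl)
    {
            struct spdk_nvme_qpair *qpair = cb_arg;   /* qpair passed as cb_arg in this sketch */

            if (spdk_nvme_cpl_is_error(cpl)) {
                    fprintf(stderr, "I/O failed: %s\n",
                            spdk_nvme_cpl_get_status_string(&cpl->status));
                    spdk_nvme_qpair_print_completion(qpair, (struct spdk_nvme_cpl *)cpl);
            }
    }
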
Richael Zhuang
d4cbbf1751 nvme: use atomic builtins for g_signal_lock
The __sync builtin based implementation generates full memory
barriers on some non-x86 platforms. Replacing it with C11 atomic
builtins:
- relaxes arm and ppc from a full barrier to a half barrier
- keeps x86 code the same as before

Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: Ib6624ef8e45af497b9eced6ecfa7710bcc88a733
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461590
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-15 06:01:37 +00:00
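
A generic illustration of the pattern (not the exact SPDK lock code):
explicit acquire/release orders let arm and ppc use half barriers while
x86 code generation stays unchanged:

    static int g_signal_lock;

    static void
    signal_lock(void)
    {
            /* a __sync-based spinlock would use __sync_lock_test_and_set() here */
            while (__atomic_exchange_n(&g_signal_lock, 1, __ATOMIC_ACQUIRE)) {
                    /* spin until the lock is released */
            }
    }

    static void
    signal_unlock(void)
    {
            /* a __sync-based spinlock would use __sync_lock_release() here */
            __atomic_store_n(&g_signal_lock, 0, __ATOMIC_RELEASE);
    }
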
yidong0635
ff0a7dfc42 nvme: Handle CQ polling failures by marking the controller as failed.
nvme_transport_qpair_process_completions calls nvme_rdma_qpair_process_completions,
which in some cases returns -1 due to CQ errors.

Handle CQ polling failures by marking the controller as failed,
so that a completion with an error is treated as a controller failure.
Requests will be aborted once the retry counter is exceeded. Otherwise, the code
keeps reporting errors without recovery.

This fixes issue #850.

Change-Id: I0b324232310e107bf7fd5722aca54d402a19b14d
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460569
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-09 01:43:02 +00:00
Andrey Kuzmin
fa6bfa80af Nvme: check spdk_nvme_qpair_process_completions return value.
nvme_tcp_qpair_process_completions returns -1 on socket I/O
error. Unless the caller checks this return value (which
spdk_nvme_wait_for_completion_robust_lock currently doesn't),
on connection loss or any other fatal connection
error spdk_nvme_wait_for_completion will never exit the completion
check loop.

Change-Id: I92bb349beb071db312e6c31b84db2a7b51ec486c
Signed-off-by: Andrey Kuzmin <akuzmin@jetstreamsoft.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460657
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-09 00:27:54 +00:00
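
The calling pattern the fix enforces looks roughly like this; a negative
return from spdk_nvme_qpair_process_completions() is treated as a fatal
transport error instead of being silently ignored:

    #include <stdbool.h>
    #include "spdk/nvme.h"

    static int
    wait_for_io(struct spdk_nvme_qpair *qpair, volatile bool *done)
    {
            while (!*done) {
                    int32_t rc = spdk_nvme_qpair_process_completions(qpair, 0);

                    if (rc < 0) {
                            /* connection loss or other fatal error: stop looping */
                            return rc;
                    }
            }
            return 0;
    }
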
Shuhei Matsumoto
8b539eb553 nvme: Set appropriate value to max_xfer_size and max_sge
The SPDK NVMe-oF initiator driver could not transfer I/O larger than
128KiB even if the NVMe-oF target allows I/O larger than 128KiB, for
both the RDMA and TCP transports.

Some use cases need to transfer IO larger than 128KiB.

For RDMA transport, max_mr_size reported by ibv_query_device of RDMA devices
indicates the maximum size of a single memory region, is independent
of the actual I/O size, and is very likely to be larger than 2 MiB,
which is the granularity at which we currently register memory regions.

Actually some RDMA NICs return UINT64_MAX for max_mr_size by ibv_query_device.

Hence use UINT32_MAX and let the generic layer use the controller data
to moderate this value.

On the other hand, for TCP transport, there is no limit for maximum IO
size and hence use UINT32_MAX.

Besides, for RDMA transport, max_sges should be the minimum of
the max_sge obtained by querying RDMA devices and NVME_RDMA_MAX_SGL_DESCRIPTORS.
Hence that change is made in this patch as well.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Idc813afd3e525bf5f370c0fcd2623f9c146a5528
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459218
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-05 06:35:41 +00:00
Shuhei Matsumoto
cf3c54bc03 nvme: Ensure max_sges not to exceed what controller supports in generic layer
Previously, comparing the transport-supported value with the target value
was done in the RDMA transport layer. However, this comparison should be
done in the generic layer, like the maximum I/O transfer size. Hence this
patch moves the comparison to the generic layer.

Besides, for MSDBD the value 0 indicates no limit, but we had mistakenly
handled it as if the maximum number of SGL entries were 0. This patch
fixes that bug as well.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I54365cf114169b10180ec2c659f9c7302672674c
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459574
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-05 06:35:41 +00:00
Chunyang Hui
fbd2f3fd2e opal: add support for getting locking range info
Change-Id: I8e3e39673c260f823a9703e86006b5334dedc987
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457576
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-05 02:23:28 +00:00
Chunyang Hui
505dbf59ff Opal: Add locking range support
Change-Id: I4974d4134aed3b63e204b79c9292ce940e32d40c
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455175
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-07-05 02:23:28 +00:00
Chunyang Hui
755b4390f9 Opal: Add activate locking SP method
Change-Id: I4189bdefdb5a6651bb73bd32e61c16e899b2ae5a
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454211
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-05 02:23:28 +00:00
Shuhei Matsumoto
3ff1ff004e nvme/tcp: Minor cleanups for SGL operations
Using naming rules consistent with other related libraries is helpful
for ensuring quality, as verified by this patch series.

This patch changes a few parts to use iov and iovcnt for SGL operations.
Besides, the name of an array points to the head of the array and is
constant, so copying the name of an array to another pointer is not
necessary and can be removed.

Change-Id: I2324f28126b3088098c1c767cf6c060f22c175c3
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455629
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-07-04 08:58:40 +00:00
Shuhei Matsumoto
3184884f9d nvmf/tcp: Properly handle multiple iovecs in processing H2C and C2H
NVMe/TCP target had assumed the size of each iovec was io_unit_size.
Using nvme_tcp_pdu_set_data_buf() instead removes the assumption
and supports any alignment transparently.

Hence this patch moves nvme_tcp_pdu_set_data_buf() to
include/spdk_internal/nvme_tcp.h and replaces the current code to use it.

Besides, this patch simplifies spdk_nvmf_tcp_calc_c2h_data_pdu_num()
because the sum of iov_len over the iovecs is now equal to the variable length.

We cannot separate the code movement (lib/nvme/nvme_tcp.c to include/
spdk_internal/nvme_tcp.h) from the code replacement (lib/nvmf/tcp.c)
because the moved functions are static and the compiler gives a warning if
they are not referenced in lib/nvmf/tcp.c.

The next patch will add UT code.

Change-Id: Iaece5639c6d9a41bd35ee4eb2b75220682dcecd1
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455625
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-04 08:58:40 +00:00
Hailiang Wang
3a65c8729b lib/nvme: fix a warning of spdk_pci_addr->domain
Compilation Warning on fedora30.
In file included from nvme_ut.c:42:
/home/vagrant/spdk_repo/spdk/test/common/lib/test_env.c:517:17:
warning: The left operand of '>' is a garbage value
        if (a1->domain > a2->domain) {
            ~~~~~~~~~~ ^
This is related to issue #822.

Change-Id: I2b61e821130b89af04db3c475e81d2e91a380a90
Signed-off-by: Hailiang Wang <hailiangx.e.wang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459923
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-07-01 13:07:48 +00:00
Shuhei Matsumoto
f62d5ccbe6 nvme/tcp: Properly handle multiple iovecs in nvme_tcp_pdu_set_data_buf
nvme_tcp_pdu_set_data_buf() has been used to process C2H and H2C for
NVMe/TCP initiator.

In this case, NVMe/TCP cuts out a part of the input data buffer
and transfers that part, repeating these cuts and transfers until
the whole data buffer is transferred. NVMe/TCP uses two SGLs:
one to parse from the offset datao to datao + datal and another
to append from the offset 0 to datal.

However, the current nvme_tcp_pdu_set_data_buf() had mistakenly used
data_length not as the data length of this transfer but as the total
length of the whole set of transfers.

Recently the DIF library was updated to properly handle very similar
cases, so this patch takes the DIF library as a reference and
corrects the implementation.

The next patch will add UT code to verify that the bug is fixed.
The UT code is fairly large and so is separated out.

Change-Id: Ibeed4de182b8b8740566e874e2757280dc21f9e8
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455623
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-07-01 08:28:20 +00:00
Shuhei Matsumoto
a7b6d2ef00 nvme/tcp: Change parameters of nvme_tcp_pdu_set_data_buf to use in target
This patch is the first patch of the patch series.

The purpose of this patch series is to correct the bug in
nvme_tcp_pdu_set_data_buf() when an iovec array with multiple entries is
passed, to share nvme_tcp_pdu_set_data_buf() between the NVMe/TCP
initiator and target, and to utilize nvme_tcp_pdu_set_data_buf()
not only for C2H and H2C but also for in-capsule data in the NVMe/TCP
target.

This patch is necessary to satisfy the second requirement, to
share nvme_tcp_pdu_set_data_buf() between NVMe/TCP initiator and target
because struct nvme_tcp_req and struct spdk_nvmf_tcp_req are different.

Four variables, iov, iovcnt, data_offset, and data_len are common,
and hence this patch changes the parameters of nvme_tcp_pdu_set_data_buf()
to accept them.

The bug is fixed in the next patch and tested in the patch after that.

Change-Id: Ifabd9a2227b25f4820738656e804d05dc3f874a5
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455622
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-07-01 08:28:20 +00:00
Darek Stojaczyk
f9a6588f57 nvme: switch to spdk_*malloc().
spdk_dma_*malloc() is about to be deprecated.

Change-Id: I6c308ee546c28c479ceb903bc1749bf5209dc6fe
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448172
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: <uma.willpower@gmail.com>
2019-06-27 04:34:50 +00:00
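
The mechanical shape of the conversion, using the spdk/env.h allocator
API (shown as a before/after sketch, not a diff from the patch):

    #include "spdk/env.h"

    static void *
    alloc_dma_buffer(size_t size)
    {
            /* previously such code would call spdk_dma_zmalloc(size, 0x1000, NULL);
             * the replacement spells out the DMA-capable allocation explicitly */
            return spdk_zmalloc(size, 0x1000, NULL, SPDK_ENV_SOCKET_ID_ANY,
                                SPDK_MALLOC_DMA);
    }
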
JinYu
77290bfe6b nvme: fix the endless loop of aborting trackers
The completion cb of an outstanding tracker may submit a new request to
the qpair's outstanding_tr list, which makes this an endless loop.
We only abort the trackers that were already outstanding.

Fix #819

Change-Id: I342f52f4d1836f8ef620ef9e3add0b1986727282
Signed-off-by: JinYu <jin.yu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457755
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-06-21 08:34:41 +00:00
Chunyang Hui
e3d21c7778 Opal: Optimize key creation and remove dev->dev_key
Change-Id: Iaf20c8ec0d208e03269406b62608d981d84cc48c
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457775
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-06-19 00:28:57 +00:00
James Bergsten
8785d5052d nvme: spdk_nvme_ctrlr_alloc_io_qpair extensions
Adds fields to structure spdk_nvme_io_qpair_opts.

These fields allow specifying the locations of memory buffers used
for the submission and/or completion queues.

By default, vaddr is set to NULL meaning SPDK will allocate the memory to be used.

If vaddr is NULL then paddr must be set to 0.

If vaddr is non-NULL and paddr is zero, SPDK derives the physical
address for the NVMe device; in this case the memory must be registered.

If a paddr value is non-zero, SPDK uses the vaddr and paddr as passed.

SPDK assumes that the memory passed is both virtually and physically
contiguous.

If these fields are used, SPDK will NOT impose any restriction
on the number of elements in the queues.

The buffer sizes are in number of bytes, and are used to confirm
that the buffers are large enough to contain the appropriate queue.

These fields are only used by PCIe attached NVMe devices.  They
are presently ignored for other transports.

Signed-off-by: James Bergsten <jamesx.bergsten@intel.com>
Change-Id: Ibfab3939eefe48109335f43a1167082dd4865e7c
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454074
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-06-18 12:19:41 +00:00
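
A hedged sketch of supplying caller-owned queue memory through these new
fields; the sq/cq vaddr/paddr/buffer_size member names and element types
are assumed from the description above, not verified against the header:

    #include "spdk/nvme.h"

    static struct spdk_nvme_qpair *
    alloc_qpair_with_buffers(struct spdk_nvme_ctrlr *ctrlr,
                             struct spdk_nvme_cmd *sq_buf, uint64_t sq_bytes,
                             struct spdk_nvme_cpl *cq_buf, uint64_t cq_bytes)
    {
            struct spdk_nvme_io_qpair_opts opts;

            spdk_nvme_ctrlr_get_default_io_qpair_opts(ctrlr, &opts, sizeof(opts));
            opts.sq.vaddr = sq_buf;          /* NULL (default) lets SPDK allocate */
            opts.sq.paddr = 0;               /* 0 lets SPDK derive the physical address */
            opts.sq.buffer_size = sq_bytes;  /* checked against the requested queue size */
            opts.cq.vaddr = cq_buf;
            opts.cq.paddr = 0;
            opts.cq.buffer_size = cq_bytes;
            return spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, &opts, sizeof(opts));
    }
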
Chunyang Hui
dd26583316 Opal: Add opal_create_key function
Change-Id: Id1705636e25fe3ad90ff60a57aca7b1e4c2ef687
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453972
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-06-11 01:12:24 +00:00
Chunyang Hui
9f988238fc Opal: Refactor and clean functions
Delete opal_next, introduce opal_add_tokens.
Delete spdk_opal_cmd, separate cmds into new APIs.

Change-Id: Ide56817eec7fde7b110818966ebf10e65a952fc9
Signed-off-by: Chunyang Hui <chunyang.hui@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454433
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-06-11 01:12:24 +00:00
Ziye Yang
679257db88 nvme/tcp: Properly deal with supporting single r2t
According to the TP 8000 spec in Page 26:
Maximum Number of Outstanding R2T (MAXR2T): Specifies the maximum
number of outstanding R2T PDUs for a command at any point in time
on the connection.

This patch makes the current host driver implementation support one r2t.
We clean up the code to do the right advertising to the target in the
icreq and avoid attempts to deal with multiple r2ts.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: If06ad2e8bde31c2fd7e1c3739f651fb64040e3a9
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455750
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Or Gerlitz <gerlitz.or@gmail.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-06-06 00:58:58 +00:00
Ziye Yang
fe2dddbbbc nvme/tcp: Correct nvme_tcp_qpair_disconnect behavior
The current nvme_tcp_qpair_disconnect behavior
is not exactly correct: we do not re-initialize
the state of some of the tqpair's data structures,
and this caused the coredump.

Purpose: Fixes #808.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I4d2cad8fc0712dbebfc2f3e52373cbe3b9908bf7
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456755
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-06-05 16:13:55 +00:00
Ziye Yang
31607f3f9e nvme/tcp: fix the user iov length calculation in nvme_tcp_build_sgl_request
The length should be no larger than the remaining_size.
For example, the remaining_size (initially assigned from payload_size) is 128KB,
and the user's sgl length is 1MB. Since we have already split the I/O, we should
not use the original length (1MB) but the remaining_size.

Fixes the issue reported in https://github.com/spdk/spdk/issues/808

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I0a7d0f2282c8ad0e253d8de7091b6c5b87018e9a
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456760
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-06-05 01:46:52 +00:00
Ziye Yang
5391b29c79 nvme/tcp: Fix the issue of handling send pdu failure
Previously, if the return value of nvme_tcp_qpair_process_send_queue
was not zero, we returned directly instead of continuing to receive the pdu.
But this is wrong; we should only return early when the
return value is negative.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I83453733f5a3e3350a0461b4cb0bc409fde32fea
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455899
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-06-05 01:44:49 +00:00
Jim Harris
6550abbac1 nvme: prefetch stailq before freeing pcie request
We will need to put the recently completed nvme_request
object on the qpair's STAILQ.  We don't reference any
real data from the nvme_request in the completion path
since we've already stashed the cb_fn and cb_arg in
the nvme_tracker.  But we will need to reference the
STAILQ_ENTRY to put it back in the qpair's STAILQ, so
prefetch that cacheline.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id76122afe4150c84a61fbe38bc874f10d606b3b3

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456673
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-06-04 00:01:35 +00:00
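
The optimization itself is just a prefetch of the STAILQ link's cache
line ahead of the completion callback. A self-contained illustration with
stand-in types (the real code operates on SPDK's internal nvme_request):

    #include <sys/queue.h>

    struct req {                        /* stand-in for the internal nvme_request */
            STAILQ_ENTRY(req) stailq;   /* link used to return the req to the qpair */
    };

    static inline void
    prefetch_before_completion(struct req *r)
    {
            /* Pull in the cache line holding the STAILQ link early; the
             * completion path itself only touches cb_fn/cb_arg stashed in
             * the tracker, so this hides the miss on the later re-queue. */
            __builtin_prefetch(&r->stailq, 1 /* write */, 3 /* keep in cache */);
    }
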
Jim Harris
b3d884b700 nvme: assign qpair when req is allocated
There's no need to set this every time we allocate
a request.

While here, fix a typo near where we needed to modify
the unit test to remove the qpair assertion.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8af41a6c483415950f625d1ed2ef46088b75a622

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456270
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-06-04 00:01:35 +00:00