This will become important later on.
Change-Id: I94e5af03359e476afbc68664e43f44269ad5974c
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448074
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When we have a shared receive queue, the number of outstanding items
associated with a completion queue is deterministic and limited by the
total number of RECVs posted to the SRQ. So we can set the full size of
the completion queue when it is created and never resize it.
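As a rough illustration (not code from this patch; srq_depth and
max_send_wr are placeholder names), sizing the CQ once from the SRQ depth
looks something like:

#include <infiniband/verbs.h>
#include <stdint.h>

/* With an SRQ, the worst-case number of completions outstanding on the CQ is
 * bounded by the RECVs posted to the SRQ plus the sends we allow, so the CQ
 * can be sized once at creation time and never resized. */
static struct ibv_cq *
create_fixed_size_cq(struct ibv_context *ctx, uint32_t srq_depth, uint32_t max_send_wr)
{
	int cq_size = (int)(srq_depth + max_send_wr);

	return ibv_create_cq(ctx, cq_size, NULL, NULL, 0);
}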
Change-Id: I787e4c5bbd52ac8948a323d1301f926f887cd91c
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447492
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Consolidating error paths is common practice in SPDK, so do that here to
make the function more uniform and save space.
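For reference, a minimal, hypothetical sketch of the consolidated
(goto-based) error path pattern that is common in SPDK:

#include <stdlib.h>

/* All failure cases jump to one cleanup label instead of duplicating the
 * teardown code at every early return. Names here are illustrative. */
static int
create_thing(void)
{
	void *a = NULL, *b = NULL;

	a = calloc(1, 64);
	if (a == NULL) {
		goto err;
	}

	b = calloc(1, 64);
	if (b == NULL) {
		goto err;
	}

	return 0;
err:
	free(b);
	free(a);
	return -1;
}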
Change-Id: I98c5d5f7feeb688f1d8b24f4d2d3461a43d00c1d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448191
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The channels array in the subsystem's poll group is indexed by
nsid - 1, so renaming the previous num_channels to num_ns
makes more sense. Also embed the channels into a namespace
data structure here so that it can be reused in the following
patch.
Change-Id: If5d9aab4b1d5bcf7a3c22f29fa58d84752f0d4cc
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446211
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This unifies the cleanup path between SRQ and normal
operation.
Change-Id: I396d7e3749579f27b5bb1e89b9d6761a77ba5beb
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446979
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Depending on whether SRQ is enabled, resources may be allocated
to the rqpair or to the rpoller. Create a struct to hold these
pointers that can be used in both locations to avoid duplicated
code.
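A hedged sketch of the idea (the field names are assumptions, not the
struct added by this patch):

#include <infiniband/verbs.h>

/* One struct owns the receive-side resources; it hangs off the qpair when SRQ
 * is disabled and off the poller when SRQ is enabled, so the code that touches
 * the resources is identical in both cases. */
struct rdma_resources_sketch {
	struct ibv_srq	*srq;		/* NULL when SRQ is disabled */
	void		*requests;	/* request objects */
	void		*recvs;		/* receive descriptors */
	void		*bufs;		/* data buffers */
	struct ibv_mr	*bufs_mr;	/* memory registration for the buffers */
};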
Change-Id: I2c8fc59009201d9e41721e6462a81732b529a9e0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446978
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Eugene Kochetov <evgeniik@mellanox.com>
This wasn't used anywhere.
Change-Id: I405af3c808be284d19218f3f04c1e90e33e31de8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446977
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
The purpose is to use a single readv to read both
the payload and the digest (if one is present).
This patch also prepares for supporting multiple
SGLs in the NVMe TCP transport later.
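A minimal sketch of the single-readv idea, assuming placeholder names
(fd, has_digest) and the 4-byte CRC32C data digest defined by NVMe/TCP:

#include <sys/uio.h>
#include <stdint.h>
#include <stddef.h>

/* One readv() pulls the payload and, when a data digest is present, the
 * digest bytes in the same system call. */
static ssize_t
read_payload_and_digest(int fd, void *payload, size_t payload_len,
			uint8_t *digest, int has_digest)
{
	struct iovec iov[2];
	int iovcnt = 1;

	iov[0].iov_base = payload;
	iov[0].iov_len = payload_len;

	if (has_digest) {
		iov[1].iov_base = digest;
		iov[1].iov_len = 4;	/* CRC32C data digest */
		iovcnt = 2;
	}

	return readv(fd, iov, iovcnt);
}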
Change-Id: Ia30a5e0080b041a65461d2be13db4e0592a70305
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447670
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We were only using one pd per device anyway, and this is necessary for
shared receive queue support.
Change-Id: I86668d5b7256277fe50836863408af2215b5adf9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447385
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Both Mellanox and Soft-RoCE NICs work with this approach.
Change-Id: I7b05e54037761c4d5e58484e1c55934c47ac1ab9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446134
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The persist through power loss feature is not supported for now.
Change-Id: Id2a5088389dc28b9d28d88c04ff819d20ea11902
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436940
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
For the Number of Registered Controllers field in the Reservation
Status data structure, we count all the controllers
in the subsystem whose Host Identifier matches an
existing registrant.
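Roughly, the counting works like the sketch below; the types and list
layout are hypothetical stand-ins for the real SPDK structures:

#include <stdint.h>
#include <string.h>

struct ctrlr_sketch { uint8_t hostid[16]; struct ctrlr_sketch *next; };
struct registrant_sketch { uint8_t hostid[16]; struct registrant_sketch *next; };

/* Count every controller in the subsystem whose Host Identifier matches an
 * existing registrant's Host Identifier. */
static uint16_t
count_registered_ctrlrs(struct ctrlr_sketch *ctrlrs, struct registrant_sketch *regs)
{
	uint16_t num = 0;

	for (struct ctrlr_sketch *c = ctrlrs; c != NULL; c = c->next) {
		for (struct registrant_sketch *r = regs; r != NULL; r = r->next) {
			if (memcmp(c->hostid, r->hostid, sizeof(c->hostid)) == 0) {
				num++;
				break;
			}
		}
	}

	return num;
}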
Change-Id: Ib4de22c7020dbd8294f448f23c0c5c8c142629dd
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436939
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The following issue can occur if you shut down an NVMe-oF target
using the TCP transport, for example:
=================================================================
==61022==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 560 byte(s) in 1 object(s) allocated from:
#0 0x7ffff6efcfe0 in calloc (/lib64/libasan.so.3+0xc6fe0)
#1 0x4c6216 in spdk_nvmf_tcp_listen /home/ziyeyang/spdk/lib/nvmf/tcp.c:680
Indirect leak of 48 byte(s) in 1 object(s) allocated from:
#0 0x7ffff6efcfe0 in calloc (/lib64/libasan.so.3+0xc6fe0)
#1 0x4a77b8 in spdk_posix_sock_create /home/ziyeyang/spdk/lib/sock/posix/posix.c:291
After checking the issue, it turns out that we did not call
spdk_nvmf_transport_stop_listen when removing the subsystem listener.
This patch fixes that.
Change-Id: Ic75d99cb0c6a3ba1c47ac79a2d8e3887b0f6b012
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447020
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: yidong0635 <dongx.yi@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The reservation holder may release the reservation on
a namespace; the release notification feature will be supported
in coming patches.
Change-Id: If5d3158e691fcc782f7cf0b67a326bf62edf0531
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436938
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Unregistering by a host may cause a reservation held by the host
to be released. If the host is the last remaining reservation holder
or is the only reservation holder, then the reservation is released
when the host unregisters. This may occur with the Acquire (Preempt)
and Register (Unregister) commands.
Change-Id: If59fe2fdaa69c8ad70f364618d6c281494ad6245
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446821
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
A registrant can obtain a reservation on a namespace by executing
the Acquire command. The Acquire command is associated with a specific
namespace. For now only the Acquire and Preempt reservation acquire
actions are supported; Preempt and Abort will be supported in the future.
Change-Id: Ifcbb6b414827393ffc266ceada5982b743716321
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436937
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reservations can be used by two or more hosts to coordinate
access to a shared namespace; a host must register to a namespace
prior to establishing a reservation. Unregistering by a host
may cause a reservation release; that behavior will be supported
after the reservation acquire patch.
Change-Id: Id44aa1f82f30d9ecc5999a2a9a7c20b2af77774a
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436936
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Borrow ideas from iSCSI and optimize
the nvme_tcp_build_iovecs function.
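As a rough picture of what an iovec-building helper does (this is not the
actual nvme_tcp_build_iovecs; the names and the 4-byte CRC32C digest sizes
are stated assumptions):

#include <sys/uio.h>
#include <stdint.h>
#include <stddef.h>

/* Gather a PDU header, optional header digest, payload, and optional data
 * digest into one iovec array so the whole PDU can go out in a single writev. */
static int
build_pdu_iovecs(struct iovec *iov, void *hdr, size_t hdr_len, uint8_t *hdgst,
		 void *data, size_t data_len, uint8_t *ddgst)
{
	int iovcnt = 0;

	iov[iovcnt].iov_base = hdr;
	iov[iovcnt++].iov_len = hdr_len;

	if (hdgst != NULL) {
		iov[iovcnt].iov_base = hdgst;
		iov[iovcnt++].iov_len = 4;	/* CRC32C header digest */
	}

	if (data_len > 0) {
		iov[iovcnt].iov_base = data;
		iov[iovcnt++].iov_len = data_len;
	}

	if (ddgst != NULL) {
		iov[iovcnt].iov_base = ddgst;
		iov[iovcnt++].iov_len = 4;	/* CRC32C data digest */
	}

	return iovcnt;
}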
Change-Id: I19b165b5f6dc34b4bf655157170dec5c2ce3e19a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446836
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Not all RDMA drivers fail back the dummy recv and send operations that
we send to them when destroying a qpair. We still need to free the
resources from these qpairs to avoid eating up all of the system memory
after multiple connect and disconnect events. Since we won't be getting
any more completions, the best heuristic we can use is waiting a long
time and then freeing the resources.
qpair_fini is only called from the proper polling thread so we can safely
call process_pending to flush the qpair before closing it out.
Change-Id: I61e6931d7316d1e78bad26657bb671aa451e29f4
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/443057
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
In the error path, we were first decrementing a variable and then
asserting that it must be >0. These operations should occur in the
opposite order.
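In generic terms, the corrected ordering looks like this (illustrative
only):

#include <assert.h>
#include <stdint.h>

/* Assert that the counter is still positive before decrementing it;
 * decrementing first would make the assert check the already-modified value. */
static void
release_one(uint32_t *count)
{
	assert(*count > 0);
	(*count)--;
}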
Change-Id: I6cec544faf17bb75cbfca3d3a3c173dc5db14f99
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446440
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: yidong0635 <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When the decision was made to uncouple the number of shared buffers from
the queue depth and allow the user to decide for themselves, the default
was also significantly lowered, which caused some issues when trying
to run performance tests (see https://github.com/spdk/spdk/issues/699).
While this is a user modifiable variable, it is still best to keep the
higher default value.
The original value was equivalent to max_queue_depth *
SPDK_NVMF_MAX_SGL_ENTRIES * 2, with the defaults for max_queue_depth and
max_sgl_entries being 128 and 16 respectively. Hence 4096 (128 * 16 * 2).
fixes: 0b20f2e552d978d84780e0ab968bb7fa65f7707e
Change-Id: I809e97a10973093a2b485b85bca7160091166f70
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446525
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
I think this simplifies the process a little bit.
Change-Id: Icc87a59c9f6fd965ef35531975b7036d85c4bc95
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445916
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We were only using one value from this array to tell us if the qpair was
idle or not. Remove this array and all of the functions that are no
longer needed once it is removed.
This series is aimed at reverting
fdec444aa8538aa6d782ad867821cf086e645e01, which has been tied to
performance decreases on master.
Change-Id: Ia3627c1abd15baee8b16d07e436923d222e17ffe
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445336
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Since we no longer rely on the state queues for draining qpairs, we can
get rid of most of them. We can keep just a few, and since we don't ever
remove arbitrary elements, we can use STAILQs to perform those
operations. Operations on STAILQs carry about half the overhead of
operations on TAILQs.
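For reference, the STAILQ macros from sys/queue.h cover exactly the
head/tail operations needed here; a small generic sketch:

#include <sys/queue.h>
#include <stdlib.h>

struct item {
	int			value;
	STAILQ_ENTRY(item)	link;
};

STAILQ_HEAD(item_list, item);

/* STAILQ gives O(1) insert-at-tail and remove-from-head with a single forward
 * pointer per element, which is all a drain-style queue needs. */
static void
stailq_demo(void)
{
	struct item_list list = STAILQ_HEAD_INITIALIZER(list);
	struct item *it = calloc(1, sizeof(*it));

	if (it == NULL) {
		return;
	}

	STAILQ_INSERT_TAIL(&list, it, link);

	while (!STAILQ_EMPTY(&list)) {
		it = STAILQ_FIRST(&list);
		STAILQ_REMOVE_HEAD(&list, link);
		free(it);
	}
}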
Change-Id: I8f184e6269db853619a3581d387d97a795034798
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445332
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This patch exposes the backend bdev's PI setting to the corresponding
NVMe-oF initiator via the Identify command, and removes the check that
the block size is a multiple of 512.
These changes enable the NVMe-oF initiator to send extended LBA payloads.
Change-Id: Ia7aa8332d36f056872a515b6da90c83112edb909
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/445056
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
If the current recv_state of the qpair is the same as the state to be
set, we will print an error message. After checking the current code,
we should add a check to avoid this.
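The added check amounts to something like the following; the struct and
field names are assumptions based on the description, not the patch
itself:

struct tcp_qpair_sketch {
	int	recv_state;
};

/* Skip the transition (and the error log a redundant transition would
 * trigger) when the qpair is already in the requested recv state. */
static void
set_recv_state_sketch(struct tcp_qpair_sketch *tqpair, int new_state)
{
	if (tqpair->recv_state == new_state) {
		return;
	}

	tqpair->recv_state = new_state;
}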
Change-Id: I49334f637c48e565e785d1fe6d0f000e18b2048a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445653
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Purpose: solve the coredump issue for the buffer
returned later in spdk_nvmf_tcp_request_free_buffers.
If we keep this statement, we cannot return the buffer
to the polling group.
Change-Id: Ib5c95ba54b37540950e654110fe6317cab507076
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445435
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Error logs in nvmf_rdma_dump_request lead to reporting an error about
an address that points to the zero page; add a check and return early.
This issue occurs in heavy-load fio testing.
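Conceptually the fix is an early-return guard like the one below; the
structure layout is a placeholder, not the real rdma_req:

#include <stdio.h>

struct rdma_request_sketch {
	void	*cmd;	/* stands in for the state the dump would print */
};

/* Bail out of the dump helper when there is nothing valid to print instead of
 * dereferencing a pointer into the zero page. */
static void
dump_request_sketch(struct rdma_request_sketch *req)
{
	if (req == NULL || req->cmd == NULL) {
		return;
	}

	fprintf(stderr, "request %p, cmd %p\n", (void *)req, req->cmd);
}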
Change-Id: I50302be88b3af53f718e3800aa16df7c506ca4e8
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441110
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
From TP8000 spec 7.4.7,
"In response to a C2HTermReq PDU, the host shall terminate the connection.
If the host does not terminate the connection in an implementation specific
period that does not exceed 30 seconds, the controller may terminate the
connection on its own".
This means the timeout is designed for the case where the target is
sending out a C2HTermReq and the host does not terminate the connection;
the target should then terminate the connection itself.
PS: Detecting a malicious connection that sends no response
(such as no response to an R2T PDU) should be handled in another patch.
Change-Id: I586dbb235d99aeab5d748a19b9128cd8b0cef183
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/440831
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The persistence feature can't be supported for now, but since these
features are mandatory for reservations, add the two functions here;
we can enable them with future patches for the persist through power
loss feature.
Change-Id: Ic358eda00058809bbfd6984b0861f8b6b5aabecd
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/438213
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When this structure was brought up to the generic layer, the tcp
transport was using max_io_size and the rdma transport was using
io_unit_size. In the interest of conserving memory, we should use
io_unit_size instead of max_io_size.
Change-Id: I2633306fcbfd8c3d557445959c745cb2d9a0999e
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442778
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We should never be going over these limits in the respective transports,
but add asserts to check this during testing.
Change-Id: Ifcaa82ccf58546a38020b31df54ee5d1d9822b8b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442777
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This intermediate state is unused and meaningless. The qpair transitions
into this state right before calling a synchronous operation and then
transitions to active as soon as that operation completes successfully.
If the operation did not complete successfully, we were leaving qpairs
in this weird intermediate state even though for all intents and purposes
they had reverted to an uninitialized state. Keeping qpairs in the
uninitialized state until they have been added to a poll group creates a
meaningful distinction between states that can be actionable from the
transport level.
Change-Id: I6de9bc424b393b6fff221aa2f4212aaa91488629
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443471
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Connections in the uninitialized state haven't been added to a poll
group yet, so submitting dummy requests to them will be pointless since
they will never be polled. We need to reject the connection and destroy
the qpair immediately.
Change-Id: Id5dd711882e1ae7c13ae32c06da2285186b00a1b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443470
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Since there are multiple events/conditions that can trigger a qpair
disconnection, we need to funnel them to a single point of entry. If
more than one of these events occurs, we can ignore all but the first
since once a disconnect starts, it can't be stopped.
Change-Id: I749c9087a25779fcd5e3fe6685583a610ad983d3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443305
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
For devices that support fewer SGE elements than our default values, we
need to adjust the I/O unit size so that we don't ever try to submit
more SGLs than we are allowed to.
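One way to picture the constraint, under the simplifying assumption that
each io_unit-sized buffer consumes one SGE (the patch's exact formula may
differ):

#include <stdint.h>

/* A max-sized I/O is split into io_unit_size buffers, each taking one SGE, so
 * the buffer size must be large enough that the SGE count for a max-sized I/O
 * never exceeds what the device supports. Assumes device_max_sge > 0. */
static uint32_t
adjust_io_unit_size(uint32_t io_unit_size, uint32_t max_io_size, uint32_t device_max_sge)
{
	uint32_t min_unit = (max_io_size + device_max_sge - 1) / device_max_sge;

	return io_unit_size < min_unit ? min_unit : io_unit_size;
}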
Change-Id: I316d88459380f28009cc8a3d9357e9c67b08e871
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442776
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This value was not being decremented when we got SEND completions for
write operations because we were using the recv send to indicate when we
had completed all writes associated with the request. I also erroneously
made the assumption that spdk_nvmf_rdma_request_parse_sgl would properly
reset this value to zero for all requests. However, for requests that
return SPDK_NVME_DATA_NONE from spdk_nvmf_rdma_request_get_xfer, this
function is skipped and the value is never reset. This can cause a
coherency issue on admin queues when we request multiple log files. When
the keep_alive request is resent, it can pick up an old rdma_req which
reports the wrong number of outstanding_wrs, and it will permanently
increment the qpair's curr_send_depth.
This change decrements num_outstanding_data_wrs on writes, and also
resets that value when the request is freed to ensure that this problem
doesn't occur again.
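In outline, the two changes look like the sketch below;
num_outstanding_data_wrs mirrors the field named above, everything else
is a hypothetical stand-in:

#include <assert.h>
#include <stdint.h>

struct rdma_req_sketch {
	uint32_t	num_outstanding_data_wrs;
};

/* Change 1: when a SEND completion for a WRITE data WR arrives, decrement the
 * per-request count of outstanding data WRs. */
static void
on_write_send_completion(struct rdma_req_sketch *rdma_req)
{
	assert(rdma_req->num_outstanding_data_wrs > 0);
	rdma_req->num_outstanding_data_wrs--;
}

/* Change 2: when the request is freed, reset the count so a recycled request
 * object can never report stale outstanding WRs. */
static void
on_request_free(struct rdma_req_sketch *rdma_req)
{
	rdma_req->num_outstanding_data_wrs = 0;
}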
Change-Id: I5866af97c946a0a58c30507499b43359fb6d0f64
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443811
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The typical rdma qpair disconnect function goes through the function
_nvmf_rdma_disconnect_retry. When this function was introduced, it was
discovered that we could receive a qpair disconnect event for a given
qpair before that qpair had been assigned to a poll group. In order to
ensure that the disconnect procedure completed properly, we waited on
the current thread in _nvmf_rdma_disconnect_retry for the qpair to be
assigned a poll group before we finally disconnected. see rdma.c:2250.
Since _nvmf_rdma_disconnect_retry was not necessarily called from the
poll group's thread, we relied upon the assumption that the group
variable would never be set back to NULL. See the comment on rdma.c:
2243.
However, in _spdk_nvmf_qpair_destroy we were setting the group back to
NULL. This operation can result in the following set of operations
across multiple threads that prevent a qpair from ever being fully
destroyed.
1. thread 1: receive a disconnect event - call nvmf_rdma_disconnect
2. thread 1: from nvmf_rdma_disconnect call
spdk_nvmf_rdma_qpair_inc_refcnt - setting rqpair->refcnt to 1.
3. thread 2: call spdk_nvmf_rdma_poller_poll.
4. thread 2: in spdk_nvmf_rdma_poller_poll reap a completion with an
error status which causes us to call spdk_nvmf_qpair_disconnect -
rdma:2846
5. thread 2: spdk_nvmf_qpair_disconnect calls _spdk_nvmf_qpair_destroy which sets
qpair->group = NULL
6. thread 1: from nvmf_rdma_disconnect we call
_nvmf_rdma_disconnect_retry which checks if qpair->group == NULL. If
that is the case, we assume that the qpair has not been assigned a group
yet and send ourself a message to call _nvmf_rdma_disconnect_retry again. see rdma.c:2253
7. thread 2: from _spdk_nvmf_qpair_destroy we call
spdk_nvmf_transport_qpair_fini which results in a call to
spdk_nvmf_rdma_close_qpair. which sends dummy send and recvs to the
qpair.
8. thread 2: we call poller_poll and get completions for both the send
and recv dummy requests. This results in a call to
spdk_nvmf_rdma_qpair_destroy.
9. thread 2: spdk_nvmf_rdma_qpair_destroy checks rqpair->refcnt and, when
it sees that it is not 0 (see step 2 above), it returns without
freeing the resources. See rdma.c:629.
10. thread 1: we keep churning in _nvmf_rdma_disconnect_retry sending
ourselves messages because rqpair->group is going to be null. Thread 1
never reaches line 2257 where it sends a message to call
_nvmf_rdma_qpair_disconnect. _nvmf_rdma_qpair_disconnect is the function
that decreases the rqpair->refcnt and allows us to make forward progress
on destroying the qpair.
I encountered this issue while trying to disconnect from our target
using the kernel initiator with an x722 NIC. I think the timing on this
bug comes out with that specific configuration because some of the calls
in the disconnect path on thread 1 fail, causing it to take longer and
giving the second thread a chance to delete the qpair.
There are really two issues at play here. We don't have a single point
of entry for disconnecting RDMA qpairs, and we rely on the qpair->group
variable never being set back to NULL. This patch addresses the second
issue, and the next patch in the series addresses the first.
Change-Id: I65395d0bbb67edfa7bad2ddc70906606c3d83781
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443304
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This doesn't fix any bug, but it makes more sense to leave the qpair
in the NVME_TCP_PDU_RECV_STATE_AWAIT_PDU_READY state until it
receives at least one byte.
Change-Id: Ic5f34a733a80b58f65a1334fae7e07dbded2b3d0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441811
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The management channel was used in the RDMA transport prior
to the introduction of poll groups and made its way over to
the TCP transport when it was written. Eliminate it in favor
of just using the poll group.
Change-Id: Icde631dd97a6a29190c4a4a6a10a0cb7c4f07a0e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442432
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>