numam-spdk

Author	SHA1	Message	Date
Seth Howell	6189c0ceb7	lib/nvme: abort all requests when disconnecting a qpair. By aborting all requests from every qpair when it is disconnected, we can completely avoid having to abort requests when we enable the qpair since nothing will be left enabled. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: Iba3bd866405dd182b72285def0843c9809f6500e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1788 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	b2a93a320d	lib/nvme: set qpairs to destroy when ctrlr is removed. This is the only reasonable thing to do. Plus we need to be in the destroying or disconnecting state to avoid an infinite loop when aborting requests. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I38462a01f0455c3d6496434626f6f2f4663bf508 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1857 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	7defb70d3a	lib/nvme: don't requeue I/O while destroying. When we destroy a qpair, we need to flush all of the I/O. But some applications will try to resubmit that I/O. We need to not re-queue those I/O while in the context of the destroy call so as to avoid an infinite loop. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I3e4863a563d461092f6e6b4a893f965f41bf34e3 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1856 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	af2d56ed94	lib/nvme: Don't re-queue I/O while disconnecting. This can cause infinite loops if the callback tries to queue an additional I/O. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I4b80b97d334082465d9228b799ef901645fa968e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1854 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	b874f65743	lib/nvme: disconnect qpairs if they are failed during reset. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I15079cb35d48221bd92b7ca41766148fdb58e668 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1855 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	9fe5084860	lib/nvme: when destroying qpairs, abort queued requests. We should be giving completions for all requests when we destroy a qpair. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I802f5120f2e8289aa825872f8085ac21b5fce0f3 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1756 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com>	2020-04-14 11:34:24 +00:00
Alexey Marchuk	4279766935	nvme: Abort queued reqs when destroying qpair Change-Id: Idef1b88cf47cf9f82b1f4499ef836dfa741c0c7f Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1791 Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: <dongx.yi@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-14 11:33:39 +00:00
Jacek Kalwas	a7a0d02d8b	nvme: fix command specific status code Given enum was not aligned with spec. This status can be reported when size equals 0. Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com> Change-Id: If51f6b051c13880c1fd4e6bb0a02f134b28b5a88 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/928 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2020-02-20 09:49:24 +00:00
Changpeng Liu	ff9516bdcc	nvme: call the callback for the queued requests when there is a submission failure For requests which don't have child requests, SPDK may queue them on the queued_req list due to limited resources; in the completion path, we may resubmit them to the controller. When the controller has been removed, the submission path will return -ENXIO and we will free the requests directly, so the callback will not be triggered for these requests. Here we add a flag to indicate whether or not the request came from the queued_req list, so that on a failed submission we can trigger the user's callback. Fix issue #1097 Change-Id: I901ac81733c2319e540d24baf5b8faa1c649eb35 Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/477754 Community-CI: SPDK CI Jenkins <sys_sgci@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>	2019-12-20 10:04:57 +00:00
Seth Howell	61537a190e	nvme: replace nvme_qpair_state_equals. nvme_qpair_get_state fits more closely with the semantics in other modules. Change-Id: I6ea8e02abe27253d9b4d779a43ac1963be56356a Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/476920 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>	2019-12-09 13:55:41 +00:00
Seth Howell	24bca2eadd	nvme: add an enum for why a qpair disconnected Change-Id: I1a9517d9673051615942c873416505704740691a Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475805 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-12-09 13:55:41 +00:00
Seth Howell	3911922005	nvme: remove redundant transport_qp_is_failed checks The qpair state transport_qpair_is_failed is actually equivalent to NVME_QPAIR_IS_CONNECTED in the qpair state machine. There are a couple of places where we check against transport_qp_is_failed and then immediately check to see if we are in the connected state. If we are failed, or we are not in the connected state we return the same value to the calling function. Since the checks for transport_qpair_is_failed are not necessary, they can be removed. As a result, there is no need to keep track of it and it can be removed from the qpair structure. Change-Id: I4aef5d20eb267bfd6118e5d1d088df05574d9ffd Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475802 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-12-09 13:55:41 +00:00
Ziye Yang	542185b7e0	nvme/qpair: merge two if cases into one. Purpose: To remove the duplicated code. Change-Id: Iab9989f9928698967533e45e7cffad4f09bde16a Signed-off-by: Ziye Yang <ziye.yang@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/473376 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-11-08 22:18:18 +00:00
Seth Howell	13f30a254e	nvme: don't disconnect qpairs from admin thread. Disconnecting qpairs from the admin thread during a reset led to an inevitable race with the data thread. QP related memory is freed during the disconnect and cannot be touched from the other threads. The only way to fix this is to force the qpair disconnect onto the data thread. This requires a small change in the way that resets are handled for pcie. Please see the code in reset.c for that change. fixes: `bb01a089` Change-Id: I8a39e444c7cbbe85fafca42ffd040e929721ce95 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/472749 Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-10-31 04:50:59 +00:00
Seth Howell	ae3a9b8f08	nvme_qpair: return -ENXIO when the qpair is failed. This will be the canonical way of informing the user that we have lost the qpair connection somehow. Also update all of the functions that will return -ENXIO to the user. Change-Id: Ic6c7c2d0e07e9d3e857a3476bb6b91fb4b6454fa Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471416 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	81b20a4d96	nvme_ctrlr: Allow resets from failed state Failed is not a final state for either fabric or pcie controllers. We have historically not allowed resets in the failed state, but we should. Instead of checking for the failed state, we should check for the removed state. If the controller is removed, then we cannot even attempt a reset. Change-Id: I2c1a3d85db84f84cd1895cbfaf16575c8b496155 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471415 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	552898ec17	nvme_qpair: fail the ctrlr only for errors on admin qpair. We shouldn't always fail the whole controller if we get a failure on an individual qpair. Change-Id: Id0c90af83e5231593a895be66e7a7de48939e240 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471660 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	4c1a18c41d	nvme_qpair: fix check_enabled. check_enabled had a couple of bugs in it that made it unfriendly for enabling I/O qpairs after a reset. 1. It was calling nvme_qpair_abort_queued_requests before setting the enabled flag to true. For applications that submit new I/O in the completion callback for old I/O, this means you enter an infinite loop of submitting requests, and then immediately completing them. So instead, wait for the qpair to reset, then just submit those requests to the lower layer. 2. It didn't check whether we were already in the middle of calling it, so we could re-enter function calls like nvme_qpair_abort_queued_requests. Also, now that we have a coherent state machine for qpairs, we can limit the enabling to a specific state in that state machine. Change-Id: Ie0b74819a6b16839965bced47c33dec967f725a8 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470256 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	08d4d977e8	nvme: combine qpair->is_connecting and is_enabled These will form the base of a little state machine for managing the nvme qpair structure. Change-Id: If6f6df38cc17221ac8fcb7d8c0d7e2e808897a99 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470534 Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	4473732398	nvme: allow fabrics commands during reconnect. When doing a reset on an NVMe-oF target with active I/O qpairs, we need to be able to submit fabrics commands on them in order to perform a reset. Currently, resetting a fabric controller with any I/O qpairs active will cause the reset to hang indefinitely. Change-Id: Ic972a301390a4dd64adabedfe01aa4e5253e40b0 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469935 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-10-11 20:13:26 +00:00
Seth Howell	2575aaec5a	nvme: make sure we queue requests in order. My recent changes that introduced batching to queued request resubmission also introduced a regression that can lead to reordering requests before submitting them to the drive. This change prevents that. We wait until inside the internal _nvme_qpair_submit_request function to check for queued entries to avoid queueing a request that has children. If a request that has children gets queued, when we process completions and resubmit the parent, it will result in the children being submitted. Since we only account for the number of requests we completed in the last iteration, some of the child requests may be requeued out of order, or worse, none of the child requests will end up being submitted to the transport and they will all be queued behind previously queued requests. Change-Id: I58e1c458c25fbf3f9f75364f05b1076b166a6212 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470890 Reviewed-by: Ziye Yang <ziye.yang@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-10-11 18:45:13 +00:00
Seth Howell	f5d88e46e2	nvme: always set ctrlr->is_failed through API Use the standard API function to fail the controller in all cases. This patch, and the several following patches are aimed at creating a mechanism for reporting up to the application layer that a controller is failed and/or removed. To do this, I use the reset_cb to inform the upper layer that the controller is failed. This also requires changes to how we handle a controller reset to pave the way for doing optional reset retries in the libraries. Change-Id: I06dfce08326c23472a1caa8f6efbac2fd1a720f2 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469635 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-10-07 15:05:00 +00:00
Seth Howell	2c68fef058	nvme: move queued request resubmit to generic layer We were already passing up from each transport the number of completions done during the transport specific call. So just use that return code and batch all of the submissions together at one time in the generic code. This change and subsequent moves of code from the transport layer to the generic layer are aimed at making reset handling at the generic NVMe layer simpler. Change-Id: I028aea86d76352363ffffe661deec2215bc9c450 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469757 Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-10-07 15:05:00 +00:00
Seth Howell	afc9800b06	nvme: _nvme_qpair_submit_request does not requeue This will be handled by nvme_qpair_submit_request when it receives -EAGAIN from _nvme_qpair_submit_request. Change-Id: I5e76aae170c981df0cadaadcd5da1163c715006f Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470407 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>	2019-10-07 15:05:00 +00:00
Seth Howell	18dc53c531	nvme: move submit_request impl to a private function This patch series is aimed at preserving the order of qpair entries when resubmitting queued requests. The hope is that we will make the API foolproof and future-proof against ever reordering any queued requests. Change-Id: Ib20d61d3abaed637c9c305b75081947630190fd4 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470062 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>	2019-10-07 15:05:00 +00:00
Seth Howell	7630daa204	nvme: move queueing requests to the generic layer The tailq and the requests all belong to the generic layer, might as well put the queueing code there for better encapsulation. Change-Id: Id5f08f798121b50a21044cfc61856999c50ca227 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469758 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-09-30 21:17:47 +00:00
Jim Harris	0aa72ffb74	nvme: fix WRITE_TO_RO_RANGE status code WRITE_TO_RO_PAGE was incorrect and misleading. This 0x82 NVMe status code indicates a write to a read-only range of LBAs. So modify the constant name and associated usages to use WRITE_TO_RO_RANGE instead. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I993dbebb5acc2e685a0e99aa14084942ef79d659 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465083 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-08-14 02:19:49 +00:00
Changpeng Liu	e27421b344	nvme: fix req leaks There are many req leaks when a controller failure occurs while submitting I/O. All of the children must be freed before freeing the parent req. If some of the child reqs have been sent to the back end and some of the child reqs fail, the failed reqs are removed from the parent req and the parent req must be retained, then freed after all of the submitted reqs return. Change-Id: Ieb5423fd19c9bb0420f154b3cfc17918c2b80748 Signed-off-by: Huiming Xie <xiehuiming@huawei.com> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461734 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>	2019-07-22 04:15:34 +00:00
James Bergsten	5acf617c6e	nvme: add functions to pretty-print commands and completions This change attempts to address the Trello request to decode I/O errors in the NVMe hello_world example. See https://trello.com/c/MzJJw7hM/2-decode-io-errors-in-nvme-helloworld-example As part of this change, spdk_nvme_cpl_get_status_string was declared in nvme.h, and spdk_nvme_qpair_print_command and spdk_nvme_qpair_print_completion were renamed and added to nvme.h, allowing all three to be used "externally." To test the failing paths, two compile-time defines were added to force a write or read error (bad LBA) respectively. As the example does a read after write, if the write fails, the example fails. Signed-off-by: James Bergsten <jamesx.bergsten@intel.com> Change-Id: Ib94b4a02495eb40966e3f49517a5bdf64485538a Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457076 Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-07-15 07:47:03 +00:00
yidong0635	ff0a7dfc42	nvme: Handle CQ polling failures by marking the controller as failed. nvme_transport_qpair_process_completions calls nvme_rdma_qpair_process_completions, which in some cases returns -1 due to CQ errors. Handle CQ polling failures by marking the controller as failed, so that a completion with an error is treated as a controller failure. Requests will be aborted after the retry counter is exceeded. Otherwise, the code will keep on reporting errors without recovery. This is to fix issue #850. Change-Id: I0b324232310e107bf7fd5722aca54d402a19b14d Signed-off-by: yidong0635 <dongx.yi@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460569 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-07-09 01:43:02 +00:00
Darek Stojaczyk	f9a6588f57	nvme: switch to spdk_malloc(). spdk_dma_malloc() is about to be deprecated. Change-Id: I6c308ee546c28c479ceb903bc1749bf5209dc6fe Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448172 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: <uma.willpower@gmail.com>	2019-06-27 04:34:50 +00:00
Jim Harris	b3d884b700	nvme: assign qpair when req is allocated There's no need to set this every time we allocate a request. While here, fix a typo near where we needed to modify the unit test to remove the qpair assertion. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I8af41a6c483415950f625d1ed2ef46088b75a622 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456270 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-06-04 00:01:35 +00:00
Jim Harris	c85164bd69	nvme: add explicit "inline" keyword to a couple of functions Profiling showed these weren't getting inlined - so add the inline keyword to make sure it happens. This helps improve performance a bit. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Ia86edccc9163258efdcddcce6989a71fb180caf6 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456099 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com>	2019-05-30 23:09:16 +00:00
Jim Harris	ef1f844395	nvme: add qpair parameter to nvme_complete_request In some cases we have the qpair already when calling this function. So pass the qpair to avoid having to get it from the request. This shows about a 3% performance improvement for high IOPs single core tests. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I22fcca560492f4e7cf5ffedd252e41a027d0dd79 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455286 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-05-22 14:51:01 +00:00
Jim Harris	af38d200e6	nvme: add ctrlr option for logging errors Currently the nvme driver will always log any request completed with error status. Some applications may not want this behavior. So provide an option to disable it at the controller level. When this option is enabled, any failed requests from queues associated with that controller (including the admin queue) will not log the failed request. Of course the application will still receive the failed status code and can decide to do its own logging there. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Ia093fcd23cf321a820fd53183ee7e2dac4f9d378 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454081 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-05-14 13:51:44 +00:00
Jim Harris	5309873d39	nvme: add qpair is_connecting flag This will be used on the adminq, and set while the qpair is connecting. It allows the qpair_process_completions routine to know that it should still try to process completions, even if the controller is resetting. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I377b9c934295eb5f45f03efd90c2a268defb4bd4 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453938 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-14 08:48:11 +00:00
Jim Harris	36d2149a70	nvme: allow admin queue fabrics cmds while resetting For fabrics controllers, the fabrics cmds are what gets the controller out of reset. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I6804874e867466669a55dff11a0a865add8bbc99 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453937 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-14 08:48:11 +00:00
Jim Harris	963e450a71	nvme: complete error reqs when re-enabling queue We cannot complete error reqs from spdk_nvme_ctrlr_reset - this could result in completions on threads not expected by the user for I/O queues. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I2e266a2618f1791ef1a1b713d1940357f23f7bff Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453932 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-14 08:48:11 +00:00
Jim Harris	b9fe38c1b9	nvme: reuse err_req_head completion code in nvme_qpair_deinit Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I563165ce103fe5f72885adb0486bcb05bc2817e0 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453931 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-14 08:48:11 +00:00
Jim Harris	b9b7ed0af2	nvme: move nvme_qpair_complete_error_reqs We are going to use it earlier in this file in an upcoming patch. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Ie388ca76370e53465edb73a99d191492580603c9 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453930 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-14 08:48:11 +00:00
Jim Harris	f0be163639	nvme: check is_enabled flag at common layer Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I85e8289d10b481d3ca1cd125f73bd5abc4d1bf16 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453928 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-14 08:48:11 +00:00
Jim Harris	4aac975b35	nvme: make nvme_qpair_enable just set the is_enabled flag Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I6782f311156dba87875a754fc64525f5ad7d06ea Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453748 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-14 08:48:11 +00:00
Jim Harris	63d5459656	nvme: move nvme_qpair_abort_queued_reqs Next patch will use this function earlier in the file, so move the function now rather than in the later patch. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I50de44f69d0aedffddd251d00491912fd4a0f503 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453780 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ziye Yang <ziye.yang@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-05-10 19:43:31 +00:00
Jim Harris	a3945e8ec9	nvme: create nvme_qpair_abort_queued_reqs function Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I12b4081d3cf57bda8b01911c25a9c13102a1115d Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453741 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ziye Yang <ziye.yang@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-05-10 19:43:31 +00:00
Jim Harris	859f598b69	nvme: add dnr to nvme_qpair_manual_complete_request Also fix call to this function that was treating the print_on_error parameter as if it was dnr. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I9f048e8873ae0fcf07c9c6d11329a3fb21d92bda Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453740 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-05-10 19:43:31 +00:00
Jim Harris	fabd7fbb41	nvme: remove qpair_disable This transport function is a complete nop now, so remove it. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I5cc6ac75795a3cf5311f24e2ac293fb53d4b9f8c Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453487 Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-05-08 01:44:20 +00:00
Jim Harris	783a2a20f1	nvme: add transport_qpair_abort_reqs This will allow us to move more of the reset-related functionality to the common layer, as part of enabling resets for fabrics controllers. The transport qpair_enable and qpair_fail functions acted similarly - so those are both removed now and replaced with this new qpair_abort_reqs function. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I9486630ad5b807239b0b5bcde50e8cfd313695d3 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453486 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-08 01:44:20 +00:00
Jim Harris	5d431efd6d	nvme: move is_enabled logic to common layer Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Idd938f255226256d864f70921ecd70c54769b9b2 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453485 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-08 01:44:20 +00:00
Jim Harris	74aa552ef9	nvme: make helper function to abort outstanding err reqs The nvme_qpair_disable functions will be going away in an upcoming patch, so move this one bit of functionality into a helper function in advance. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I61c2de535c2230b988d56dea13b00f39cb59dcfa Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453483 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-08 01:44:20 +00:00
Ben Walker	d02950e6f5	nvme: Cache the cb_fn and cb_arg in the tracker This avoids a data dependent load to find which callback to call in the completion path. Change-Id: Ifa20790a7af3332a74bc45037e589668744af797 Signed-off-by: Ben Walker <benjamin.walker@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450558 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-04-10 21:29:03 +00:00


144 Commits
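Several commits in this log describe one mechanism from different angles: 7630daa204 and 2c68fef058 move request queueing and batched resubmission to the generic layer, 2575aaec5a keeps queued requests in order, ae3a9b8f08 returns -ENXIO on a failed qpair, and af2d56ed94 / 7defb70d3a stop re-queuing I/O while a qpair is disconnecting or being destroyed. The following C sketch is a toy model of that mechanism under stated assumptions: every type and function in it (toy_qpair, toy_qpair_submit, toy_qpair_disconnect and so on) is hypothetical and is not SPDK's code, the transport is simulated by a free-slot counter, and aborted requests complete with -ECANCELED rather than a real NVMe status.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy model of generic-layer qpair request handling.
 * All names are illustrative; this is NOT SPDK's implementation. */
enum toy_qpair_state {
	TOY_QPAIR_DISCONNECTED, TOY_QPAIR_CONNECTING, TOY_QPAIR_ENABLED,
	TOY_QPAIR_DISCONNECTING, TOY_QPAIR_DESTROYING,
};

struct toy_req;
typedef void (*toy_cb_fn)(void *cb_arg, struct toy_req *req, int status);

struct toy_req {
	int id;
	toy_cb_fn cb_fn;
	void *cb_arg;
	struct toy_req *next;               /* link in the queued-request FIFO */
};

struct toy_qpair {
	enum toy_qpair_state state;
	int free_slots;                     /* simulated transport capacity */
	struct toy_req *queued_head, *queued_tail;
	int submitted[16], nsubmitted;      /* ids given to the "transport", in order */
};

static void toy_enqueue(struct toy_qpair *qp, struct toy_req *req)
{
	req->next = NULL;
	if (qp->queued_tail != NULL) qp->queued_tail->next = req;
	else qp->queued_head = req;
	qp->queued_tail = req;
}

static struct toy_req *toy_dequeue(struct toy_qpair *qp)
{
	struct toy_req *req = qp->queued_head;

	if (req != NULL) {
		qp->queued_head = req->next;
		if (qp->queued_head == NULL) qp->queued_tail = NULL;
	}
	return req;
}

/* Stand-in transport: accepts a request only while a slot is free. */
static int toy_transport_submit(struct toy_qpair *qp, struct toy_req *req)
{
	if (qp->free_slots == 0) return -EAGAIN;
	qp->free_slots--;
	if (qp->nsubmitted < 16) qp->submitted[qp->nsubmitted++] = req->id;
	return 0;
}

static int toy_qpair_submit(struct toy_qpair *qp, struct toy_req *req)
{
	/* A failed or departing qpair refuses I/O with -ENXIO instead of queueing
	 * it, so an abort callback cannot refill the queue being drained. */
	if (qp->state == TOY_QPAIR_DISCONNECTED || qp->state == TOY_QPAIR_DISCONNECTING ||
	    qp->state == TOY_QPAIR_DESTROYING)
		return -ENXIO;
	/* Bypass the queue only when nothing is already waiting (keeps order). */
	if (qp->state == TOY_QPAIR_ENABLED && qp->queued_head == NULL &&
	    toy_transport_submit(qp, req) == 0)
		return 0;
	toy_enqueue(qp, req);   /* -EAGAIN or not yet enabled: wait in FIFO order */
	return 0;
}

/* num_completed requests just finished in the transport: free their slots and
 * resubmit at most that many queued requests, oldest first. */
static void toy_qpair_process_completions(struct toy_qpair *qp, int num_completed)
{
	qp->free_slots += num_completed;
	for (int i = 0; i < num_completed && qp->queued_head != NULL; i++) {
		if (toy_transport_submit(qp, qp->queued_head) != 0) break;
		toy_dequeue(qp);
	}
}

static void toy_qpair_disconnect(struct toy_qpair *qp)
{
	struct toy_req *req;

	qp->state = TOY_QPAIR_DISCONNECTING;
	/* Abort every queued request (-ECANCELED stands in for an NVMe abort
	 * status). Resubmits from inside the callback fail with -ENXIO. */
	while ((req = toy_dequeue(qp)) != NULL)
		req->cb_fn(req->cb_arg, req, -ECANCELED);
	qp->state = TOY_QPAIR_DISCONNECTED;
}

/* Example application callback that resubmits whatever it gets back. */
struct toy_app { struct toy_qpair *qp; int aborted; int last_rc; };

static void toy_resubmit_cb(void *cb_arg, struct toy_req *req, int status)
{
	struct toy_app *app = cb_arg;

	if (status == -ECANCELED) app->aborted++;
	app->last_rc = toy_qpair_submit(app->qp, req);
}
```

In this model, an application callback that resubmits every aborted request (toy_resubmit_cb) still lets toy_qpair_disconnect terminate: once the state is DISCONNECTING, resubmission returns -ENXIO instead of re-queuing, so the queue only shrinks. Letting a new request bypass the queue only when the queue is empty is what keeps it behind older queued I/O.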