Abort any queued admin requests once admin queue gets enabled. A request
can get queued if a controller is being reset and it gets submitted
while admin qpair is being reconnected. If these requests aren't
aborted, the init process will stall, as requests don't get resubmitted
while controller is resetting and subsequent admin commands required for
the initialization would be queued too.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: If456a297d2d434b3cc741816cbfb13b01d37e963
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9324
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Renamed nvme_qpair_abort_reqs() to nvme_qpair_abort_reqs_with_cbarg() to
highlight the fact that it only aborts requests with specified cb_arg
and to distinguish it from _nvme_qpair_abort_reqs() which aborts all
requests immediately.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I32fec5ab0501b1beb8605689d73ec42a6424fba5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9323
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This patch introduces asynchronous versions of the ctrlr_(get|set)_reg
functions. Not all transports need to define them - for those that it
doesn't make sense (e.g. PCIe), the transport layer will call the
synchronous API and queue the callback to be executed during the next
process_completions call.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I2e78e72b5eba58340885381cb279f3c28e7995ec
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8607
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This allows for submitting IO requests before the CONNECT command is
sent and not stopping the connect process due to the CONNECT being
queued. It could happen once IO qpair connection is asynchronous.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I04e45b1c2f49f9da3412c843ea899341c56a0420
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8624
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This gives us a context that we can poll on when using the
nvme_wait_for_completion framework for async operations.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I5b23f9096eed5ce95848d1995c1ed0b4c74edc92
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8602
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This avoids an infite loop when user sends and then immediately aborts
all requests while the transport cannot submit them (returning -EAGAIN).
It might happen in two cases: 1) if the transport doesn't have any free
requests or 2) if the qpair is still in the NVME_QPAIR_CONNECTING state.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ia1d0b7c93f387524be0ad39597ec49851e1d3af7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8711
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
async_mode option is currently supported in PCIe transport layer
to create io qpair asynchronously. User polls the io_qpair for
completions, after create cq and sq completes in order, pqpair
is set to READY state. I/O submitted before the qpair is ready
is queued internally. Currently other transports only support
synchronous io qpair creation.
Signed-off-by: Monica Kenguva <monica.kenguva@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ib2f9043872bd5602274e2508cf1fe9ff4211cabb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8911
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If the transport returns error when polling for
completions, it gets to a uint32_t and we end up
trying to resubmit all of the requests that are
currently queued. But that's not correct - if
the transport returns an error we shouldn't be
trying to resubmit requests at all.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9198e3e2d71875cc1e46e0ac928338bb983487f3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8395
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Current version provides unclear output
Change-Id: Ib044b00b5f91b1e363911f1b79c51c73c8a6920c
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5743
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In TCP NVME initiator with zero copy enabled requests might be
completed asynchronously - out of qpair_process_completions
context. At the same time we calculate requests completed
asynchronously so that generic NVME layer can resubmit
queued requests after calling qpair_process_requests (or
poll_group_process_requests).
But there is a time gap between async request complete and
qpair_process_completions and the user can submit new IO
thereby decrease the number of free TCP requests. That means
that there might be less free requests than we excpected when
we try to resubmit queued requests.
The solution is change ERRLOG to DEBUG log since it is not a
fatal case.
Change-Id: If045ecd331cc6693e8ef450d8e15432dfa5d8812
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4859
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
When we disconnect a qpair, part of the code path is
calling _nvme_qpair_abort_queued_reqs. This takes
care of aborting any requests that were queued waiting
for slots to open on the submission queue.
It walks the STAILQ one by one and manually completes
them with ABORT status back to the caller.
But if the callback path submits another request, this
request may also get queued to the end of the queued_req
TAILQ. This can result in an infinite loop.
The solution is to use an STAILQ_SWAP to a local, empty
STAILQ. Then we ensure we only abort the requests that
were queued when _nvme_qpair_abort_queued_reqs() started
executing.
Fixes issue #1588.
I used the multipath.sh test to reproduce this on my local
system. If it ever dropped into the STAILQ loop in this
function, we would hit the infinite loop. With this patch,
I confirmed locally that now we safely avoid the infinite
loop and the test passes.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I657db23efe5983bd8613c870ad62695a7fc7f689
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4284
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This becomes a problem when the qpair is reconnected.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I6677b396cf766684a4891ffbee93aa3e4e83374d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3391
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Use another list dedicated to hold queued requests being aborted
to avoid potential infinite recursive calls.
Add a helper function nvme_qpair_abort_queued_req() to move requests
whose cb_arg matches from qpair->queued_req to qpair->aborted_queued_req.
Then nvme_qpair_resubmit_requests() aborts all requests in
qpair->aborted_queued_req.
The first idea was that nvme_qpair_abort_queued_req() aborts queued
requests directly. However, this caused infinite recursive calls.
Hence separate requesting abort to queued requests and actually
aborting queued requests.
The detail of the infinite recursive calls is as follows:
Some SPDK tool submits the next request from the callback to the completion
of a request in the completion polling loop. For such tool, if the callback
submits a request and then aborts the request immediately, and the request
could not be submitted but queued, it will create infinite recursive calls
by request submit and abort, and it will not be able to get out of
completion polling loop.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8196182b981bc52dee2074d7642498a5d6ef97d4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2891
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Having functions without qpair on the interface allows for wider usage
e.g. by nvmf layer.
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I3a51ad53f00eb29e2ba2681ef4ff0cc2a197b65d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3176
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
This is a preparation to the next patch.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I15356c69e676dc41d3af69caa6d12c1fcb282152
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3071
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
When a NVMe device is hot removed, subsequent calls to
nvme_qpair_submit_request can fail with ENXIO.
The failure path handling for ENXIO did not free the request which
exhausts the qpair's free_req list eventually and all IOs are stuck
going forward.
This fix adds the same cleanup handling to nvme_qpair_submit_request
for this error case as it is done in _nvme_qpair_submit_request.
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Change-Id: I5677d53965bdbd6d339c013483cdf42ce782099a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3018
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
When one of the children is failed to submit, if any children is
already submitted, the function can return success to wait for those children
to complete, but the parent should be set to failure.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2ea53856ee58da991bceca0058d1e1f55d42af37
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2492
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
This will need to be done separately for poll groups.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I0e432493bdb02e13fe5c73a8a09911cef573307b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1664
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
By aborting all requests from every qpair when it is disconnected,
we can completely avoid having to abort requests when we enable the
qpair since nothing will be left enabled.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: Iba3bd866405dd182b72285def0843c9809f6500e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1788
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This is the onlyreasonable thing to do. Plus we need to
be in the destroying or disconnecting state to avoid
an infinite loop when aborting requests.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I38462a01f0455c3d6496434626f6f2f4663bf508
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1857
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When we destroy a qpair, we need to flush all of the I/O.
But some applications will try to resubmit that I/O. We need
to not re-queue those I/O while in the context of the destroy
call so as to avoid an infinite loop.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I3e4863a563d461092f6e6b4a893f965f41bf34e3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1856
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This can cause infinite loops if the callback tries to
queue an additional I/O.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I4b80b97d334082465d9228b799ef901645fa968e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1854
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We should be giving completions for all requests when we destroy a qpair.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I802f5120f2e8289aa825872f8085ac21b5fce0f3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1756
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Given enum was not aligned with spec. This status can be reported when
size equals 0.
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: If51f6b051c13880c1fd4e6bb0a02f134b28b5a88
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/928
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
For the requests which don't have children requests, SPDK may queue them to
the queued_req list due to limited resources, in the completion path, we
may resubmit them to the controller. When the controller was removed
the submission path will return -ENXIO and we will free the requests directly,
so the callback will not be trigerred for these requests. Here we added a
flag to indicate the request is from queued_req list or not, so for the failure
submission, we can triger user's callback.
Fix issue #1097
Change-Id: I901ac81733c2319e540d24baf5b8faa1c649eb35
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/477754
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
nvme_qpair_get_state fits more closely with the semantics in other
modules.
Change-Id: I6ea8e02abe27253d9b4d779a43ac1963be56356a
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/476920
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
The qpair state transport_qpair_is_failed is actually equivalent to
NVME_QPAIR_IS_CONNECTED in the qpair state machine.
There are a couple of places where we check against
transport_qp_is_failed and then immediately check to see if we are in
the connected state. If we are failed, or we are not in the connected
state we return the same value to the calling function.
Since the checks for transport_qpair_is_failed are not necessary, they
can be removed. As a result, there is no need to keep track of it and it
can be removed from the qpair structure.
Change-Id: I4aef5d20eb267bfd6118e5d1d088df05574d9ffd
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475802
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Purpose: To remove the duplicated code.
Change-Id: Iab9989f9928698967533e45e7cffad4f09bde16a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/473376
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Disconnecting qpairs from the admin thread during a reset led to an
inevitable race with the data thread. QP related memory is freed during
the disconnect and cannot be touched from the other threads.
The only way to fix this is to force the qpair disconnect onto the
data thread.
This requires a small change in the way that resets are handled for
pcie. Please see the code in reset.c for that change.
fixes: bb01a089
Change-Id: I8a39e444c7cbbe85fafca42ffd040e929721ce95
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/472749
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This will be the canonical way of informing the user that we have lost
the qpair connection somehow.
Also update all of the functions that will return -ENXIO to the user.
Change-Id: Ic6c7c2d0e07e9d3e857a3476bb6b91fb4b6454fa
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471416
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Failed is not a final state for either fabric or pcie controllers. We
have historically not allowed resets in the failed state, but we should.
Instead of checking for the failed state, we should check for the
removed state. If the controller is removed, then we cannot even attempt
a reset.
Change-Id: I2c1a3d85db84f84cd1895cbfaf16575c8b496155
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471415
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We shouldn't always fail the whole controller if we get a failure on an
individual qpair.
Change-Id: Id0c90af83e5231593a895be66e7a7de48939e240
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471660
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
check_enabled had a couple bugs in it that made it unfriendly for enabling
I/O qpairs after a reset.
1. It was calling nvme_qpair_abort_queued_requests before setting the
enabled flag to true. For applications that submit new I/O in the
completion callback for old I/O, this means you enter an infinite loop
of submitting requests, and then immediately completing them. SO
instead, wait for the qpair to reset, then just submit those requests to
the lower layer.
2. It didn't check whether we were already in the middle of calling it,
so we could reenter function calls like
nvme_qpair_abort_queued_requests.
Also, now that we have a coherent state machine for qpairs, we can limit
the enabling to a specific state in that state machine.
Change-Id: Ie0b74819a6b16839965bced47c33dec967f725a8
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470256
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
These will form the base of a little state machine for managing the nvme
qpair structure.
Change-Id: If6f6df38cc17221ac8fcb7d8c0d7e2e808897a99
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470534
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When doing a reset on an NVMe-oF target with active I/O qpairs, we need
to be able to submit fabrics commands on them in order to perform a reset.
Currently, resetting a fabric controller with any I/O qpairs active will
cause the reset to hang indefinitely.
Change-Id: Ic972a301390a4dd64adabedfe01aa4e5253e40b0
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469935
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
My recent changes that introduced batching to queued request
resubmission also introduced a regression that can lead to reordering
requests before submitting them to the drive. This change prevents that.
We wait until inside the internal _nvme_qpair_submit_request function to
check for queued entries to avoid queueing a request that has children.
If a request that has children gets queued, when we process completions
and resubmit the parent, it will result in the children being submitted.
Since we only account for the number of requests we completed in the
last iteration, some of the child requests may be requeued out of order,
or worse, none of the child requests will end up being submitted to the
transport and they will all be queued behind previously queued requests.
Change-Id: I58e1c458c25fbf3f9f75364f05b1076b166a6212
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470890
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Use the standard API function to fail the controller in all cases.
This patch, and the several following patches are aimed at creating a
mechanism for reporting up to the application layer that a controller is
failed and or removed. To do this, I use the reset_cb to inform the
upper layer that the controller is failed.
This also requires changes to how we handle a controller reset to
pave the way for doing optional reset retries in the libraries.
Change-Id: I06dfce08326c23472a1caa8f6efbac2fd1a720f2
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469635
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We were already passing up from each transport the number of completions
done during the transport specific call. So just use that return code
and batch all of the submissions together at one time in the generic
code.
This change and subsequent moves of code from the transport layer to the
genric layer are aimed at making reset handling at the generic NVMe
layer simpler.
Change-Id: I028aea86d76352363ffffe661deec2215bc9c450
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469757
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>