nvme: fix infinite loop when aborting queued reqs

When we disconnect a qpair, part of the code path is
calling _nvme_qpair_abort_queued_reqs.  This takes
care of aborting any requests that were queued waiting
for slots to open on the submission queue.
It walks the STAILQ one by one and manually completes
them with ABORT status back to the caller.

But if the callback path submits another request, this
request may also get queued to the end of the queued_req
TAILQ.  This can result in an infinite loop.

The solution is to use an STAILQ_SWAP to a local, empty
STAILQ.  Then we ensure we only abort the requests that
were queued when _nvme_qpair_abort_queued_reqs() started
executing.

Fixes issue #1588.

I used the multipath.sh test to reproduce this on my local
system. If it ever dropped into the STAILQ loop in this
function, we would hit the infinite loop.  With this patch,
I confirmed locally that now we safely avoid the infinite
loop and the test passes.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I657db23efe5983bd8613c870ad62695a7fc7f689
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4284
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This commit is contained in:
Jim Harris 2020-09-16 23:19:34 +00:00
parent 5198fbeda8
commit 792807a444

View File

@ -545,10 +545,14 @@ static void
_nvme_qpair_abort_queued_reqs(struct spdk_nvme_qpair *qpair, uint32_t dnr)
{
struct nvme_request *req;
STAILQ_HEAD(, nvme_request) tmp;
while (!STAILQ_EMPTY(&qpair->queued_req)) {
req = STAILQ_FIRST(&qpair->queued_req);
STAILQ_REMOVE_HEAD(&qpair->queued_req, stailq);
STAILQ_INIT(&tmp);
STAILQ_SWAP(&tmp, &qpair->queued_req, nvme_request);
while (!STAILQ_EMPTY(&tmp)) {
req = STAILQ_FIRST(&tmp);
STAILQ_REMOVE_HEAD(&tmp, stailq);
if (!qpair->ctrlr->opts.disable_error_logging) {
SPDK_ERRLOG("aborting queued i/o\n");
}