ioat(4): Fix race between process_events and reset_hw

If a hardware error is detected during ioat_process_events, the
hardware may advance (probably by one descriptor), and a subsequent
ioat_process_events call may race the intended ioat_reset_hw follow-up.
In that case, the second process_events call would observe a completion
update that does not match the software "last_seen" status and would
attempt to complete already-failed descriptors as if they had succeeded.

Guard against this race with the resetting_cleanup flag.
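
The guard can be pictured with a small userspace model. This is only a
sketch under stated assumptions, not the driver's code: struct chan and
the chan_* functions are hypothetical stand-ins, a pthread mutex stands
in for the cleanup lock, and hw_status stands in for the hardware
completion update. The event pass refuses to issue completions while the
flag is set, even if the hardware status has moved past "last_seen".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified channel state; not the real softc. */
struct chan {
	pthread_mutex_t cleanup_lock;
	bool resetting_cleanup;	/* set on error, cleared by reset */
	uint32_t last_seen;	/* last completion software handled */
	uint32_t hw_status;	/* stand-in for the hw completion update */
	uint32_t completed_ok;	/* descriptors completed as successful */
};

static void
chan_init(struct chan *ch)
{
	pthread_mutex_init(&ch->cleanup_lock, NULL);
	ch->resetting_cleanup = false;
	ch->last_seen = ch->hw_status = ch->completed_ok = 0;
}

/* Error path: mark the channel so later event passes back off. */
static void
chan_note_error(struct chan *ch)
{
	pthread_mutex_lock(&ch->cleanup_lock);
	ch->resetting_cleanup = true;
	pthread_mutex_unlock(&ch->cleanup_lock);
}

/* Event pass: returns how many descriptors it completed as successful. */
static uint32_t
chan_process_events(struct chan *ch)
{
	uint32_t n = 0;

	pthread_mutex_lock(&ch->cleanup_lock);
	/* While a reset is pending, hw_status is not trustworthy. */
	if (!ch->resetting_cleanup && ch->hw_status != ch->last_seen) {
		n = ch->hw_status - ch->last_seen;
		ch->last_seen = ch->hw_status;
		ch->completed_ok += n;
	}
	pthread_mutex_unlock(&ch->cleanup_lock);
	return (n);
}

/* Reset: resynchronize software state, then reopen the event path. */
static void
chan_reset(struct chan *ch)
{
	pthread_mutex_lock(&ch->cleanup_lock);
	assert(ch->resetting_cleanup);	/* only reset after an error */
	ch->hw_status = ch->last_seen = 0;
	ch->resetting_cleanup = false;
	pthread_mutex_unlock(&ch->cleanup_lock);
}
```

In this model, an event pass that runs between chan_note_error and
chan_reset completes nothing, which is the behavior the flag is meant to
enforce.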

Reviewed by:	bdrewery, markj
Sponsored by:	Dell EMC Isilon
Conrad Meyer 2016-11-11 20:09:54 +00:00
parent 836055542f
commit 3a37091931

@@ -765,6 +765,15 @@ out:
 	mtx_lock(&ioat->submit_lock);
 	mtx_lock(&ioat->cleanup_lock);
 	ioat->quiescing = TRUE;
+	/*
+	 * This is safe to do here because we have both locks and the submit
+	 * queue is quiesced.  We know that we will drain all outstanding
+	 * events, so ioat_reset_hw can't deadlock.  It is necessary to
+	 * protect other ioat_process_event threads from racing ioat_reset_hw,
+	 * reading an indeterminate hw state, and attempting to continue
+	 * issuing completions.
+	 */
+	ioat->resetting_cleanup = TRUE;
 	chanerr = ioat_read_4(ioat, IOAT_CHANERR_OFFSET);
 	if (1 <= g_ioat_debug_level)