ioat(4): Fix race between process_events and reset_hw

In the case where a hardware error is detected during ioat_process_events, hardware may advance (by one descriptor, probably) and a subsequent ioat_process_events may race the intended ioat_reset_hw followup. In that case, the second process_events would observe a completion update that does not match the software "last_seen" status, and attempt to successfully complete already-failed descriptors. Guard against this race with the resetting_cleanup flag.

Reviewed by:	bdrewery, markj
Sponsored by:	Dell EMC Isilon
Date: 2016-11-11 20:09:54 +00:00
commit 3a37091931
parent 836055542f
1 changed file with 9 additions and 0 deletions
--- a/sys/dev/ioat/ioat.c
+++ b/sys/dev/ioat/ioat.c
@@ -765,6 +765,15 @@ out:
 	mtx_lock(&ioat->submit_lock);
 	mtx_lock(&ioat->cleanup_lock);
 	ioat->quiescing = TRUE;
+	/*
+	 * This is safe to do here because we have both locks and the submit
+	 * queue is quiesced.  We know that we will drain all outstanding
+	 * events, so ioat_reset_hw can't deadlock.  It is necessary to
+	 * protect other ioat_process_event threads from racing ioat_reset_hw,
+	 * reading an indeterminate hw state, and attempting to continue
+	 * issuing completions.
+	 */
+	ioat->resetting_cleanup = TRUE;

 	chanerr = ioat_read_4(ioat, IOAT_CHANERR_OFFSET);
 	if (1 <= g_ioat_debug_level)