safer wait-free iteration of shared interrupt handlers

The code that iterates a list of interrupt handlers for a (shared)
interrupt, whether in the ISR context or in the context of an interrupt
thread, does so in a lock-free fashion.  Thus, the routines that modify
the list need to take special steps to ensure that the iterating code
has a consistent view of the list.  Previously, those routines tried to
play nice only with the code running in the ithread context.  The
iteration in the ISR context was left to chance.

After commit r336635, atomic operations and memory fences are used to
ensure that the ie_handlers list is always safe to navigate with respect
to concurrent insertion and removal of list elements.
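
For illustration only, here is a minimal sketch of that kind of
navigation, written with C11 atomics rather than the kernel's CK_SLIST
macros and atomic(9) primitives; struct handler, insert_head() and
run_handlers() are made-up names for this example, not kernel code:

#include <stdatomic.h>
#include <stddef.h>

struct handler {
	void (*h_func)(void *);
	void *h_arg;
	struct handler *_Atomic h_next;
};

static struct handler *_Atomic list_head;

/* Writer side, serialized by a mutex: publish a fully initialized node. */
static void
insert_head(struct handler *h)
{
	h->h_next = atomic_load_explicit(&list_head, memory_order_relaxed);
	/* Release store: the node contents become visible before the link. */
	atomic_store_explicit(&list_head, h, memory_order_release);
}

/* Reader side (think ISR context): never blocks, never sees a torn node. */
static void
run_handlers(void)
{
	struct handler *h;

	for (h = atomic_load_explicit(&list_head, memory_order_acquire);
	    h != NULL;
	    h = atomic_load_explicit(&h->h_next, memory_order_acquire)) {
		/*
		 * A node that the writer has just unlinked may still be
		 * visited here, so it must not be freed until readers are
		 * known to be done with it.
		 */
		h->h_func(h->h_arg);
	}
}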

There is still a question of when it is safe to actually free a removed
element.

The idea of this change is somewhat similar to epoch-based reclamation,
but with some simplifications compared to the general scheme.  All
writers are serialized using a mutex, so we do not need to worry about
concurrent modifications.  Also, all read accesses from the open context
are serialized too.

So, we can get away with just two epochs / phases.  When a thread
removes an element, it switches the global phase from the current phase
to the other one and then drains the previous phase.  Only after the
draining does the removed element actually get freed.  The code that
iterates the list in the ISR context takes a snapshot of the global
phase and increments the use count of that phase before iterating the
list.  The use count (in the same phase) is decremented after the
iteration.  This ensures that no iteration can still be visiting the
removed element by the time it gets freed.
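
To make the phase/drain scheme concrete, a simplified standalone model
is sketched below, using C11 atomics in place of the kernel's atomic(9)
operations and cpu_spinwait(); the variables phase and active[] mirror
the new ie_phase and ie_active fields, and the function names are made
up for this sketch:

#include <stdatomic.h>

static atomic_int phase;	/* current phase, 0 or 1 (cf. ie_phase) */
static atomic_int active[2];	/* readers inside each phase (cf. ie_active) */

/* Reader (ISR context): pin a phase for the duration of the iteration. */
static void
reader_iterate(void)
{
	int p;

	p = atomic_load_explicit(&phase, memory_order_relaxed);
	atomic_fetch_add_explicit(&active[p], 1, memory_order_relaxed);
	/* Full fence: list loads below must not move before the store above. */
	atomic_thread_fence(memory_order_seq_cst);

	/* ... iterate the handler list here ... */

	atomic_fetch_sub_explicit(&active[p], 1, memory_order_release);
}

/*
 * Writer: called with the modification mutex held, after unlinking an
 * element and before freeing it.
 */
static void
drain_readers(void)
{
	int p;

	p = atomic_load_explicit(&phase, memory_order_relaxed);
	/* Redirect new readers to the other phase. */
	atomic_store_explicit(&phase, !p, memory_order_release);
	/* Order the phase switch before the check of the old phase's count. */
	atomic_thread_fence(memory_order_seq_cst);
	/* Wait for readers that may still be iterating in the old phase. */
	while (atomic_load_explicit(&active[p], memory_order_acquire) > 0)
		;	/* spin; the kernel uses cpu_spinwait() */
	/* Now it is safe to free the element that was unlinked earlier. */
}

The real intr_event_barrier() and intr_event_handle() code in the diff
below follows the same structure, with an additional KASSERT that the
inactive phase has no users when the phase is switched.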

This commit also simplifies the coordination with the interrupt thread
context.  Now we always schedule the interrupt thread when removing one
of the handlers for its interrupt.  This makes the code both simpler and
safer, as the interrupt thread masks the interrupt, thus ensuring that
there is no interaction with the ISR context.
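
The hand-off itself is visible in the diff below (IH_DEAD,
intr_event_schedule_thread() and msleep()).  Purely as an illustration,
a user-space model of the same pattern could look as follows; the
pthread objects and the remove_handler()/worker() names are hypothetical
stand-ins for the kernel's ithread machinery:

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work = PTHREAD_COND_INITIALIZER;
static pthread_cond_t reaped = PTHREAD_COND_INITIALIZER;
static bool dead;	/* stands in for the IH_DEAD flag */

/* Remover: mark the handler dead, wake the worker, wait for the removal. */
static void
remove_handler(void)
{
	pthread_mutex_lock(&lock);
	dead = true;
	pthread_cond_signal(&work);	/* kernel: intr_event_schedule_thread() */
	while (dead)
		pthread_cond_wait(&reaped, &lock);	/* kernel: msleep() */
	pthread_mutex_unlock(&lock);
}

/* Worker ("ithread"): unlink dead handlers, then notify the remover. */
static void *
worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	for (;;) {
		while (!dead)
			pthread_cond_wait(&work, &lock);
		/* ... unlink the handler from the list here ... */
		dead = false;
		pthread_cond_broadcast(&reaped);
	}
	return (NULL);	/* not reached */
}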

P.S.  This change matters only for shared interrupts and I realize that
those are becoming a thing of the past (and quickly).  I also understand
that the problem that I am trying to solve is extremely rare.

PR:		229106
Reviewed by:	cem
Discussed with:	Samy Al Bahra
MFC after:	5 weeks
Differential Revision: https://reviews.freebsd.org/D15905
Andriy Gapon, 2018-08-03 14:27:28 +00:00
commit e0fa977ea5 (parent 380122186e)
2 changed files with 67 additions and 28 deletions

@@ -682,6 +682,45 @@ intr_handler_source(void *cookie)
 	return (ie->ie_source);
 }
 
+/*
+ * If intr_event_handle() is running in the ISR context at the time of the call,
+ * then wait for it to complete.
+ */
+static void
+intr_event_barrier(struct intr_event *ie)
+{
+	int phase;
+
+	mtx_assert(&ie->ie_lock, MA_OWNED);
+	phase = ie->ie_phase;
+
+	/*
+	 * Switch phase to direct future interrupts to the other active counter.
+	 * Make sure that any preceding stores are visible before the switch.
+	 */
+	KASSERT(ie->ie_active[!phase] == 0, ("idle phase has activity"));
+	atomic_store_rel_int(&ie->ie_phase, !phase);
+
+	/*
+	 * This code cooperates with wait-free iteration of ie_handlers
+	 * in intr_event_handle.
+	 * Make sure that the removal and the phase update are not reordered
+	 * with the active count check.
+	 * Note that no combination of acquire and release fences can provide
+	 * that guarantee as Store->Load sequences can always be reordered.
+	 */
+	atomic_thread_fence_seq_cst();
+
+	/*
+	 * Now wait on the inactive phase.
+	 * The acquire fence is needed so that all post-barrier accesses
+	 * are after the check.
+	 */
+	while (ie->ie_active[phase] > 0)
+		cpu_spinwait();
+	atomic_thread_fence_acq();
+}
+
 /*
  * Sleep until an ithread finishes executing an interrupt handler.
  *
@@ -752,44 +791,30 @@ intr_event_remove_handler(void *cookie)
 	}
 
 	/*
-	 * If there is no ithread, then just remove the handler and return.
-	 * XXX: Note that an INTR_FAST handler might be running on another
-	 * CPU!
+	 * If there is no ithread, then directly remove the handler.  Note that
+	 * intr_event_handle() iterates ie_handlers in a lock-less fashion, so
+	 * care needs to be taken to keep ie_handlers consistent and to free
+	 * the removed handler only when ie_handlers is quiescent.
 	 */
 	if (ie->ie_thread == NULL) {
 		CK_SLIST_REMOVE_PREVPTR(prevptr, ih, ih_next);
+		intr_event_barrier(ie);
 		intr_event_update(ie);
 		mtx_unlock(&ie->ie_lock);
 		free(handler, M_ITHREAD);
 		return (0);
 	}
 
 	/*
-	 * If the interrupt thread is already running, then just mark this
-	 * handler as being dead and let the ithread do the actual removal.
-	 *
-	 * During a cold boot while cold is set, msleep() does not sleep,
-	 * so we have to remove the handler here rather than letting the
-	 * thread do it.
+	 * Let the interrupt thread do the job.
+	 * The interrupt source is disabled when the interrupt thread is
+	 * running, so it does not have to worry about interaction with
+	 * intr_event_handle().
 	 */
-	thread_lock(ie->ie_thread->it_thread);
-	if (!TD_AWAITING_INTR(ie->ie_thread->it_thread) && !cold) {
-		handler->ih_flags |= IH_DEAD;
-		/*
-		 * Ensure that the thread will process the handler list
-		 * again and remove this handler if it has already passed
-		 * it on the list.
-		 *
-		 * The release part of the following store ensures
-		 * that the update of ih_flags is ordered before the
-		 * it_need setting.  See the comment before
-		 * atomic_cmpset_acq(&ithd->it_need, ...) operation in
-		 * the ithread_execute_handlers().
-		 */
-		atomic_store_rel_int(&ie->ie_thread->it_need, 1);
-	} else
-		CK_SLIST_REMOVE_PREVPTR(prevptr, ih, ih_next);
-	thread_unlock(ie->ie_thread->it_thread);
+	KASSERT((handler->ih_flags & IH_DEAD) == 0,
+	    ("duplicate handle remove"));
+	handler->ih_flags |= IH_DEAD;
+	intr_event_schedule_thread(ie);
 	while (handler->ih_flags & IH_DEAD)
 		msleep(handler, &ie->ie_lock, 0, "iev_rmh", 0);
 	intr_event_update(ie);
@@ -1154,6 +1179,7 @@ intr_event_handle(struct intr_event *ie, struct trapframe *frame)
 	struct trapframe *oldframe;
 	struct thread *td;
 	int ret, thread;
+	int phase;
 
 	td = curthread;
@@ -1178,6 +1204,15 @@ intr_event_handle(struct intr_event *ie, struct trapframe *frame)
 	oldframe = td->td_intr_frame;
 	td->td_intr_frame = frame;
 
+	phase = ie->ie_phase;
+	atomic_add_int(&ie->ie_active[phase], 1);
+
+	/*
+	 * This fence is required to ensure that no later loads are
+	 * re-ordered before the ie_active store.
+	 */
+	atomic_thread_fence_seq_cst();
+
 	CK_SLIST_FOREACH(ih, &ie->ie_handlers, ih_next) {
 		if (ih->ih_filter == NULL) {
 			thread = 1;
@@ -1215,6 +1250,8 @@ intr_event_handle(struct intr_event *ie, struct trapframe *frame)
 			thread = 1;
 		}
 	}
+	atomic_add_rel_int(&ie->ie_active[phase], -1);
+
 	td->td_intr_frame = oldframe;
 	if (thread) {

@@ -122,6 +122,8 @@ struct intr_event {
 	struct timeval	ie_warntm;
 	int		ie_irq;		/* Physical irq number if !SOFT. */
 	int		ie_cpu;		/* CPU this event is bound to. */
+	volatile int	ie_phase;	/* Switched to establish a barrier. */
+	volatile int	ie_active[2];	/* Filters in ISR context. */
 };
 
 /* Interrupt event flags kept in ie_flags. */