Fix an issue with critical sections and SMP rendezvous handlers.

Specifically, a critical_exit() call that drops the nesting level to zero
has a brief window where the pending preemption flag is set and the
nesting level is set to zero.  This is done purposefully to avoid races
where a preemption scheduled by an interrupt could be lost otherwise (see
revision 144777).  However, this does mean that if an interrupt fires
during this window and enters and exits a critical section, it may preempt
from the interrupt context.  This is generally fine as the interrupt code
is careful to arrange critical sections so that they are not exited until
it is safe to preempt (e.g. interrupts EOI'd and masked if necessary).

However, the SMP rendezvous IPI handler does not quite follow this rule,
and in general a rendezvous can never be preempted.  Rendezvous handlers
are also not permitted to schedule threads to execute, so they will not
typically trigger preemptions.  SMP rendezvous handlers may use
spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks,
but using a spinlock also enters and exits a critical section.  If the
interrupted top-half code is in the brief window of critical_exit() where
the nesting level is zero but a preemption is pending, then releasing the
spinlock can trigger a preemption.  Because we know that SMP rendezvous
handlers can never schedule a thread, we know that a critical_exit() in
an SMP rendezvous handler will only preempt in this edge case.  We also
know that the top-half thread will happily handle the deferred preemption
once the SMP rendezvous has completed, so the preemption will not be lost.

This makes it safe to employ a workaround where we use a nested critical
section in the SMP rendezvous code itself around rendezvous action
routines to prevent any preemptions during an SMP rendezvous.  The
workaround intentionally avoids checking for a deferred preemption
when leaving the critical section on the assumption that if there is a
pending preemption it will be handled by the interrupted top-half code.

Submitted by:	mlaier (variation specific to rm_cleanIPI())
Obtained from:	Isilon
MFC after:	1 week
This commit is contained in:
John Baldwin 2011-05-24 13:36:41 +00:00
parent af21235ac4
commit 47ad691f87

View File

@ -311,11 +311,15 @@ restart_cpus(cpumask_t map)
void
smp_rendezvous_action(void)
{
struct thread *td;
void *local_func_arg;
void (*local_setup_func)(void*);
void (*local_action_func)(void*);
void (*local_teardown_func)(void*);
int generation;
#ifdef INVARIANTS
int owepreempt;
#endif
/* Ensure we have up-to-date values. */
atomic_add_acq_int(&smp_rv_waiters[0], 1);
@ -329,6 +333,34 @@ smp_rendezvous_action(void)
local_teardown_func = smp_rv_teardown_func;
generation = smp_rv_generation;
/*
* Use a nested critical section to prevent any preemptions
* from occurring during a rendezvous action routine.
* Specifically, if a rendezvous handler is invoked via an IPI
* and the interrupted thread was in the critical_exit()
* function after setting td_critnest to 0 but before
* performing a deferred preemption, this routine can be
* invoked with td_critnest set to 0 and td_owepreempt true.
* In that case, a critical_exit() during the rendezvous
* action would trigger a preemption which is not permitted in
* a rendezvous action. To fix this, wrap all of the
* rendezvous action handlers in a critical section. We
* cannot use a regular critical section however as having
* critical_exit() preempt from this routine would also be
* problematic (the preemption must not occur before the IPI
* has been acknowleged via an EOI). Instead, we
* intentionally ignore td_owepreempt when leaving the
* critical setion. This should be harmless because we do not
* permit rendezvous action routines to schedule threads, and
* thus td_owepreempt should never transition from 0 to 1
* during this routine.
*/
td = curthread;
td->td_critnest++;
#ifdef INVARIANTS
owepreempt = td->td_owepreempt;
#endif
/*
* If requested, run a setup function before the main action
* function. Ensure all CPUs have completed the setup
@ -362,14 +394,18 @@ smp_rendezvous_action(void)
*/
MPASS(generation == smp_rv_generation);
atomic_add_int(&smp_rv_waiters[2], 1);
if (local_teardown_func == smp_no_rendevous_barrier)
return;
while (smp_rv_waiters[2] < smp_rv_ncpus &&
generation == smp_rv_generation)
cpu_spinwait();
if (local_teardown_func != smp_no_rendevous_barrier) {
while (smp_rv_waiters[2] < smp_rv_ncpus &&
generation == smp_rv_generation)
cpu_spinwait();
if (local_teardown_func != NULL)
local_teardown_func(local_func_arg);
if (local_teardown_func != NULL)
local_teardown_func(local_func_arg);
}
td->td_critnest--;
KASSERT(owepreempt == td->td_owepreempt,
("rendezvous action changed td_owepreempt"));
}
void