Fix a deadlock in the shutdown code:

When performing an smp_rendezvous() or, more likely on amd64 and i386,
an smp_tlb_shootdown(), the caller ends up holding the smp_ipi_mtx
spinlock while busy-waiting for the other CPUs to acknowledge the
operation. If a CPU is suspended (via cpu_reset()) between the read of
the active mask and the sending of the IPI, the caller deadlocks,
waiting forever for a dead CPU to acknowledge the operation.
Note that on CPU0 the problem is somewhat more severe, because
spinlocks are disabled some time before the machine is actually reset.

Fix this bug by calling cpu_reset() with smp_ipi_mtx held.
Note that a saner CPU offline/online mechanism would very likely help
a great deal in fixing similar cases, as more bugs of this type are
likely to arise in the future.
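
The locking order described above can be sketched in userspace. This is
only an illustration under stated assumptions: a pthread mutex stands in
for smp_ipi_mtx, one thread plays the rendezvous initiator, one plays a
peer CPU, and the caller of the hypothetical simulate_shutdown() plays
CPU0. None of these names are kernel API. Because the "reset" happens
only with the lock held, it cannot land between the active-mask read and
the wait for the acknowledgement.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; NOT the FreeBSD kernel API. */
static pthread_mutex_t ipi_mtx = PTHREAD_MUTEX_INITIALIZER; /* plays smp_ipi_mtx */
static atomic_bool peer_active;	/* the peer's bit in the "active CPU mask" */
static atomic_bool peer_halted;	/* set once the peer has been "reset" */
static atomic_bool ipi_pending;	/* an "IPI" waiting to be acknowledged */
static int started, acked;	/* written only by the initiator thread */

/* Peer CPU: acknowledges pending IPIs until it is halted. */
static void *
peer_cpu(void *arg)
{
	(void)arg;
	while (!atomic_load(&peer_halted)) {
		if (atomic_load(&ipi_pending))
			atomic_store(&ipi_pending, false);	/* acknowledge */
		sched_yield();
	}
	return (NULL);
}

/* Initiator: like smp_rendezvous(), busy-waits with the lock held. */
static void *
initiator_cpu(void *arg)
{
	int rounds = *(int *)arg;

	for (int i = 0; i < rounds; i++) {
		pthread_mutex_lock(&ipi_mtx);
		if (atomic_load(&peer_active)) {	/* read the active mask */
			started++;
			atomic_store(&ipi_pending, true);	/* send the IPI */
			while (atomic_load(&ipi_pending))	/* wait for the ack */
				sched_yield();
			acked++;
		}
		pthread_mutex_unlock(&ipi_mtx);
		sched_yield();
	}
	return (NULL);
}

/*
 * Plays CPU0 at shutdown.  Returns the number of rendezvous that were
 * started but never acknowledged (0 when the ordering holds), or -1 if
 * a thread could not be created.
 */
int
simulate_shutdown(int rounds)
{
	pthread_t peer, initiator;

	assert(rounds >= 0);
	started = acked = 0;
	atomic_store(&peer_active, true);
	atomic_store(&peer_halted, false);
	atomic_store(&ipi_pending, false);

	if (pthread_create(&peer, NULL, peer_cpu, NULL) != 0)
		return (-1);
	if (pthread_create(&initiator, NULL, initiator_cpu, &rounds) != 0) {
		atomic_store(&peer_halted, true);
		pthread_join(peer, NULL);
		return (-1);
	}

	pthread_mutex_lock(&ipi_mtx);		/* the fix: take the lock first */
	atomic_store(&peer_active, false);
	atomic_store(&peer_halted, true);	/* stands in for cpu_reset() */
	pthread_mutex_unlock(&ipi_mtx);		/* simulation only: the kernel never returns */

	pthread_join(initiator, NULL);
	pthread_join(peer, NULL);
	return (started - acked);
}
```

Built with something like "cc -std=c11 -pthread", a call such as
simulate_shutdown(200) is expected to return 0. Halting the peer
without first taking ipi_mtx would reintroduce the window in which the
initiator spins forever.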

Reported by:	rwatson
Discussed with:	jhb
Tested by:	rnoland, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
MFC:		2 weeks

Special dedication to:	anyone who made it possible to have 16-way machines
			in Netperf
Attilio Rao 2010-04-19 23:27:54 +00:00
parent 6da6d0a9e3
commit 248bb9379f

@@ -62,7 +62,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/reboot.h>
 #include <sys/resourcevar.h>
 #include <sys/sched.h>
-#include <sys/smp.h> /* smp_active */
+#include <sys/smp.h>
 #include <sys/sysctl.h>
 #include <sys/sysproto.h>
@@ -485,15 +485,20 @@ static void
 shutdown_reset(void *junk, int howto)
 {
-	/*
-	 * Disable interrupts on CPU0 in order to avoid fast handlers
-	 * to preempt the stopping process and to deadlock against other
-	 * CPUs.
-	 */
-	spinlock_enter();
 	printf("Rebooting...\n");
 	DELAY(1000000);	/* wait 1 sec for printf's to complete and be read */
+	/*
+	 * Acquiring smp_ipi_mtx here has a double effect:
+	 * - it disables interrupts avoiding CPU0 preemption
+	 *   by fast handlers (thus deadlocking against other CPUs)
+	 * - it avoids deadlocks against smp_rendezvous() or, more
+	 *   generally, threads busy-waiting, with this spinlock held,
+	 *   and waiting for responses by threads on other CPUs
+	 *   (ie. smp_tlb_shootdown()).
+	 */
+	mtx_lock_spin(&smp_ipi_mtx);
 	/* cpu_boot(howto); */ /* doesn't do anything at the moment */
 	cpu_reset();
 	/* NOTREACHED */ /* assuming reset worked */