Fix sched_switch_migrate():

- In 8.x and above the run-queue locks are nomore shared even in the
  HTT case, so remove the special case.
- The deadlock explained in the removed comment here is still possible
  even with different locks, with the contribution of tdq_lock_pair().
  An explanation is here:
  (hypotesis: a thread needs to migrate on another CPU, thread1 is doing
  sched_switch_migrate() and thread2 is the one handling the sched_switch()
  request or in other words, thread1 is the thread that needs to migrate
  and thread2 is a thread that is going to be preempted, most likely an
  idle thread. Also, 'old' is referred to the context (in terms of
  run-queue and CPU) thread1 is leaving and 'new' is referred to the
  context thread1 is going into.  Finally, thread3 is doing tdq_idletd()
  or sched_balance() and definitively doing tdq_lock_pair())

  * thread1 blocks its td_lock. Now td_lock is 'blocked'
  * thread1 drops its old runqueue lock
  * thread1 acquires the new runqueue lock
  * thread1 adds itself to the new runqueue and sends an IPI_PREEMPT
    through tdq_notify() to the new CPU
  * thread1 drops the new lock
  * thread3, scanning the runqueues, locks the old lock
  * thread2 received the IPI_PREEMPT and does thread_lock() with td_lock
    pointing to the new runqueue
  * thread3 wants to acquire the new runqueue lock, but it can't because
    it is held by thread2 so it spins
  * thread1 wants to acquire old lock, but as long as it is held by
    thread3 it can't
  * thread2 going further, at some point wants to switchin in thread1,
    but it will wait forever because thread1->td_lock is in blocked state

This deadlock has been manifested mostly on 7.x and reported several time
on mailing lists under the voice 'spinlock held too long'.
Many thanks to des@ for having worked hard on producing suitable textdumps
and Jeff for help on the comment wording.

Reviewed by:	jeff
Reported by:	des, others
Tested by:	des, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
		(STABLE_7 based version)
This commit is contained in:
Attilio Rao 2009-09-15 16:56:17 +00:00
parent 95f08808b6
commit 435068aab7

View File

@ -1749,19 +1749,19 @@ sched_switch_migrate(struct tdq *tdq, struct thread *td, int flags)
*/
spinlock_enter();
thread_block_switch(td); /* This releases the lock on tdq. */
TDQ_LOCK(tdn);
/*
* Acquire both run-queue locks before placing the thread on the new
* run-queue to avoid deadlocks created by placing a thread with a
* blocked lock on the run-queue of a remote processor. The deadlock
* occurs when a third processor attempts to lock the two queues in
* question while the target processor is spinning with its own
* run-queue lock held while waiting for the blocked lock to clear.
*/
tdq_lock_pair(tdn, tdq);
tdq_add(tdn, td, flags);
tdq_notify(tdn, td);
/*
* After we unlock tdn the new cpu still can't switch into this
* thread until we've unblocked it in cpu_switch(). The lock
* pointers may match in the case of HTT cores. Don't unlock here
* or we can deadlock when the other CPU runs the IPI handler.
*/
if (TDQ_LOCKPTR(tdn) != TDQ_LOCKPTR(tdq)) {
TDQ_UNLOCK(tdn);
TDQ_LOCK(tdq);
}
TDQ_UNLOCK(tdn);
spinlock_exit();
#endif
return (TDQ_LOCKPTR(tdn));