Fix sched_switch_migrate():
- In 8.x and above the run-queue locks are no longer shared even in the
  HTT case, so remove the special case.
- The deadlock explained in the removed comment here is still possible
  even with different locks, with the contribution of tdq_lock_pair().
  An explanation is here:
  (hypothesis: a thread needs to migrate to another CPU, thread1 is doing
  sched_switch_migrate() and thread2 is the one handling the sched_switch()
  request; in other words, thread1 is the thread that needs to migrate and
  thread2 is a thread that is going to be preempted, most likely an idle
  thread.  Also, 'old' refers to the context (in terms of run-queue and
  CPU) thread1 is leaving and 'new' refers to the context thread1 is
  entering.  Finally, thread3 is doing tdq_idletd() or sched_balance()
  and is definitely doing tdq_lock_pair())

  * thread1 blocks its td_lock.  Now td_lock is 'blocked'
  * thread1 drops its old run-queue lock
  * thread1 acquires the new run-queue lock
  * thread1 adds itself to the new run-queue and sends an IPI_PREEMPT
    through tdq_notify() to the new CPU
  * thread1 drops the new lock
  * thread3, scanning the run-queues, locks the old lock
  * thread2 receives the IPI_PREEMPT and does thread_lock() with td_lock
    pointing to the new run-queue
  * thread3 wants to acquire the new run-queue lock, but it can't
    because it is held by thread2, so it spins
  * thread1 wants to acquire the old lock, but as long as it is held by
    thread3 it can't
  * thread2, going further, at some point wants to switch in thread1,
    but it will wait forever because thread1->td_lock is in the blocked
    state

  This deadlock has manifested mostly on 7.x and has been reported
  several times on the mailing lists under the heading 'spinlock held
  too long'.

  Many thanks to des@ for having worked hard on producing suitable
  textdumps and to Jeff for help on the comment wording.

Reviewed by:	jeff
Reported by:	des, others
Tested by:	des, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
		(STABLE_7 based version)
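The cure in this commit is tdq_lock_pair(), which sidesteps the lock-order
problem by always acquiring the two run-queue locks in one fixed, global
order. A minimal userspace sketch of that ordering discipline, using
pthread mutexes instead of the kernel's spin locks (the names `struct rq`,
`rq_lock_pair()` and `rq_unlock_pair()` are hypothetical, not the kernel's):

```c
#include <pthread.h>

struct rq {
	pthread_mutex_t lock;
};

/*
 * Lock two run-queues in a canonical order (lowest address first).
 * Any two threads locking the same pair will take the locks in the
 * same order, so neither can hold one lock while waiting for the
 * other to be released by its peer.
 */
static void
rq_lock_pair(struct rq *a, struct rq *b)
{
	if (a == b) {			/* shared lock: take it once */
		pthread_mutex_lock(&a->lock);
		return;
	}
	if (a < b) {
		pthread_mutex_lock(&a->lock);
		pthread_mutex_lock(&b->lock);
	} else {
		pthread_mutex_lock(&b->lock);
		pthread_mutex_lock(&a->lock);
	}
}

static void
rq_unlock_pair(struct rq *a, struct rq *b)
{
	pthread_mutex_unlock(&a->lock);
	if (a != b)
		pthread_mutex_unlock(&b->lock);
}
```

Release order does not matter for deadlock avoidance; only a consistent
acquisition order does, which is why unlocking is unconditional here.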
parent 95f08808b6
commit 435068aab7
@ -1749,19 +1749,19 @@ sched_switch_migrate(struct tdq *tdq, struct thread *td, int flags)
 	 */
 	spinlock_enter();
 	thread_block_switch(td);	/* This releases the lock on tdq. */
-	TDQ_LOCK(tdn);
+
+	/*
+	 * Acquire both run-queue locks before placing the thread on the new
+	 * run-queue to avoid deadlocks created by placing a thread with a
+	 * blocked lock on the run-queue of a remote processor.  The deadlock
+	 * occurs when a third processor attempts to lock the two queues in
+	 * question while the target processor is spinning with its own
+	 * run-queue lock held while waiting for the blocked lock to clear.
+	 */
+	tdq_lock_pair(tdn, tdq);
 	tdq_add(tdn, td, flags);
 	tdq_notify(tdn, td);
-	/*
-	 * After we unlock tdn the new cpu still can't switch into this
-	 * thread until we've unblocked it in cpu_switch().  The lock
-	 * pointers may match in the case of HTT cores.  Don't unlock here
-	 * or we can deadlock when the other CPU runs the IPI handler.
-	 */
-	if (TDQ_LOCKPTR(tdn) != TDQ_LOCKPTR(tdq)) {
-		TDQ_UNLOCK(tdn);
-		TDQ_LOCK(tdq);
-	}
 	TDQ_UNLOCK(tdn);
 	spinlock_exit();
 #endif
 	return (TDQ_LOCKPTR(tdn));
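The walkthrough hinges on td_lock being a *pointer* to whichever lock
currently protects the thread: while a switch is in progress it points at
a "blocked" sentinel, and thread_lock() must spin until cpu_switch()
publishes the real lock. A minimal userspace model of that handoff using
C11 atomics (the names `slock_t`, `struct mthread` and `mthread_lock()`
are hypothetical, not the kernel's; the kernel's thread_lock() is more
involved than this):

```c
#include <stdatomic.h>

typedef struct { atomic_flag held; } slock_t;

/* Sentinel the lock pointer is parked on during a switch. */
static slock_t blocked_lock;

struct mthread {
	_Atomic(slock_t *) td_lock;	/* lock currently protecting us */
};

static void
slock_acquire(slock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held,
	    memory_order_acquire))
		;			/* spin */
}

static void
slock_release(slock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Model of thread_lock(): chase td_lock until we hold the lock it
 * points at.  While td_lock == &blocked_lock a switch is in progress
 * and we spin, exactly as thread2 does in the walkthrough above.
 */
static slock_t *
mthread_lock(struct mthread *td)
{
	slock_t *l;

	for (;;) {
		l = atomic_load(&td->td_lock);
		if (l == &blocked_lock)
			continue;	/* switch in progress */
		slock_acquire(l);
		if (atomic_load(&td->td_lock) == l)
			return (l);
		slock_release(l);	/* pointer moved; retry */
	}
}
```

In this model the deadlock in the commit message is visible directly: if
the thread that must clear the blocked state is itself spinning on a lock
held by someone waiting on mthread_lock(), no party can ever make progress.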