a4b7a5a45c
Eliminate a race condition in rte_timer_manage() that can corrupt the per-lcore pending-lists (implemented as skip-lists). The race occurs when rte_timer_manage() expires multiple timers on lcore A while lcore B simultaneously invokes rte_timer_reset() for one of the expiring timers (other than the first one). Lcore A splits its pending-list, creating a local list of expired timers linked through their sl_next[0] pointers, and sets the first expired timer to the RUNNING state, all within one hold of the list-lock. Lcore A then releases the list-lock to run the first callback, and at that point A and B can disagree about the true state of the remaining expired timers. Lcore B sees an expired timer still in the PENDING state, atomically changes it to the CONFIG state, takes lcore A's list-lock, and reinserts the timer into A's pending-list. Both lcores now use the same next-pointers to maintain two different lists, which corrupts them. The solution is to remove expired timers from the pending-list and try to set them all to the RUNNING state in one atomic step, i.e., rte_timer_manage() performs both actions within one ownership of the list-lock. After splitting the pending-list at the current point in time and trying to set all expired timers to the RUNNING state, it must put back into the pending-list any timers that could not be set to the RUNNING state, all while still holding the list-lock. It is then safe to release the lock and run the callback functions for all expired timers that remain on the local run-list. Signed-off-by: Robert Sanford <rsanford@akamai.com> |
||
---|---|---|
Makefile | ||
rte_timer_version.map | ||
rte_timer.c | ||
rte_timer.h |
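
The fix described in the commit message can be illustrated with a small, self-contained sketch. This is not the code in rte_timer.c: every name here (`sk_timer`, `sk_list`, `sk_manage`, `sk_arm`, and so on) is hypothetical, a sorted singly linked list stands in for the skip-list, its `next` field stands in for `sl_next[0]`, and a C11 `atomic_flag` spinlock stands in for the list-lock. It only shows the ordering the message argues for: split, attempt the PENDING to RUNNING transition on every expired timer, and put back the failures, all within one hold of the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model (not the DPDK data structures). */
enum sk_state { SK_STOP, SK_PENDING, SK_RUNNING, SK_CONFIG };

struct sk_timer {
    _Atomic int state;
    uint64_t expire;
    struct sk_timer *next;          /* stands in for sl_next[0] */
    void (*cb)(struct sk_timer *);
};

struct sk_list {
    atomic_flag lock;               /* minimal spinlock standing in for the list-lock */
    struct sk_timer *head;          /* pending timers, earliest expiry first */
};

static int sk_fired;                /* counts callback invocations in the demo */
static void sk_count_cb(struct sk_timer *t) { (void)t; sk_fired++; }

static void sk_lock(struct sk_list *l)
{
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
        ;                           /* spin until the flag was previously clear */
}

static void sk_unlock(struct sk_list *l)
{
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Insert in expiry order; the caller must hold the list-lock. */
static void sk_insert(struct sk_list *l, struct sk_timer *t)
{
    struct sk_timer **pp = &l->head;
    while (*pp != NULL && (*pp)->expire <= t->expire)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
}

/* Arm a timer: mark it PENDING and add it to the pending-list. */
static void sk_arm(struct sk_list *l, struct sk_timer *t, uint64_t expire,
                   void (*cb)(struct sk_timer *))
{
    atomic_init(&t->state, SK_PENDING);
    t->expire = expire;
    t->cb = cb;
    sk_lock(l);
    sk_insert(l, t);
    sk_unlock(l);
}

/* Expire timers due at or before `now`; returns the number of callbacks run. */
static int sk_manage(struct sk_list *l, uint64_t now)
{
    struct sk_timer *run = NULL, **run_tail = &run, *t, *next, *last = NULL;
    int ran = 0;

    sk_lock(l);
    /* 1. Split the pending-list: detach every timer with expire <= now. */
    struct sk_timer *expired = l->head;
    for (t = l->head; t != NULL && t->expire <= now; t = t->next)
        last = t;
    if (last == NULL) {
        sk_unlock(l);
        return 0;
    }
    l->head = last->next;
    last->next = NULL;

    /* 2. Still holding the lock: try PENDING -> RUNNING on each one. */
    for (t = expired; t != NULL; t = next) {
        int want = SK_PENDING;
        next = t->next;
        if (atomic_compare_exchange_strong(&t->state, &want, SK_RUNNING)) {
            t->next = NULL;         /* keep it on the local run-list */
            *run_tail = t;
            run_tail = &t->next;
        } else {
            sk_insert(l, t);        /* another lcore claimed it: put it back */
        }
    }
    sk_unlock(l);

    /* 3. Lock released: only timers on the run-list get their callback. */
    for (t = run; t != NULL; t = next) {
        next = t->next;
        t->cb(t);
        atomic_store(&t->state, SK_STOP);
        ran++;
    }
    return ran;
}
```

To exercise the sketch, arm several timers, store `SK_CONFIG` into one of the expiring timers to mimic lcore B having won the state change but not yet taken the list-lock, and then call `sk_manage()`. The claimed timer should end up back on the pending-list with its callback not run, so no `next` pointer is ever shared between the run-list and the pending-list once the lock is dropped.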