From 10c8755706ec266d357e10ac7fd1b82fe94a3613 Mon Sep 17 00:00:00 2001 From: Hans Petter Selasky Date: Fri, 20 Jan 2017 17:40:31 +0000 Subject: [PATCH] Fix for race leading to endless timer interrupts related to configtimer(). During normal operation "state->nextcallopt" will always be less than or equal to "state->nextcall" and checking only "state->nextcallopt" before calling "callout_process()" is sufficient. However when "configtimer()" is called a race might happen requiring both of these binary times to be checked. Short description of race: 1) A configtimer() call will reset both "state->nextcall" and "state->nextcallopt" to the same binary time. 2) If a "callout_reset()" call happens between "configtimer()" and the next "callout_process()" call, "state->nextcallopt" will get updated and "state->nextcall" will remain at the current time. Refer to logic inside cpu_new_callout(). 3) getnextcpuevent() only respects "state->nextcall" and returns this value over and over again, even if it is in the past, until "now >= state->nextcallopt" becomes true. Then these two time variables are corrected by a "callout_process()" call and the situation goes back to normal. The problem manifests itself in different ways. The common factor is the timer process(es) consume all CPU on one or more CPU cores for a long time, blocking other kernel processes from getting execution time. This can be seen by very high interrupt counts as displayed by "vmstat -i | grep timer" right after boot. When EARLY_AP_STARTUP was enabled in r310177 the likelyhood of hitting this bug apparently increased. Example output from "vmstat -i" before patch: cpu0:timer 7591 69 cpu9:timer 39031773 358089 cpu4:timer 9359 85 cpu3:timer 9100 83 cpu2:timer 9620 88 Example output from "vmstat -i" after patch: cpu0:timer 4242 34 cpu6:timer 5531 44 cpu3:timer 6450 52 cpu1:timer 4545 36 cpu9:timer 7153 58 Before the patch cpu9 in the example above, was spinning in a loop in order to reach 39 million interrupts just a few seconds after bootup. After the patch the timer interrupt counts are more or less consistent. Discussed with: mav @ Reported by: several people MFC after: 1 week Sponsored by: Mellanox Technologies --- sys/kern/kern_clocksource.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sys/kern/kern_clocksource.c b/sys/kern/kern_clocksource.c index 7f7769d7a51a..8bacff61d2d7 100644 --- a/sys/kern/kern_clocksource.c +++ b/sys/kern/kern_clocksource.c @@ -207,7 +207,7 @@ handleevents(sbintime_t now, int fake) } } else state->nextprof = state->nextstat; - if (now >= state->nextcallopt) { + if (now >= state->nextcallopt || now >= state->nextcall) { state->nextcall = state->nextcallopt = SBT_MAX; callout_process(now); }