freebsd-dev/sys/kern
Hans Petter Selasky 10c8755706 Fix for race leading to endless timer interrupts related to
configtimer().

During normal operation "state->nextcallopt" will always be less than
or equal to "state->nextcall" and checking only "state->nextcallopt"
before calling "callout_process()" is sufficient. However when
"configtimer()" is called a race might happen requiring both of these
binary times to be checked.

Short description of race:

1) A configtimer() call will reset both "state->nextcall" and
"state->nextcallopt" to the same binary time.

2) If a "callout_reset()" call happens between "configtimer()" and the
next "callout_process()" call, "state->nextcallopt" will get updated
and "state->nextcall" will remain at the current time. Refer to logic
inside cpu_new_callout().

3) getnextcpuevent() only respects "state->nextcall" and returns this
value over and over again, even if it is in the past, until "now >=
state->nextcallopt" becomes true. Then these two time variables are
corrected by a "callout_process()" call and the situation goes back to
normal.

The problem manifests itself in different ways. The common factor is
the timer process(es) consume all CPU on one or more CPU cores for a
long time, blocking other kernel processes from getting execution
time. This can be seen by very high interrupt counts as displayed by
"vmstat -i | grep timer" right after boot.

When EARLY_AP_STARTUP was enabled in r310177 the likelyhood of hitting
this bug apparently increased.

Example output from "vmstat -i" before patch:
cpu0:timer                          7591         69
cpu9:timer                      39031773     358089
cpu4:timer                          9359         85
cpu3:timer                          9100         83
cpu2:timer                          9620         88

Example output from "vmstat -i" after patch:
cpu0:timer                          4242         34
cpu6:timer                          5531         44
cpu3:timer                          6450         52
cpu1:timer                          4545         36
cpu9:timer                          7153         58

Before the patch cpu9 in the example above, was spinning in a loop in
order to reach 39 million interrupts just a few seconds after
bootup. After the patch the timer interrupt counts are more or less
consistent.

Discussed with:		mav @
Reported by:		several people
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 17:40:31 +00:00
..
bus_if.m "Buses" is the preferred plural of "bus" 2017-01-15 17:54:01 +00:00
capabilities.conf Update capabilities.conf comment 2016-09-08 14:04:04 +00:00
clock_if.m
cpufreq_if.m
device_if.m Import the 'iflib' API library for network drivers. From the author: 2016-05-18 04:35:58 +00:00
genassym.sh genassym.sh: call nm(1) with NMFLAGS. 2015-08-14 22:57:13 +00:00
imgact_aout.c Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall 2016-01-09 20:18:53 +00:00
imgact_binmisc.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
imgact_elf32.c
imgact_elf64.c
imgact_elf.c don't abort writing of a core dump after EFAULT 2017-01-20 13:39:07 +00:00
imgact_gzip.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
imgact_shell.c
inflate.c ANSIfy inflate.c 2016-10-04 17:57:30 +00:00
init_main.c Fix improper use of "its". 2016-11-08 23:59:41 +00:00
init_sysent.c Regen after r310638. 2016-12-27 20:22:17 +00:00
kern_acct.c Revert r312119 and reword the intent to fix -Wshadow issues 2017-01-15 09:25:33 +00:00
kern_alq.c Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem. 2016-03-11 23:18:06 +00:00
kern_clock.c Initialize 'ticks' earlier in boot after 'hz' is set. 2016-11-22 01:02:59 +00:00
kern_clocksource.c Fix for race leading to endless timer interrupts related to 2017-01-20 17:40:31 +00:00
kern_condvar.c cv: do a lockless check for no waiters in cv_signal and cv_broadcastpri 2016-09-06 17:16:59 +00:00
kern_conf.c Undo r309891. Konstantin is right in that this condition normally 2016-12-12 19:11:04 +00:00
kern_cons.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
kern_context.c
kern_cpu.c Add an EARLY_AP_STARTUP option to start APs earlier during boot. 2016-05-14 18:22:52 +00:00
kern_cpuset.c Add more fine-grained kernel options for NUMA support. 2016-04-09 13:58:04 +00:00
kern_ctf.c Fix improper use of "its". 2016-11-08 23:59:41 +00:00
kern_descrip.c Remove deprecated fgetsock() and fputsock(). 2017-01-13 22:16:41 +00:00
kern_dtrace.c
kern_dump.c Add support for encrypted kernel crash dumps. 2016-12-10 16:20:39 +00:00
kern_environment.c Create wrappers for uint64_t and int64_t for the tunables. While not 2016-04-15 03:09:55 +00:00
kern_et.c Add labels to sysctls related to clocks. 2016-12-14 12:56:58 +00:00
kern_event.c Add kevent EVFILT_EMPTY for notification when a client has received all data 2017-01-16 08:25:33 +00:00
kern_exec.c Explicitely add "opt_compat.h" to kern_exec.c: fix powerpc LINT builds. 2017-01-06 16:56:24 +00:00
kern_exit.c When a zombie gets reparented due to the parent exit, send SIGCHLD to 2016-12-12 11:11:50 +00:00
kern_fail.c Fix some cosmetic issues in kern_fail.c omitted from r296927. 2016-06-09 13:17:08 +00:00
kern_ffclock.c kernel: use our nitems() macro when it is available through param.h. 2016-04-19 23:48:27 +00:00
kern_fork.c vfs: add vrefact, to be used when the vnode has to be already active 2016-12-12 15:37:11 +00:00
kern_gzio.c
kern_hhook.c Get closer to a VIMAGE network stack teardown from top to bottom rather 2016-06-21 13:48:49 +00:00
kern_idle.c
kern_intr.c The part of r285680 which removed release semantic for two stores to 2015-07-21 14:39:34 +00:00
kern_jail.c Move IPv4-specific jail functions to new file netinet/in_jail.c 2016-08-09 02:16:21 +00:00
kern_khelp.c
kern_kthread.c Re-schedule signals after kthread exits, since apparently there are 2016-08-10 13:47:12 +00:00
kern_ktr.c Fix the logic in the ddb command 'show ktr /a'. Prior to r118269 it would 2016-01-31 17:32:20 +00:00
kern_ktrace.c ANSYfy kern_ktrace.c and remove archaic register keyword 2017-01-20 14:59:56 +00:00
kern_linker.c kern_linker: Handle module-loading failures in preloaded .ko files 2016-10-13 02:06:23 +00:00
kern_lock.c Microoptimize locking primitives by avoiding unnecessary atomic ops. 2016-06-01 18:32:20 +00:00
kern_lockf.c Fix LINT building. 2016-09-18 07:37:00 +00:00
kern_lockstat.c Consistently use a reader/writer flag for lockstat probes in rwlock(9) and 2015-07-19 22:24:33 +00:00
kern_loginclass.c Speed up rctl operation with large rulesets, by holding the lock 2015-11-15 12:10:51 +00:00
kern_malloc.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
kern_mbuf.c Import the 'iflib' API library for network drivers. From the author: 2016-05-18 04:35:58 +00:00
kern_mib.c Mark a bunch of mpsafe sysctls as such. 2016-10-19 19:42:01 +00:00
kern_module.c Provide better debug message on kernel module name clash. 2015-10-10 09:21:55 +00:00
kern_mtxpool.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
kern_mutex.c mtx: plug open-coded mtx_lock access missed in r311172 2017-01-04 02:25:31 +00:00
kern_ntptime.c Fix a bug in r302252. 2016-07-27 11:40:06 +00:00
kern_numa.c Add an initial NUMA affinity/policy configuration for threads and processes. 2015-07-11 15:21:37 +00:00
kern_osd.c osd(9): Change array pointer to array pointer type from void* 2016-04-26 19:57:35 +00:00
kern_physio.c Add four new RCTL resources - readbps, readiops, writebps and writeiops, 2016-04-07 04:23:25 +00:00
kern_pmc.c
kern_poll.c
kern_priv.c
kern_proc.c Export the whole thread name in kinfo_proc 2016-12-07 15:04:22 +00:00
kern_procctl.c reaper: Make REAPER_KILL_SUBTREE actually work. 2016-12-14 22:49:20 +00:00
kern_prot.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
kern_racct.c Get rid of rctl_lock; use racct_lock where appropriate. The fast paths 2016-04-21 16:22:52 +00:00
kern_rangelock.c
kern_rctl.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
kern_resource.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
kern_rmlock.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
kern_rwlock.c rwlock: reduce lock accesses similarly to r311172 2017-01-18 17:53:57 +00:00
kern_sdt.c
kern_sema.c
kern_sendfile.c Move bogus_page declaration to vm_page.h and initialization to vm_page.c. 2017-01-04 22:27:19 +00:00
kern_sharedpage.c Split kerne timekeep ABI structure vdso_sv_tk out of the struct 2015-11-23 07:09:35 +00:00
kern_shutdown.c Stop the scheduler upon panic even in non-SMP kernels. 2017-01-14 22:16:03 +00:00
kern_sig.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
kern_switch.c
kern_sx.c sx: reduce lock accesses similarly to r311172 2017-01-18 17:55:08 +00:00
kern_synch.c disambiguate msleep KASSERT diagnostics 2017-01-16 20:34:42 +00:00
kern_syscalls.c Implement lockless resource limits. 2015-06-10 10:48:12 +00:00
kern_sysctl.c Document the existence of the {0, 6, ...} sysctl. 2016-12-15 15:45:11 +00:00
kern_tc.c Add labels to sysctls related to clocks. 2016-12-14 12:56:58 +00:00
kern_thr.c thr_set_name(): silently truncate the given name as needed 2016-12-03 01:14:21 +00:00
kern_thread.c Rewrite subr_sleepqueue.c use of callouts to not depend on the 2016-07-28 09:09:55 +00:00
kern_time.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
kern_timeout.c Permit timed sleeps for threads other than thread0 before timers are working. 2016-11-25 18:02:43 +00:00
kern_umtx.c [mips] make UMTX_CHAINS configurable at compile time. 2016-11-15 01:34:38 +00:00
kern_uuid.c
kern_xxx.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
ksched.c Use P1B_PRIO_MAX to designate max posix priority for the RR/FIFO 2015-08-30 18:02:57 +00:00
link_elf_obj.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
link_elf.c kern: for pointers replace 0 with NULL. 2016-04-15 16:10:11 +00:00
linker_if.m sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
Make.tags.inc Bring the tags and links entries for amd64 up to date. 2015-10-27 22:59:24 +00:00
Makefile Don't create pointless backups of generated files in "make sysent". 2016-07-28 21:29:04 +00:00
makesyscalls.sh makesyscalls.sh: remove trailing space on the "created from" line 2016-10-17 13:52:24 +00:00
md4c.c crypto routines: Hint minimum buffer sizes to the compiler 2016-05-26 19:29:29 +00:00
md5c.c crypto routines: Hint minimum buffer sizes to the compiler 2016-05-26 19:29:29 +00:00
msi_if.m Introduce MSI and MSI-X support to intrng. This adds a new msi device 2016-05-16 09:11:40 +00:00
p1003_1b.c In preparation for switching linuxulator to the use the native 1:1 2015-05-24 14:44:06 +00:00
pic_if.m INTRNG: Rework handling with resources. Partially revert r301453. 2016-08-19 10:52:39 +00:00
posix4_mib.c posix4_mib: Don't overrun facility_initialized array 2016-04-27 00:10:32 +00:00
sched_4bsd.c fix a thread preemption regression in schedulers introduced in r270423 2017-01-19 18:46:41 +00:00
sched_ule.c fix a thread preemption regression in schedulers introduced in r270423 2017-01-19 18:46:41 +00:00
serdev_if.m
stack_protector.c Use nitems() macro instead of __arraycount() 2015-06-16 20:19:00 +00:00
subr_acl_nfs4.c Expose an interface to determine if an ACE is inherited. 2015-09-04 00:14:20 +00:00
subr_acl_posix1e.c
subr_autoconf.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_blist.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_bufring.c
subr_bus_dma.c Fix a bug introduced in r291716: 2016-01-11 20:38:39 +00:00
subr_bus.c "Buses" is the preferred plural of "bus" 2017-01-15 17:54:01 +00:00
subr_busdma_bufalloc.c Fix printf format to allow for bus_size_t not being u_long on all platforms. 2015-10-20 03:25:17 +00:00
subr_capability.c capsicum: plug spurious memset in __cap_rights_init 2015-12-01 02:48:42 +00:00
subr_clock.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_counter.c Zero return value when counter_rate() switches over to next second and 2016-12-13 20:11:45 +00:00
subr_devmap.c Include machine/acle-compat.h in cdefs.h on arm if the compiler doesn't 2016-05-25 19:44:26 +00:00
subr_devstat.c Add support for managing Shingled Magnetic Recording (SMR) drives. 2016-05-19 14:08:36 +00:00
subr_disk.c
subr_dummy_vdso_tc.c
subr_eventhandler.c
subr_fattime.c
subr_firmware.c Fix improper use of "its". 2016-11-08 23:59:41 +00:00
subr_gtaskqueue.c Remove Assert that seems to be hit in various configurations during 2017-01-16 19:01:41 +00:00
subr_hash.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_hints.c
subr_intr.c INTRNG - fix MSI/MSIX release path 2016-10-11 17:00:29 +00:00
subr_kdb.c
subr_kobj.c
subr_lock.c Implement trivial backoff for locking primitives. 2016-08-01 21:48:37 +00:00
subr_log.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_mbpool.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
subr_mchain.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_module.c preload_search_info: make sure mod is set 2015-08-21 15:57:57 +00:00
subr_msgbuf.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
subr_param.c Initialize 'ticks' earlier in boot after 'hz' is set. 2016-11-22 01:02:59 +00:00
subr_pcpu.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_pctrie.c sys: extend use of the howmany() macro when available. 2016-04-26 15:38:17 +00:00
subr_power.c
subr_prf.c Include <stdarg.h> instead of <machine/stdarg.h> when compiled as 2016-10-24 18:03:04 +00:00
subr_prof.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_rman.c Add new bus methods for mapping resources. 2016-05-20 17:57:47 +00:00
subr_rtc.c Make resettodr_lock accessible outside subr_rtc.c. Protect 2016-09-21 10:15:08 +00:00
subr_sbuf.c Fail the sbuf if vsnprintf(3) fails. 2015-10-02 09:23:14 +00:00
subr_scanf.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
subr_sfbuf.c
subr_sglist.c Add sglist functions for working with arrays of VM pages. 2016-05-20 23:28:43 +00:00
subr_sleepqueue.c Add a comment explaining the race fixed by r310423. 2016-12-23 05:02:17 +00:00
subr_smp.c Handle broadcast NMIs. 2016-10-24 16:40:27 +00:00
subr_stack.c Add support for a configurable output channel to witness(4). 2015-11-19 05:56:59 +00:00
subr_syscall.c Add PROC_TRAPCAP procctl(2) controls and global sysctl kern.trap_enocap. 2016-09-21 08:23:33 +00:00
subr_taskqueue.c While draining a timeout task prevent the taskqueue_enqueue_timeout() 2016-09-29 10:38:20 +00:00
subr_terminal.c
subr_trap.c The assertion re-added in r302614 was triggered when stopping signal 2016-07-18 10:53:47 +00:00
subr_turnstile.c ddb(4): Add sleepchains to "show allchains" 2016-10-22 18:02:20 +00:00
subr_uio.c In the fueword64(9) wrapper for architectures which do not implemented 2016-10-23 11:23:17 +00:00
subr_unit.c Clean up trailing whitespace 2017-01-14 04:16:13 +00:00
subr_vmem.c subr_vmem: Fix double-free in error case of vmem_create 2016-05-11 23:16:11 +00:00
subr_witness.c Fix WITNESS hints for pagequeue locks. 2016-10-29 20:01:48 +00:00
sys_capability.c capsicum: perform copyout without the fildesc lock held in sys_cap_ioctls_get 2016-10-21 16:12:23 +00:00
sys_generic.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
sys_pipe.c Generate syscall tables and update pipe() implementation after r302094. 2016-06-22 21:18:19 +00:00
sys_procdesc.c Hide the boottime and bootimebin globals, provide the getboottime(9) 2016-07-27 11:08:59 +00:00
sys_process.c Don't set P2_PTRACE_FSTP in a process that invokes ptrace(PT_TRACE_ME). 2016-08-19 17:57:14 +00:00
sys_socket.c Set MORETOCOME for AIO write requests on a socket. 2017-01-06 23:41:45 +00:00
syscalls.c Regen after r310638. 2016-12-27 20:22:17 +00:00
syscalls.master Rename the 'flags' argument to getfsstat() to 'mode' and validate it. 2016-12-27 20:21:11 +00:00
systrace_args.c Regen after r310638. 2016-12-27 20:22:17 +00:00
sysv_ipc.c
sysv_msg.c Remove a comment that was part of copied code, and is misleading in 2016-06-09 15:34:33 +00:00
sysv_sem.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
sysv_shm.c Add shmatt_t. 2016-07-26 17:23:49 +00:00
tty_compat.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
tty_info.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
tty_inq.c Check tty_gone() after allocating IO buffers. The tty lock has to be 2017-01-13 16:37:38 +00:00
tty_outq.c Check tty_gone() after allocating IO buffers. The tty lock has to be 2017-01-13 16:37:38 +00:00
tty_pts.c sys/kern: spelling fixes in comments. 2016-04-29 22:15:33 +00:00
tty_tty.c tty: replace several curthread->td_proc with stored curproc 2015-07-06 18:53:56 +00:00
tty_ttydisc.c Don't clear the software flow control flag before draining for last 2016-01-26 14:46:39 +00:00
tty.c Correct the comments about how much buffer is allocated. 2017-01-13 17:03:23 +00:00
uipc_accf.c Use correct size type in do_setopt_accept_filter 2016-10-12 00:56:49 +00:00
uipc_debug.c Refactor the AIO subsystem to permit file-type-specific handling and 2016-03-01 18:12:14 +00:00
uipc_domain.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
uipc_mbuf2.c Remove writability requirement for single-mbuf, contiguous-range 2017-01-12 06:38:03 +00:00
uipc_mbuf.c Suppress a warning about m_assertbuf being unused. 2017-01-15 03:53:20 +00:00
uipc_mbufhash.c
uipc_mqueue.c Initialize reserved bytes in struct mq_attr and its 32compat 2016-11-14 13:20:10 +00:00
uipc_sem.c Clean up some style(9) violations. 2016-04-14 17:07:26 +00:00
uipc_shm.c When tmpfs and POSIX shm pagein a page for the sole purpose of performing 2016-12-11 19:24:41 +00:00
uipc_sockbuf.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
uipc_socket.c Implement kernel support for hardware rate limited sockets. 2017-01-18 13:31:17 +00:00
uipc_syscalls.c Rework r306337. 2016-10-21 18:27:30 +00:00
uipc_usrreq.c Add a new socket option SO_TS_CLOCK to pick from several different clock 2017-01-16 17:46:38 +00:00
vfs_acl.c Replace struct filedesc argument in getvnode with struct thread 2015-06-16 13:09:18 +00:00
vfs_aio.c Remove duplicated code. 2016-08-17 10:14:22 +00:00
vfs_bio.c Do not set BIO_DONE if the BIO specifies a completion handler. 2017-01-10 21:41:28 +00:00
vfs_cache.c cache: sprinkle __predict_false 2016-12-29 16:35:49 +00:00
vfs_cluster.c Move bogus_page declaration to vm_page.h and initialization to vm_page.c. 2017-01-04 22:27:19 +00:00
vfs_default.c Do not allocate struct statfs on kernel stack. 2017-01-05 17:19:26 +00:00
vfs_export.c Fix build when no INET and INET6 in kernel config. 2016-11-17 16:13:30 +00:00
vfs_extattr.c Replace struct filedesc argument in getvnode with struct thread 2015-06-16 13:09:18 +00:00
vfs_hash.c Add vfs_hash_ref(9) function, which finds a vnode by the hash value 2016-05-11 06:32:22 +00:00
vfs_init.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
vfs_lookup.c Fix bug that would result in a kernel crash in some cases involving 2017-01-04 14:43:57 +00:00
vfs_mount.c Do not allocate struct statfs on kernel stack. 2017-01-05 17:19:26 +00:00
vfs_mountroot.c Limit scope of the optimization in r306608 to dounmount() caller only. 2016-10-07 11:38:28 +00:00
vfs_subr.c vfs: switch nodes_created, recycles_count and free_owe_inact to counter(9) 2016-12-31 19:59:31 +00:00
vfs_syscalls.c Do not allocate struct statfs on kernel stack. 2017-01-05 17:19:26 +00:00
vfs_vnops.c Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
vnode_if.src Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00