freebsd-nq

Attilio Rao 3d7acbbabf Fix several callout migration races:

 - Problem1:
   Hypothesis: thread1 is doing a callout_reset_on(), within its
   callout handler, intending to implicitly or explicitly migrate the
   callout.  thread2 is draining the callout.

   Thesis:
   * thread1 calls callout_lock() and locks the old callout cpu
   * thread1 performs the checks in the first part of
     callout_reset_on()
   * thread1 reaches this code fragment:
       /*
        * If the lock must migrate we have to check the state again as
        * we can't hold both the new and old locks simultaneously.
        */
       if (c->c_cpu != cpu) {
               c->c_cpu = cpu;
               CC_UNLOCK(cc);
               goto retry;
       }

     which means it will drop the lock and 'retry'
   * thread2 calls callout_lock() and locks the new callout cpu.
     thread1 spins on the new lock and cannot proceed for the
     moment.
   * thread2 checks that the callout is not pending (as the callout is
     currently running) and that it is not on cc->cc_curr (because cc
     now refers to the new callout cpu, while the callout is running on
     the old one), so it concludes it is done and returns.
   * thread1 now acquires the lock and adds the callout
     to the new callout cpu queue

   This is an obvious race: callout_stop() falsely reports the
   callout as stopped or, worse, callout_drain() returns while the
   callout is still in use.
 - Solution1:
   In general, fixing this problem would require locking both
   callout cpus at once while switching the c_cpu field, without
   introducing cyclic deadlocks between the callout cpu locks.
   The concept of CPUBLOCK is therefore introduced (working more or
   less like blocked_lock for the thread_lock() function), meaning:
   "in callout_lock(), spin as long as c->c_cpu is equal to
   CPUBLOCK". That way, in the code snippet above, the callout stays
   blocked from the moment the "original" callout cpu lock is dropped
   until the lock handover is over, so the critical path remains
   covered.

 - Problem2:
   Having the callout executing on one callout cpu and, at the same
   time, pending on another callout cpu (as can happen with the
   current code) breaks, at least, the assumption that callout_drain()
   returns only once the callout can no longer be referenced.
 - Solution2:
   Callout migration is deferred if the callout is already
   executing.
   The best place to do that is in softclock(), and new members are
   added to the callout cpu structure to record that a migration has
   been requested. That is necessary because, once the callout
   handler has run, the callout itself cannot be trusted 100% of the
   time (it may have been freed).
   In the "deferred migration" case, CPUBLOCK prevents the callout
   from being freed, stalling any callout_stop() and callout_drain()
   activity until the migration is actually performed.

 - Problem3:
   There is a further race in callout_drain().
   In order to avoid a race between the sleepqueue lock and the
   callout cpu spinlock, _callout_stop_safe() drops the callout cpu
   lock, acquires the sleepqueue lock and performs a new callout cpu
   lookup.  Note that the channel used for locking the sleepqueue is
   obtained from the "current" callout cpu (&cc->cc_waiting).
   If the callout migrated in the meantime, callout_drain() will end
   up using the wrong wchan for the sleepqueue (the locked one will be
   the old one, while the new one will not actually be locked),
   leading to a lock leak and racy access to the sleepqueue.
 - Solution3:
   It is enough to check whether a migration happened between
   acquiring the sleepqueue lock and acquiring the new callout cpu
   lock and, if so, unwind both and try again.
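The CPUBLOCK handover from Solution1 can be sketched in user space as follows. This is a minimal model under stated assumptions, not the kernel code: per-cpu spin mutexes are replaced by C11 atomic_flag spinlocks, and NCPU, the struct layouts, cc_init()/cc_lock()/cc_unlock() and the callout_cpu_switch() helper are hypothetical names. Nothing here models interrupts or preemption, which an in-kernel version would have to consider. The migrating thread never holds two callout cpu locks at once: it parks c_cpu at CPUBLOCK, and callout_lock() spins on that sentinel and rechecks c_cpu after locking.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical user-space model of the CPUBLOCK handover. NCPU, the
 * struct layouts and every helper except callout_lock() are
 * illustrative names, not the kernel's definitions.
 */
#define NCPU     4
#define CPUBLOCK NCPU   /* sentinel c_cpu value: migration in progress */

struct callout_cpu {
	atomic_flag lock;       /* stands in for the per-cpu spin mutex */
};

struct callout {
	_Atomic int c_cpu;      /* owning callout cpu, or CPUBLOCK */
};

static struct callout_cpu cc_cpu[NCPU];

static void cc_init(void)
{
	for (int i = 0; i < NCPU; i++)
		atomic_flag_clear(&cc_cpu[i].lock);
}

static void cc_lock(struct callout_cpu *cc)
{
	while (atomic_flag_test_and_set_explicit(&cc->lock,
	    memory_order_acquire))
		;       /* spin until the holder releases it */
}

static void cc_unlock(struct callout_cpu *cc)
{
	atomic_flag_clear_explicit(&cc->lock, memory_order_release);
}

/*
 * Lock the callout's current cpu. Spin while a handover is in flight,
 * and recheck c_cpu after locking because the callout may have
 * migrated between the load and the lock acquisition.
 */
static struct callout_cpu *callout_lock(struct callout *c)
{
	for (;;) {
		int cpu = atomic_load(&c->c_cpu);
		if (cpu == CPUBLOCK)
			continue;               /* handover in progress */
		struct callout_cpu *cc = &cc_cpu[cpu];
		cc_lock(cc);
		if (atomic_load(&c->c_cpu) == cpu)
			return cc;              /* still the owner: done */
		cc_unlock(cc);                  /* migrated meanwhile: retry */
	}
}

/*
 * Hypothetical helper: move c from the locked cpu 'old' to new_cpu
 * without ever holding both locks. Returns the new cpu, locked.
 */
static struct callout_cpu *callout_cpu_switch(struct callout *c,
    struct callout_cpu *old, int new_cpu)
{
	atomic_store(&c->c_cpu, CPUBLOCK);      /* park callout_lock() callers */
	cc_unlock(old);
	struct callout_cpu *cc = &cc_cpu[new_cpu];
	cc_lock(cc);
	atomic_store(&c->c_cpu, new_cpu);       /* publish only once locked */
	return cc;
}
```

Because the old lock is released before the new one is taken, no thread ever waits for one callout cpu lock while holding another, so lock-order cycles between callout cpus cannot form. The price is that callout_lock() must tolerate the transient CPUBLOCK state and retry when c_cpu changes under it, which is the same detect-and-retry idea Solution3 applies after the sleepqueue lookup.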

These problems can lead to deadly races on moderate (4-way) SMP
environments, easily causing panics or deadlocks.
The reporter's 24-way machine could panic almost daily under a
completely normal workload.
gianni@ kindly wrote the following proof-of-concept, which can
panic a FreeBSD machine on smaller SMP systems in less than one hour:
http://www.freebsd.org/~attilio/callout/test.c

Reported by:	Nicholas Esborn <nick at desert dot net>, DesertNet
In collaboration with:	gianni, pho, Nicholas Esborn
Reviewed by:	jhb
MFC after:	1 week (*)

* Usually, I would aim for a longer MFC timeout, but I really want this
  in before 8.2-RELEASE, so re@ accepted a shorter timeout as a special
  case for this patch.

2010-12-29 18:17:36 +00:00

bus_if.m

bus_add_child: add specialized default implementation that calls panic

2010-09-13 08:34:20 +00:00

clock_if.m

…

cpufreq_if.m

…

device_if.m

…

genassym.sh

…

imgact_aout.c

Reorganize syscall entry and leave handling.

2010-05-23 18:32:02 +00:00

imgact_elf32.c

…

imgact_elf64.c

…

imgact_elf.c

Add the ability for GDB to printout the thread name along with other

2010-11-22 14:42:13 +00:00

imgact_gzip.c

…

imgact_shell.c

Fix exec_imgact_shell()'s handling of two error cases: (1) Previously, if

2010-09-21 16:24:51 +00:00

inflate.c

…

init_main.c

MFp4:

2010-12-09 02:42:02 +00:00

init_sysent.c

Regen

2010-08-30 14:26:02 +00:00

kern_acct.c

Use ISO C99 integer types in sys/kern where possible.

2010-06-21 09:55:56 +00:00

kern_alq.c

- Rework the underlying ALQ storage to be a circular buffer, which amongst other

2010-04-26 13:48:22 +00:00

kern_clock.c

After some off-list discussion, revert a number of changes to the

2010-11-22 19:32:54 +00:00

kern_clocksource.c

After some off-list discussion, revert a number of changes to the

2010-11-22 19:32:54 +00:00

kern_condvar.c

…

kern_conf.c

Fix race in devfs by using LIST_FIRST() instead of

2010-12-11 08:44:10 +00:00

kern_cons.c

Add descriptions to a handful of sysctl nodes.

2010-08-09 14:48:31 +00:00

kern_context.c

Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to

2010-06-30 18:03:42 +00:00

kern_cpu.c

cpufreq: allocate long-lived buffer for handling of sysctl requests

2010-07-23 16:46:42 +00:00

kern_cpuset.c

Use integer for size of cpuset, as it won't be bigger than INT_MAX,

2010-11-01 00:42:25 +00:00

kern_ctf.c

…

kern_descrip.c

Remove one zero from the double-0.

2010-04-23 14:32:58 +00:00

kern_dtrace.c

Bump KDTRACE_THREAD_ZERO and use M_ZERO as a malloc flag instead of

2010-08-22 11:09:53 +00:00

kern_environment.c

Merge change r198561 from projects/mips to head:

2010-01-10 22:34:18 +00:00

kern_et.c

Refactor timer management code with priority to one-shot operation mode.

2010-09-13 07:25:35 +00:00

kern_event.c

Defer freeing a kevent list until after dropping kqueue locks.

2010-03-30 18:31:55 +00:00

kern_exec.c

- When disabling ktracing on a process, free any pending requests that

2010-10-21 19:17:40 +00:00

kern_exit.c

By using the 32-bit Linux version of Sun's Java Development Kit 1.6

2010-11-22 09:06:59 +00:00

kern_fail.c

Initialize fp_location for explicitly managed fail points, and push

2010-12-21 18:23:03 +00:00

kern_fork.c

Refactor fork1() to make it easier to follow. No functional changes.

2010-12-10 08:33:56 +00:00

kern_gzio.c

Do not set IO_NODELOCKED while writing to vnodes as our consumers

2010-04-30 03:10:53 +00:00

kern_hhook.c

- Introduce the Hhook (Helper Hook) KPI. The KPI is closely modelled on pfil(9),

2010-12-21 13:45:29 +00:00

kern_idle.c

Split P_NOLOAD into a per-thread flag (TDF_NOLOAD).

2009-11-03 16:46:52 +00:00

kern_intr.c

Store interrupt trap frame into struct thread. It allows interrupt handler

2010-06-10 16:14:05 +00:00

kern_jail.c

Don't exit kern_jail_set without freeing options when enforce_statfs

2010-09-10 21:45:42 +00:00

kern_khelp.c

- Introduce the Hhook (Helper Hook) KPI. The KPI is closely modelled on pfil(9),

2010-12-21 13:45:29 +00:00

kern_kthread.c

In thr_exit() and kthread_exit(), only remove thread from

2010-10-23 13:16:39 +00:00

kern_ktr.c

Probably defaulting to KTR_GEN is not the right decision when KTR_MASK

2010-07-21 10:14:04 +00:00

kern_ktrace.c

- When disabling ktracing on a process, free any pending requests that

2010-10-21 19:17:40 +00:00

kern_linker.c

kdb_backtrace: use stack_print_ddb instead of stack_print

2010-09-22 06:45:07 +00:00

kern_lock.c

Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and

2010-08-20 19:46:50 +00:00

kern_lockf.c

…

kern_lockstat.c

…

kern_malloc.c

add kmem_map_free sysctl: query largest contiguous free range in kmem_map

2010-10-09 09:03:17 +00:00

kern_mbuf.c

Use ISO C99 integer types in sys/kern where possible.

2010-06-21 09:55:56 +00:00

kern_mib.c

When compat32 binary asks for the value of hw.machine_arch, report the

2010-07-22 09:13:49 +00:00

kern_module.c

Style fix.

2010-11-22 15:28:54 +00:00

kern_mtxpool.c

…

kern_mutex.c

- Remove <machine/mutex.h>. Most of the headers were empty, and the

2010-11-09 20:46:41 +00:00

kern_ntptime.c

there must be only one SYSINIT with SI_SUB_RUN_SCHEDULER+SI_ORDER_ANY order

2010-09-30 17:05:23 +00:00

kern_osd.c

…

kern_physio.c

Account i/o done on cdevs.

2010-11-25 20:05:11 +00:00

kern_pmc.c

…

kern_poll.c

…

kern_priv.c

Add an extra comment to the SDT probes definition. This allows us to get

2010-08-22 11:18:57 +00:00

kern_proc.c

Fix some more style(9) issues.

2010-11-14 16:10:15 +00:00

kern_prot.c

Revert r210225 - turns out I was wrong; the "/*-" is not license-only

2010-07-18 20:57:53 +00:00

kern_resource.c

- Follow r216313, the sched_unlend_user_prio is no longer needed, always

2010-12-29 09:26:46 +00:00

kern_rmlock.c

No need to include sys/systm.h twice.

2010-11-16 14:08:21 +00:00

kern_rwlock.c

Print the pointer to the lock with the panic message. The previous

2010-03-24 19:21:26 +00:00

kern_sdt.c

Use ISO C99 integer types in sys/kern where possible.

2010-06-21 09:55:56 +00:00

kern_sema.c

…

kern_shutdown.c

Mostly revert r203420, and add similar functionality into ada(4) since the

2010-10-24 16:31:57 +00:00

kern_sig.c

In kern_sigtimedwait(), move initialization code out of process lock,

2010-10-14 08:01:33 +00:00

kern_switch.c

Update several places that iterate over CPUs to use CPU_FOREACH().

2010-06-11 18:46:34 +00:00

kern_sx.c

Fix a sign bug that caused adaptive spinning in sx_xlock() to not work

2010-06-08 16:17:47 +00:00

kern_synch.c

…

kern_syscalls.c

Call chainevh callback when we are invoked with neither MOD_LOAD nor

2010-10-21 20:31:50 +00:00

kern_sysctl.c

Fix uninitialized variable warning that shows on Tinderbox but not my

2010-11-29 21:53:21 +00:00

kern_tc.c

Add parentheses for clarity. The parentheses around the two terms of the &&

2010-11-23 04:50:01 +00:00

kern_thr.c

In thr_exit() and kthread_exit(), only remove thread from

2010-10-23 13:16:39 +00:00

kern_thread.c

MFp4:

2010-12-09 05:16:20 +00:00

kern_time.c

Create a global thread hash table to speed up thread lookup, use

2010-10-09 02:50:23 +00:00

kern_timeout.c

Fix several callout migration races:

2010-12-29 18:17:36 +00:00

kern_umtx.c

- Follow r216313, the sched_unlend_user_prio is no longer needed, always

2010-12-29 09:26:46 +00:00

kern_uuid.c

…

kern_xxx.c

…

ksched.c

sched_getparam was just plain broke for time-share

2010-03-03 21:46:51 +00:00

link_elf_obj.c

2010-11-14 20:14:25 +00:00

link_elf.c

Whitespace and other aspects of style(9). No functional changes.

2010-11-08 20:57:08 +00:00

linker_if.m

…

Make.tags.inc

…

Makefile

Adjust the all target message (but maybe all: sysent is better?

2010-10-02 22:12:41 +00:00

makesyscalls.sh

Count number of threads that enter and leave dynamically registered

2010-06-28 18:06:46 +00:00

md4c.c

…

md5c.c

…

p1003_1b.c

Set various POSIX capability sysctls to the version of the API that is

2010-11-19 17:56:16 +00:00

posix4_mib.c

Set various POSIX capability sysctls to the version of the API that is

2010-11-19 17:56:16 +00:00

sched_4bsd.c

- Follow r216313, the sched_unlend_user_prio is no longer needed, always

2010-12-29 09:26:46 +00:00

sched_ule.c

- Follow r216313, the sched_unlend_user_prio is no longer needed, always

2010-12-29 09:26:46 +00:00

serdev_if.m

…

stack_protector.c

Random number generator initialization cleanup:

2009-10-20 16:36:51 +00:00

subr_acl_nfs4.c

Adapt filesystem-independent NFSv4 ACL code (used by UFS, but not by ZFS)

2010-12-13 18:56:04 +00:00

subr_acl_posix1e.c

execve(2) has a special check for file permissions: a file must have at

2010-08-30 16:30:18 +00:00

subr_autoconf.c

Allow interrupt driven config hooks to be registered from config hook callbacks.

2010-08-12 19:50:40 +00:00

subr_blist.c

…

subr_bufring.c

Switch to our preferred 2-clause BSD license.

2010-05-05 20:39:02 +00:00

subr_bus.c

removed tag is '-', not '+'.

2010-12-02 04:28:01 +00:00

subr_clock.c

Don't tie ct_debug to bootverbose. Provide a sysctl to turn it on or off.

2010-12-09 22:02:48 +00:00

subr_devstat.c

Use ISO C99 integer types in sys/kern where possible.

2010-06-21 09:55:56 +00:00

subr_disk.c

Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.

2010-09-02 19:40:28 +00:00

subr_eventhandler.c

Split eventhandler_register() into an internal part and a wrapper function

2010-03-19 19:51:03 +00:00

subr_fattime.c

Use ISO C99 integer types in sys/kern where possible.

2010-06-21 09:55:56 +00:00

subr_firmware.c

Bump up the firmware_table from 30 to 50. bwn needs more than 30, it

2010-03-07 22:37:35 +00:00

subr_hash.c

Decompose the most lousy named file in sys/kern; kern_subr.c.

2010-02-21 19:53:33 +00:00

subr_hints.c

…

subr_kdb.c

debug.kdb.stop_cpus sysctl: hint that this is also a tunable

2010-09-30 16:47:01 +00:00

subr_kobj.c

…

subr_lock.c

Fix typos.

2010-11-09 10:59:09 +00:00

subr_log.c

Make /dev/klog and kern.msgbuf* MPSAFE.

2009-11-03 21:06:19 +00:00

subr_mbpool.c

…

subr_mchain.c

Use ISO C99 integer types in sys/kern where possible.

2010-06-21 09:55:56 +00:00

subr_module.c

Use ISO C99 integer types in sys/kern where possible.

2010-06-21 09:55:56 +00:00

subr_msgbuf.c

…

subr_param.c

Add Xen to the list of virtual vendors. In the non PV (HVM) case this fixes

2010-08-06 15:04:40 +00:00

subr_pcpu.c

After some off-list discussion, revert a number of changes to the

2010-11-22 19:32:54 +00:00

subr_power.c

…

subr_prf.c

Use type-specific inline function imax() instead of deprecated macro MAX().

2010-07-12 15:32:45 +00:00

subr_prof.c

Revert r210225 - turns out I was wrong; the "/*-" is not license-only

2010-07-18 20:57:53 +00:00

subr_rman.c

…

subr_rtc.c

Add the half of time-of-day clock resolution when we adjust system time from

2010-08-12 17:17:05 +00:00

subr_sbuf.c

Re-add r212370 now that the LOR in powerpc64 has been resolved:

2010-09-16 16:13:12 +00:00

subr_scanf.c

…

subr_sglist.c

…

subr_sleepqueue.c

Re-add r212370 now that the LOR in powerpc64 has been resolved:

2010-09-16 16:13:12 +00:00

subr_smp.c

generic_stop_cpus: prevent parallel execution

2010-10-12 17:40:45 +00:00

subr_stack.c

kdb_backtrace: use stack_print_ddb instead of stack_print

2010-09-22 06:45:07 +00:00

subr_taskqueue.c

taskqueue: drop unused tq_name field

2010-11-23 14:30:22 +00:00

subr_trap.c

Remove extra braces for style(9) (found while cleaning up an old work tree).

2010-09-28 01:36:01 +00:00

subr_turnstile.c

Introduce the new kernel thread called "deadlock resolver".

2010-01-09 01:46:38 +00:00

subr_uio.c

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and

2010-05-08 20:34:01 +00:00

subr_unit.c

Remove redundant high >= 0.

2010-07-09 10:57:55 +00:00

subr_witness.c

Re-add r212370 now that the LOR in powerpc64 has been resolved:

2010-09-16 16:13:12 +00:00

sys_generic.c

For some file types, select code registers two selfd structures. E.g.,

2010-08-28 17:42:08 +00:00

sys_pipe.c

Introduce and use a new VM interface for temporarily pinning pages. This

2010-12-25 21:26:56 +00:00

sys_process.c

Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race

2010-12-20 22:49:31 +00:00

sys_socket.c

Send SIGPIPE to the thread that issued the offending system call

2010-06-29 20:44:19 +00:00

syscalls.c

Regen

2010-08-30 14:26:02 +00:00

syscalls.master

Make the syscalls reserved for AFS usable by OpenAFS port.

2010-08-30 14:24:44 +00:00

systrace_args.c

Regen

2010-08-30 14:26:02 +00:00

sysv_ipc.c

Move SysV IPC freebsd32 compat shims helpers from freebsd32_misc.c to

2010-03-19 11:01:51 +00:00

sysv_msg.c

Remove useless NULL checks for M_WAITOK mallocs.

2010-12-02 01:14:45 +00:00

sysv_sem.c

Add some descriptions to sys/kern sysctls.

2010-11-14 06:09:50 +00:00

sysv_shm.c

Remove useless NULL checks for M_WAITOK mallocs.

2010-12-02 01:14:45 +00:00

tty_compat.c

Make TIOCSTI work again.

2010-01-04 20:59:52 +00:00

tty_info.c

…

tty_inq.c

Remove statistics from the TTY queues.

2010-02-07 15:42:15 +00:00

tty_outq.c

Remove statistics from the TTY queues.

2010-02-07 15:42:15 +00:00

tty_pts.c

Do not leak master pty or ptmx vnode.

2010-04-08 08:58:18 +00:00

tty_tty.c

…

tty_ttydisc.c

Print backspaces after echoing an EOF.

2009-10-17 08:59:41 +00:00

tty.c

Just make callout devices and /dev/console force CLOCAL on open().

2010-09-19 16:35:42 +00:00

uipc_accf.c

(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.

2009-12-28 22:56:30 +00:00

uipc_cow.c

Correct the order of the arguments to vm_fault_quick_hold_pages().

2010-12-26 01:42:52 +00:00

uipc_debug.c

…

uipc_domain.c

…

uipc_mbuf2.c

Use ISO C99 integer types in sys/kern where possible.

2010-06-21 09:55:56 +00:00

uipc_mbuf.c

Revert r210225 - turns out I was wrong; the "/*-" is not license-only

2010-07-18 20:57:53 +00:00

uipc_mqueue.c

Create a global thread hash table to speed up thread lookup, use

2010-10-09 02:50:23 +00:00

uipc_sem.c

Set the POSIX semaphore capability when the semaphore module is enabled.

2010-11-19 17:57:50 +00:00

uipc_shm.c

Replace pointer to "struct uidinfo" with pointer to "struct ucred"

2010-12-02 17:37:16 +00:00

uipc_sockbuf.c

…

uipc_socket.c

This commit implements the SO_USER_COOKIE socket option, which lets

2010-11-12 13:02:26 +00:00

uipc_syscalls.c

Just pass M_ZERO to malloc(9) instead of clearing allocated memory separately.

2010-12-14 06:19:13 +00:00

uipc_usrreq.c

Trim whitespaces at the end of lines. Use the commit to record

2010-12-03 20:39:06 +00:00

vfs_acl.c

The 'acl_cnt' field is unsigned; no point in checking if it's >= 0.

2010-06-03 13:45:27 +00:00

vfs_aio.c

Create a global thread hash table to speed up thread lookup, use

2010-10-09 02:50:23 +00:00

vfs_bio.c

Introduce and use a new VM interface for temporarily pinning pages. This

2010-12-25 21:26:56 +00:00

vfs_cache.c

Fix some more style(9) issues.

2010-11-14 16:10:15 +00:00

vfs_cluster.c

Bumping the read-ahead count once more, to value equivalent to 512 KiB on

2010-08-09 22:56:10 +00:00

vfs_default.c

If we read zero bytes from the directory, early out with ENOENT

2010-08-25 18:09:51 +00:00

vfs_export.c

…

vfs_extattr.c

Revert r210225 - turns out I was wrong; the "/*-" is not license-only

2010-07-18 20:57:53 +00:00

vfs_hash.c

…

vfs_init.c

…

vfs_lookup.c

Add an extra comment to the SDT probes definition. This allows us to get

2010-08-22 11:18:57 +00:00

vfs_mount.c

Update MNT_ROOTFS comments after changes in the root mount logic.

2010-11-23 13:49:15 +00:00

vfs_mountroot.c

Add support for asterisk characters when filling in the GELI password

2010-11-14 14:12:43 +00:00

vfs_subr.c

Teach ddb "show mount" about MNTK_SUJ flag.

2010-12-27 12:06:38 +00:00

vfs_syscalls.c

Add an extra comment to the SDT probes definition. This allows us to get

2010-08-22 11:18:57 +00:00

vfs_vnops.c

Correct arguments order.

2010-06-26 21:44:45 +00:00

vnode_if.src

Add VOP_ADVLOCKPURGE so that the file system is called when purging

2010-05-12 21:24:46 +00:00