freebsd-skq/sys
kib 79752b63e1 Below is a slightly edited description of the LOR (lock order reversal) by Tor Egge:
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode (the parent
directory of the covered vnode) while holding an exclusive vnode lock on
the covering vnode.

A simplified scenario:

root fs					var fs
/    		A			/    (/var)	D
/var		B			/log (/var/log) E
vfs lock	C			vfs lock	F

Within each file system, the lock order is clear: C->A->B and F->D->E

When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:

      L1: C->A->B
                |
                +->F->D->E

      L2: F->D->E
             |
             +->C->A->B

The lookup() process for namei("/var") mixes those two lock orders:

    VOP_LOOKUP() obtains B while A is held
    vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
    violates L2)
    vput() releases lock on B
    VOP_UNLOCK() releases lock on A
    VFS_ROOT() obtains lock on D while shared lock on F is held
    vfs_unbusy() releases shared lock on F
    vn_lock() obtains lock on A while D is held (violates L1, follows L2)

dounmount() follows L1 (B is locked while F is drained).
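
The mixing of the two orders can be checked mechanically. The following is a minimal user-space sketch in C, not kernel code: it flattens L1 and L2 into linear rankings (an assumption made for illustration; the diagrams above are branching orders), replays the lookup() steps listed above starting with A held, and counts acquisitions that take a lock ranked earlier than one already held. The names lock_id, rank_l1, rank_l2, lookup_steps and count_violations are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Locks from the simplified scenario (A..F as in the table above). */
enum lock_id { A, B, C, D, E, F, NLOCKS };

/* Assumed linearisation of the two orders; a lower rank is taken first. */
const int rank_l1[NLOCKS] = { [C] = 0, [A] = 1, [B] = 2, [F] = 3, [D] = 4, [E] = 5 };
const int rank_l2[NLOCKS] = { [F] = 0, [D] = 1, [E] = 2, [C] = 3, [A] = 4, [B] = 5 };

struct step {
    bool acquire;          /* true = lock, false = unlock */
    enum lock_id lock;
};

/* The lookup() sequence for namei("/var"); A is already held on entry. */
const struct step lookup_steps[] = {
    { true,  B },  /* VOP_LOOKUP() */
    { true,  F },  /* vfs_busy() */
    { false, B },  /* vput() */
    { false, A },  /* VOP_UNLOCK() */
    { true,  D },  /* VFS_ROOT() */
    { false, F },  /* vfs_unbusy() */
    { true,  A },  /* vn_lock() */
};
const size_t lookup_nsteps = sizeof(lookup_steps) / sizeof(lookup_steps[0]);

/*
 * Count acquisitions made while a lock of higher rank is already held.
 * 'held' is a bitmask of the locks held before the first step.
 */
int
count_violations(const int rank[], const struct step *steps, size_t nsteps,
    unsigned held)
{
    int violations = 0;

    for (size_t i = 0; i < nsteps; i++) {
        enum lock_id l = steps[i].lock;

        assert(l < NLOCKS);
        if (!steps[i].acquire) {
            held &= ~(1u << l);
            continue;
        }
        for (int h = 0; h < NLOCKS; h++) {
            if ((held & (1u << h)) != 0 && rank[h] > rank[l]) {
                violations++;
                break;
            }
        }
        held |= 1u << l;
    }
    return violations;
}
```

Calling count_violations(rank_l1, lookup_steps, lookup_nsteps, 1u << A) flags the final vn_lock() step, and the same call with rank_l2 flags the vfs_busy() step, so the sequence is consistent with neither order.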

Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).

With unmount, you can get 4 processes in a deadlock:

     p1: holds D, wants A (in lookup())
     p2: holds shared lock on F, wants D (in VFS_ROOT())
     p3: holds B, wants drain lock on F (in dounmount())
     p4: holds A, wants B (in VOP_LOOKUP())

You can have more than one instance of p2.
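
The four-process deadlock is a cycle in the wait-for graph. The sketch below, again user-space C written only for illustration, records what each process holds and wants and follows the chain of waiters from p1. It treats the shared lock on F as having a single holder, which is a simplification (the drain in dounmount() waits for every shared holder). The names proc_state, deadlock_procs, holder_of and wait_cycle_length are hypothetical.

```c
#include <stddef.h>

/* One blocked process: the lock it holds and the lock it waits for. */
struct proc_state {
    const char *name;
    char holds;
    char wants;
};

/* p1..p4 exactly as listed above. */
const struct proc_state deadlock_procs[] = {
    { "p1", 'D', 'A' },  /* lookup() */
    { "p2", 'F', 'D' },  /* VFS_ROOT(), shared lock on F */
    { "p3", 'B', 'F' },  /* dounmount(), wants to drain F */
    { "p4", 'A', 'B' },  /* VOP_LOOKUP() */
};

/* Index of the process holding 'lock', or -1 if nobody holds it. */
int
holder_of(const struct proc_state *p, size_t n, char lock)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i].holds == lock)
            return (int)i;
    }
    return -1;
}

/*
 * Follow "wants" edges starting at process 'start'.  Return the number
 * of processes in the cycle that leads back to 'start', or 0 if the
 * chain ends at a free lock or never returns to 'start'.
 */
int
wait_cycle_length(const struct proc_state *p, size_t n, size_t start)
{
    size_t cur = start;

    for (size_t steps = 1; steps <= n; steps++) {
        int next = holder_of(p, n, p[cur].wants);

        if (next < 0)
            return 0;
        if ((size_t)next == start)
            return (int)steps;
        cur = (size_t)next;
    }
    return 0;
}
```

With all four entries, wait_cycle_length(deadlock_procs, 4, 0) walks p1 -> p4 -> p3 -> p2 -> p1 and returns 4. Dropping any one process from the table breaks the cycle and the function returns 0.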

The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.

- Tor Egge

To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, a placeholder deadfs
vnode, vp_crossmp, is introduced and stored in ni_dvp in that case.

Idea by:	ups
Reviewed by:	tegge, ups, jeff, rwatson (mac interaction)
Tested by:	Peter Holm
MFC after:	2 weeks
2007-01-22 11:25:22 +00:00
..
amd64 MFp4 (113077, 113083, 113103, 113124, 113097): 2007-01-20 14:58:59 +00:00
arm - Add a uart_rxready() and corresponding device-specific implementations 2007-01-18 22:01:19 +00:00
boot o Wrap long lines. 2007-01-14 13:55:43 +00:00
bsm Merge OpenBSM 1.0 alpha 12 import changes into src/sys/bsm. New events 2006-09-25 12:22:07 +00:00
cam Add quirk for EasyMP3 EM732X usb 2.0 flash mp3 player. 2007-01-22 04:34:03 +00:00
coda change vop_lock handling to allowing tracking of callers' file and line for 2006-11-13 05:51:22 +00:00
compat Use a printf-modifier which doesn't need a cast. 2007-01-21 13:18:52 +00:00
conf Add front-ends for the 'lebuffer' variants found on some SBus cards. 2007-01-20 12:53:30 +00:00
contrib Clean up pfr_kentry_pl2 as well. This fixes a kernel panic in the vm.zone 2007-01-01 16:51:11 +00:00
crypto Initialize T1 to silent gcc warning. 2006-10-22 02:19:33 +00:00
ddb Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 15:05:52 +00:00
dev Change the remainder of the drivers for DMA'ing devices enabled in the 2007-01-21 19:32:51 +00:00
fs Below is slightly edited description of the LOR by Tor Egge: 2007-01-22 11:25:22 +00:00
gdb
geom Softc may be NULL in g_journal_orphan(), so don't be surprised. 2006-12-02 09:10:29 +00:00
gnu Previously, the mount_ext2fs binary listed the acceptable mount 2006-11-18 18:22:11 +00:00
i4b Fix fat-fingering in previous commit. 2006-12-29 16:38:22 +00:00
i386 MFp4 (113077, 113083, 113103, 113124, 113097): 2007-01-20 14:58:59 +00:00
ia64 Remove 3rd clause, renumber, ok per email 2007-01-12 07:26:21 +00:00
isa Be consistent with the spelling of "dependent" in user-visible places. 2006-12-30 11:55:47 +00:00
isofs/cd9660 The ISO9660 spec does allow files up to 4G. Change the i_size 2006-12-08 07:43:53 +00:00
kern Below is slightly edited description of the LOR by Tor Egge: 2007-01-22 11:25:22 +00:00
libkern
modules Add front-ends for the 'lebuffer' variants found on some SBus cards. 2007-01-20 12:53:30 +00:00
net Set topology change propagation on all ports _except_ the caller. 2007-01-18 07:13:01 +00:00
net80211 Add initial support for 900MHz cards like the Ubiquiti SR9: 2007-01-15 01:12:28 +00:00
netatalk Clean up DDP layer netatalk code: 2007-01-12 15:07:51 +00:00
netatm Sweep kernel replacing suser(9) calls with priv(9) calls, assigning 2006-11-06 13:42:10 +00:00
netgraph A less draconian fix to the build. 2007-01-18 19:41:39 +00:00
netinet - most all includes (#include <>) migrate to the sctp_os_bsd.h file 2007-01-18 09:58:43 +00:00
netinet6 - most all includes (#include <>) migrate to the sctp_os_bsd.h file 2007-01-18 09:58:43 +00:00
netipsec s,#if INET6,#ifdef INET6, 2006-12-14 17:33:46 +00:00
netipx Factor out UCB and my copyrights from copyrights of Mike Mitchell; 2007-01-08 22:14:00 +00:00
netkey
netnatm Factor out my copyrights + licenses from Charles D. Cranor and 2007-01-08 22:30:39 +00:00
netncp Sweep kernel replacing suser(9) calls with priv(9) calls, assigning 2006-11-06 13:42:10 +00:00
netsmb Sweep kernel replacing suser(9) calls with priv(9) calls, assigning 2006-11-06 13:42:10 +00:00
nfs NFSv4 client: 2006-11-28 19:33:28 +00:00
nfs4client NFSv4 client: 2006-11-28 19:33:28 +00:00
nfsclient NetApp filers return corrupt post op attrs in the wcc on NFS error responses. 2006-12-11 19:54:25 +00:00
nfsserver The nfsm_srvpathsiz() macro in nfsrv_symlink() in nfs_serv.c should 2007-01-02 20:42:08 +00:00
opencrypto
pc98 MFi386: revision 1.646. 2007-01-07 12:13:10 +00:00
pccard
pci Change the remainder of the drivers for DMA'ing devices enabled in the 2007-01-21 19:32:51 +00:00
powerpc Propagate the CPU model to the hw.model sysctl. 2007-01-14 21:45:05 +00:00
rpc
security When returning early from audit_arg_file() due to so->so_pcb being NULL 2007-01-06 22:28:28 +00:00
sparc64 Quiet GCC4 warnings regarding the width of printf()-arguments not 2007-01-20 17:14:12 +00:00
sun4v Convert the remainder of the low hanging fruits regarding including 2007-01-19 11:15:34 +00:00
sys Reviewed by: rwatson 2007-01-15 15:06:28 +00:00
tools
ufs Fix build. chkdquot() should not return anything. 2007-01-20 13:54:28 +00:00
vm Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form. 2007-01-17 15:05:52 +00:00
Makefile o Add cam to a list of cscope dirs. 2006-11-26 18:27:16 +00:00