freebsd-dev/sys/fs
Konstantin Belousov 7f92c4ee02 Below is slightly edited description of the LOR by Tor Egge:
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode, that is the parent
directory of covered vnode) while holding an exclusive vnode lock on
covering vnode.

A simplified scenario:

root fs					var fs
/    		A			/    (/var)	D
/var		B			/log (/var/log) E
vfs lock	C			vfs lock	F

Within each file system, the lock order is clear: C->A->B and F->D->E

When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:

      L1: C->A->B
		|
	        +->F->D->E

      L2: F->D->E
	     |
             +->C->A->B

The lookup() process for namei("/var") mixes those two lock orders:

    VOP_LOOKUP() obtains B while A is held
    vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
    violates L2)
    vput() releases lock on B
    VOP_UNLOCK() releases lock on A
    VFS_ROOT() obtains lock on D while shared lock on F is held
    vfs_unbusy() releases shared lock on F
    vn_lock() obtains lock on A while D is held (violates L1, follows L2)

dounmount() follows L1 (B is locked while F is drained).

Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).

With unmount, you can get 4 processes in a deadlock:

     p1: holds D, want A (in lookup())
     p2: holds shared lock on F, want D (in VFS_ROOT())
     p3: holds B, want drain lock on F (in dounmount())
     p4: holds A, want B (in VOP_LOOKUP())

You can have more than one instance of p2.

The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.

- Tor Egge

To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, placeholder deadfs
vnode vp_crossmp is introduced that is filled into ni_dvp.

Idea by:	ups
Reviewed by:	tegge, ups, jeff, rwatson (mac interaction)
Tested by:	Peter Holm
MFC after:	2 weeks
2007-01-22 11:25:22 +00:00
..
cd9660 The ISO9660 spec does allow files up to 4G. Change the i_size 2006-12-08 07:43:53 +00:00
coda change vop_lock handling to allowing tracking of callers' file and line for 2006-11-13 05:51:22 +00:00
deadfs Below is slightly edited description of the LOR by Tor Egge: 2007-01-22 11:25:22 +00:00
devfs Sweep kernel replacing suser(9) calls with priv(9) calls, assigning 2006-11-06 13:42:10 +00:00
fdescfs Restore the ability to mount procfs and fdescfs filesystems via the 2006-05-15 19:42:10 +00:00
fifofs Add a_fdidx to comment prototype for fifo_open(). 2006-03-15 10:15:35 +00:00
hpfs Sweep kernel replacing suser(9) calls with priv(9) calls, assigning 2006-11-06 13:42:10 +00:00
msdosfs Add a 3rd entry in the cache, which keeps the end position 2007-01-16 23:43:14 +00:00
ntfs Fix an integer overflow and allow access to files larger than 4GB on 2006-11-20 19:28:36 +00:00
nullfs change vop_lock handling to allowing tracking of callers' file and line for 2006-11-13 05:51:22 +00:00
nwfs Drop crummy fattime to timespec conversion routines. 2006-10-24 11:43:41 +00:00
portalfs Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. 2006-09-26 04:12:49 +00:00
procfs Threading cleanup.. part 2 of several. 2006-12-06 06:34:57 +00:00
pseudofs Use the vnode interlock to close a race where pfs_vncache_alloc() could 2007-01-02 17:27:52 +00:00
smbfs Sweep kernel replacing suser(9) calls with priv(9) calls, assigning 2006-11-06 13:42:10 +00:00
udf Rewrite the udf_read() routine to use a file vnode instead of the devvp vnode. 2007-01-15 18:45:36 +00:00
umapfs Sweep kernel replacing suser(9) calls with priv(9) calls, assigning 2006-11-06 13:42:10 +00:00
unionfs Simplify code in union_hashins() and union_hashget() functions. These 2007-01-05 14:06:42 +00:00