freebsd-dev/sys
Kirk McKusick 23d6e518da A file cannot be deallocated until its last name has been removed
and it is no longer referenced by a user process. The inode for a
file whose name has been removed but which is still referenced at
the time of a crash will still be allocated in the filesystem, but
will have no references (i.e., no name in any directory will refer
to it).

With traditional soft updates these unreferenced inodes will be
found and reclaimed when the background fsck is run. When using
journaled soft updates, the kernel must keep track of these inodes
so that it can find and reclaim them during the cleanup process.
Their existence cannot be stored in the journal as the journal only
handles short-term events, and they may persist for days. So, they
are tracked by keeping them in a linked list whose head pointer is
stored in the superblock. The journal tracks them only until their
linked list pointers have been committed to disk. Part of the cleanup
process involves traversing the list of unreferenced inodes and
reclaiming them.
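
The list handling described above can be sketched in a few lines of
user-space C. This is only an illustrative model, not the kernel code:
the names toy_fs, toy_inode, unref_head, next_unref, toy_unref_add()
and toy_unref_reclaim_all() are all hypothetical, as is the choice of
inode number 0 as the end-of-list marker. It shows a head pointer kept
in a superblock-like structure, a guard against adding the same inode
twice, and the cleanup traversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NINODES 16
#define TOY_NIL     0   /* assumption: inode number 0 terminates the list */

/* Hypothetical in-memory stand-in for an on-disk inode. */
struct toy_inode {
	int      allocated;   /* nonzero while the inode is in use */
	int      on_list;     /* nonzero while on the unreferenced list */
	uint32_t next_unref;  /* next inode number on the list */
};

/* Hypothetical stand-in for the superblock plus the inode table. */
struct toy_fs {
	uint32_t         unref_head;  /* head of the unreferenced-inode list */
	struct toy_inode inodes[TOY_NINODES];
};

/*
 * Push an inode onto the head of the unreferenced-inode list.
 * Returns 0 on success, or -1 if the inode number is invalid or the
 * inode is already on the list (a second add would corrupt the links).
 */
int
toy_unref_add(struct toy_fs *fs, uint32_t ino)
{
	struct toy_inode *ip;

	if (ino == TOY_NIL || ino >= TOY_NINODES)
		return (-1);
	ip = &fs->inodes[ino];
	if (ip->on_list)
		return (-1);
	ip->next_unref = fs->unref_head;
	fs->unref_head = ino;
	ip->on_list = 1;
	return (0);
}

/*
 * Cleanup pass: walk the list from the head, reclaim every inode on
 * it, and leave the list empty. Returns the number reclaimed.
 */
int
toy_unref_reclaim_all(struct toy_fs *fs)
{
	uint32_t ino, next;
	int count = 0;

	for (ino = fs->unref_head; ino != TOY_NIL; ino = next) {
		struct toy_inode *ip = &fs->inodes[ino];

		next = ip->next_unref;   /* save before clearing the link */
		ip->allocated = 0;
		ip->on_list = 0;
		ip->next_unref = TOY_NIL;
		count++;
	}
	fs->unref_head = TOY_NIL;
	return (count);
}
```

In this model, adding inodes 3 and 5 to a zero-initialized toy_fs
leaves 5 at the head, a second add of 3 is rejected, and the reclaim
pass visits both inodes and empties the list.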

This bug was triggered when confusion arose in the commit steps
of keeping the unreferenced-inode linked list coherent on disk.
Specifically, there was a race between the link() system call adding
a link count to a file and the unlink() system call removing a link
count from the file. If the unlink() ran after link() had looked up
the file but before link() had incremented the link count of the
file, the file's link count would drop to zero before the link()
incremented it back up to one. If the file was referenced by a
user process, the first transition through zero made it appear
that it should be added to the unreferenced-inode list when in
fact it should not have been added. If the new name created by
link() was deleted within a few seconds (with the file still
referenced by a user process) it would legitimately be a candidate
for addition to the unreferenced-inode list. The result was two
attempts to add the same inode to the unreferenced-inode list,
which scrambled the list's pointers and led to a panic. The fix is
to detect and avoid the false attempt at adding the inode to the
unreferenced-inode list by having the link() system call check
whether the link count is zero before it increments it. If it is,
the link() fails with ENOENT (showing that it has lost the
link()/unlink() race).
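
The check can be illustrated with a small single-threaded C model.
This is a sketch under stated assumptions, not the actual UFS change:
toy_file, toy_link(), toy_unlink() and the unref_list_adds counter are
hypothetical names, and the interleaving of the race is replayed by
calling the functions in the losing order rather than by using real
threads or locks.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an in-core inode. */
struct toy_file {
	int nlink;            /* number of directory names for the file */
	int open_refs;        /* references held by user processes */
	int unref_list_adds;  /* attempts to add to the unreferenced list */
};

/*
 * Remove one name. If the count reaches zero while a process still
 * holds the file open, the file becomes a candidate for the
 * unreferenced-inode list.
 */
int
toy_unlink(struct toy_file *fp)
{
	if (fp->nlink <= 0)
		return (ENOENT);
	fp->nlink--;
	if (fp->nlink == 0 && fp->open_refs > 0)
		fp->unref_list_adds++;
	return (0);
}

/*
 * Add one name. A link count of zero means an unlink() got in after
 * the lookup, so fail with ENOENT instead of raising the count back
 * to one, which would later allow a second add to the list.
 */
int
toy_link(struct toy_file *fp)
{
	if (fp->nlink == 0)
		return (ENOENT);
	fp->nlink++;
	return (0);
}
```

Replaying the losing order on a file with one name and one open
reference (toy_unlink() first, then toy_link()) leaves the link count
at zero, makes toy_link() return ENOENT, and records exactly one
attempt to add the file to the list.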

While tracking down this bug, we have added additional assertions
to detect the problem sooner and also simplified some of the code.

Reported by:      Kirk Russell
Fix submitted by: Jeff Roberson
Tested by:        Peter Holm
PR:               kern/159971
MFC (to 9 only):  2 weeks
2012-04-02 21:58:37 +00:00
amd64 Make machine check exception logging more readable. On newer Intel systems, 2012-04-02 15:07:22 +00:00
arm Add software PMC support. 2012-03-28 20:58:30 +00:00
boot Fix build after changes to trap headers. 2012-03-29 16:04:42 +00:00
bsm
cam Be more conservative in using READ CAPACITY(16) command. Previous code 2012-03-31 11:23:09 +00:00
cddl Instead of only iterating over the set of known SDT probes when sdt.ko is 2012-03-27 15:07:43 +00:00
compat Remove some unnecessary includes. 2012-03-18 19:15:11 +00:00
conf MFhead_mfi r227068 2012-03-30 23:05:48 +00:00
contrib MFV: r233615 2012-03-28 17:21:59 +00:00
crypto Add support for the extended FPU states on amd64, both for native 2012-01-21 17:45:27 +00:00
ddb Use strchr() and strrchr(). 2012-01-02 12:12:10 +00:00
dev Move struct megasas_sge from mfi_ioctl.h to mfivar.h so we can 2012-04-02 19:13:02 +00:00
fs Add sysctl vfs.nfs.nfs_keep_dirty_on_error to switch the nfs client 2012-03-17 23:03:20 +00:00
gdb kern cons: introduce infrastructure for console grabbing by kernel 2011-12-17 15:08:43 +00:00
geom VMDB offset should be greater than logical volume size only for MBR. 2012-03-29 07:29:27 +00:00
gnu/fs Make ReiserFS MPSAFE 2012-03-27 20:36:03 +00:00
i386 Make machine check exception logging more readable. On newer Intel systems, 2012-04-02 15:07:22 +00:00
ia64 Remove pty(4) from our kernel configurations. 2012-03-21 08:38:42 +00:00
isa - There's no need to overwrite the default device method with the default 2011-11-22 21:28:20 +00:00
kern When process exists, not only the children shall be reparented to 2012-04-02 19:35:36 +00:00
kgssapi
libkern Remove second consts in r233288 in order to appease C++ compilers. 2012-03-26 18:22:04 +00:00
mips Reinstate the XTLB handler for CPU_NLM and CPU_RMI 2012-04-02 11:41:33 +00:00
modules MFhead_mfi r227068 2012-03-30 23:05:48 +00:00
net Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD. 2012-03-19 21:09:12 +00:00
net80211 Correct the ordering of tid/crypto ic_name. 2012-03-27 04:15:38 +00:00
netatalk Fix typos 2012-02-28 15:07:05 +00:00
netgraph Fix compiler warnings, mostly signed issues, 2012-04-02 10:50:42 +00:00
netinet Don't check malloc(M_WAITOK) results. 2012-03-31 11:20:48 +00:00
netinet6 in6_pcblookup_local() still can return a pcb with NULL 2012-03-21 08:43:38 +00:00
netipsec Add multi-FIB IPv6 support to the core network stack supplementing 2012-02-03 13:08:44 +00:00
netipx Convert all users of IF_ADDR_LOCK to use new locking macros that specify 2012-01-05 19:00:36 +00:00
netnatm
netncp
netsmb Add unicode support to msdosfs and smbfs; original pathes from imura, 2011-11-18 03:05:20 +00:00
nfs Add multi-FIB IPv6 support to the core network stack supplementing 2012-02-03 13:08:44 +00:00
nfsclient Remove fifo.h. The only used function declaration from the header is 2012-03-11 12:19:58 +00:00
nfsserver Honor NFSv3 commit call (RFC 1813, Section 3.3.21) where when count is 0, 2011-12-15 02:26:53 +00:00
nlm jwd@ reported a problem via email to freebsd-fs@ on Aug 25, 2011 2012-01-31 02:11:05 +00:00
ofed Use VM_MEMATTR_UNCACHEABLE instead of VM_MEMATTR_UNCACHED for UC mappings. 2012-03-27 14:24:29 +00:00
opencrypto
pc98 Move the legacy(4) driver to x86. 2012-03-30 19:10:14 +00:00
pci Use correct Config registers for RTL8139 family. Unlike RTL8168 and 2012-02-25 04:54:51 +00:00
powerpc - Rename VM_MEMATTR_UNCACHED to VM_MEMATTR_WEAK_UNCACHEABLE on x86 to 2012-03-29 16:51:22 +00:00
rpc Both a crash reported on freebsd-current on Oct. 18 under the 2011-11-03 14:38:03 +00:00
security Remove direct access to si_name. 2012-02-10 12:35:57 +00:00
sparc64 Remove checks that are redundant due to tf_type being unsigned. 2012-03-31 14:03:16 +00:00
sys Export some more useful info about shared memory objects to userland 2012-04-01 18:22:48 +00:00
teken
tools Make vnode_if.awk parse vnode operations with underscores, like VOP_FOO_BAR. 2012-02-21 19:35:59 +00:00
ufs A file cannot be deallocated until its last name has been removed 2012-04-02 21:58:37 +00:00
vm Keep track of the mount point associated with a special device 2012-03-28 20:49:11 +00:00
x86 Further tweak the changes made in r233709. The kernel doesn't permit 2012-04-02 17:26:21 +00:00
xdr
xen blkif interface comment cleanups. No functional changes 2012-02-29 17:47:01 +00:00
Makefile Add sys/ofed to the 'make cscope' target. 2012-03-20 18:05:15 +00:00