freebsd-skq/sys/ufs/ffs
Kirk McKusick 23d6e518da
A file cannot be deallocated until its last name has been removed
and it is no longer referenced by a user process. The inode for a
file whose name has been removed, but is still referenced at the
time of a crash will still be allocated in the filesystem, but will
have no references (i.e., it will have no names referencing it
from any directory).
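As a minimal sketch of the deallocation rule above, the C fragment below models an inode with a name count and a user-reference count. The struct and both functions (toy_inode, can_deallocate, is_unreferenced) are hypothetical illustrations. They are not FreeBSD's struct inode or its soft-updates code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified inode state for illustration only; these are
 * not the fields of FreeBSD's struct inode or struct vnode.
 */
struct toy_inode {
    bool allocated; /* inode is in use on disk */
    int  nlink;     /* number of directory entries (names) for the file */
    int  opencount; /* number of user-process references to the file */
};

/* An inode may be freed only when it has no names and no users. */
bool can_deallocate(const struct toy_inode *ip)
{
    assert(ip->nlink >= 0 && ip->opencount >= 0);
    return ip->nlink == 0 && ip->opencount == 0;
}

/*
 * An allocated inode with no names is "unreferenced": after a crash the
 * in-memory user references are gone, but such an inode is still
 * allocated on disk and must be found and reclaimed.
 */
bool is_unreferenced(const struct toy_inode *ip)
{
    return ip->allocated && ip->nlink == 0;
}
```

For example, an allocated inode with nlink == 0 and opencount == 1 cannot be freed yet. If the system crashes in that state, the process reference disappears but the inode stays allocated on disk with no names. That is the case the cleanup process has to handle.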

With traditional soft updates these unreferenced inodes will be
found and reclaimed when the background fsck is run. When using
journaled soft updates, the kernel must keep track of these inodes
so that it can find and reclaim them during the cleanup process.
Their existence cannot be stored in the journal as the journal only
handles short-term events, and they may persist for days. So, they
are tracked by keeping them in a linked list whose head pointer is
stored in the superblock. The journal tracks them only until their
linked list pointers have been committed to disk. Part of the cleanup
process involves traversing the list of unreferenced inodes and
reclaiming them.
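The list described above can be sketched as follows. For illustration only, the sketch assumes that each link is stored as an inode number inside the on-disk inode, because on-disk structures cannot hold memory pointers. The names toy_fs, unref_head, next_unref, unref_list_add and unref_list_reclaim are hypothetical. Freeing of data blocks and journaling of the pointer updates are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define NINODES  16
#define NO_INODE 0 /* hypothetical sentinel: inode number 0 ends the list */

/* Hypothetical on-disk images; field names are illustrative only. */
struct toy_dinode {
    int      allocated;
    uint32_t next_unref; /* inode number of the next unreferenced inode */
};

struct toy_superblock {
    uint32_t unref_head; /* head of the unreferenced-inode list */
};

struct toy_fs {
    struct toy_superblock sb;
    struct toy_dinode     inodes[NINODES]; /* index == inode number */
};

/* Push an inode whose last name was removed onto the front of the list. */
void unref_list_add(struct toy_fs *fs, uint32_t ino)
{
    assert(ino != NO_INODE && ino < NINODES);
    fs->inodes[ino].next_unref = fs->sb.unref_head;
    fs->sb.unref_head = ino;
}

/*
 * Cleanup: walk the list starting at the superblock head and reclaim
 * every inode on it. Returns the number of inodes reclaimed.
 */
int unref_list_reclaim(struct toy_fs *fs)
{
    int reclaimed = 0;
    uint32_t ino = fs->sb.unref_head;

    while (ino != NO_INODE) {
        assert(ino < NINODES); /* a corrupt link would index out of range */
        struct toy_dinode *dp = &fs->inodes[ino];
        uint32_t next = dp->next_unref;

        dp->allocated = 0;         /* free the inode (blocks omitted) */
        dp->next_unref = NO_INODE;
        reclaimed++;
        ino = next;
    }
    fs->sb.unref_head = NO_INODE;
    return reclaimed;
}
```

Pushing at the head makes insertion constant-time and touches only the superblock head and the new inode's link. This keeps the number of on-disk updates per insertion small in this sketch.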

This bug was triggered when confusion arose in the commit steps
of keeping the unreferenced-inode linked list coherent on disk.
Notably, there was a race between the link() system call adding a
link count to a file and the unlink() system call removing a link
count from the file. Here, if unlink() ran after link() had looked
up the file but before link() had incremented the link count of the
file, the file's link count would drop to zero before link()
incremented it back up to one. If the file was referenced by a
user process, the first transition through zero made it appear
that it should be added to the unreferenced-inode list when in
fact it should not have been added. If the new name created by
link() was deleted within a few seconds (with the file still
referenced by a user process), it would legitimately be a candidate
for addition to the unreferenced-inode list. The result was two
attempts to add the same inode to the unreferenced-inode list,
which scrambled the list's pointers and led to a panic. The fix is
to detect and avoid the false attempt at adding the inode to the
unreferenced-inode list by having the link() system call check
whether the link count is zero before it increments it. If it is,
link() fails with ENOENT (showing that it lost the link()/unlink()
race).
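The race and the fix can be replayed deterministically in a small single-threaded C model. This is a hypothetical sketch, not the FreeBSD code. toy_unlink and toy_link_finish are made-up names, and times_listed stands in for insertions onto the unreferenced-inode list. The model also assumes that the zero check and the increment in toy_link_finish are serialized with unlink(), as a lock would ensure in a real kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model; not the FreeBSD VFS or soft-updates code. */
struct toy_inode {
    int nlink;        /* link count */
    int opencount;    /* user-process references */
    int times_listed; /* insertions onto the unreferenced-inode list */
};

/* unlink(): drop a name; an open file left with no names is listed. */
void toy_unlink(struct toy_inode *ip)
{
    assert(ip->nlink > 0);
    if (--ip->nlink == 0 && ip->opencount > 0)
        ip->times_listed++;
}

/*
 * Second half of link(), run after the lookup has already returned ip.
 * With check_zero set, refuse to resurrect an inode whose link count
 * dropped to zero after the lookup, instead of incrementing it.
 */
int toy_link_finish(struct toy_inode *ip, bool check_zero)
{
    if (check_zero && ip->nlink == 0)
        return ENOENT; /* lost the link()/unlink() race */
    ip->nlink++;
    return 0;
}
```

The bad interleaving starts from an open file with nlink == 1. It runs toy_unlink, then toy_link_finish(ip, false), then toy_unlink again when the new name is removed, and ends with times_listed at 2: the double insertion described above. With check_zero set, toy_link_finish returns ENOENT after the first toy_unlink, and the inode is listed only once.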

While tracking down this bug, we added assertions to detect the
problem sooner and simplified some of the code.
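The commit message does not say which assertions were added. As a hypothetical illustration of the idea, an on-list flag lets an insert routine fail immediately on a second insertion. Without such a check, re-inserting an inode that is already in the middle of a singly linked list points its next field back at the current head, which creates a cycle. That is one way the list's pointers can become scrambled. All names below are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory node; not the kernel's actual structure. */
struct toy_inode {
    bool              on_unref_list; /* set while the inode is on the list */
    struct toy_inode *unref_next;
};

struct toy_unref_list {
    struct toy_inode *head;
};

/* Insert at the head, asserting the inode is not already on the list. */
void unref_insert(struct toy_unref_list *list, struct toy_inode *ip)
{
    assert(!ip->on_unref_list && "inode is already on the unreferenced list");
    ip->on_unref_list = true;
    ip->unref_next = list->head;
    list->head = ip;
}

/* Remove and return the head, clearing its flag (NULL if empty). */
struct toy_inode *unref_pop(struct toy_unref_list *list)
{
    struct toy_inode *ip = list->head;

    if (ip != NULL) {
        list->head = ip->unref_next;
        ip->unref_next = NULL;
        ip->on_unref_list = false;
    }
    return ip;
}
```

The assertion fires at the faulty insertion itself, rather than later when a traversal loops or dereferences a bad pointer.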

Reported by:      Kirk Russell
Fix submitted by: Jeff Roberson
Tested by:        Peter Holm
PR:               kern/159971
MFC (to 9 only):  2 weeks
2012-04-02 21:58:37 +00:00
ffs_alloc.c
ffs_balloc.c Add a third flags argument to ffs_syncvnode to avoid a possible conflict 2012-03-25 00:02:37 +00:00
ffs_extern.h Add a third flags argument to ffs_syncvnode to avoid a possible conflict 2012-03-25 00:02:37 +00:00
ffs_inode.c A refinement of change 232351 to avoid a race with a forcible unmount. 2012-03-28 21:21:19 +00:00
ffs_rawread.c Add a third flags argument to ffs_syncvnode to avoid a possible conflict 2012-03-25 00:02:37 +00:00
ffs_snapshot.c Add a third flags argument to ffs_syncvnode to avoid a possible conflict 2012-03-25 00:02:37 +00:00
ffs_softdep.c A file cannot be deallocated until its last name has been removed 2012-04-02 21:58:37 +00:00
ffs_subr.c
ffs_tables.c
ffs_vfsops.c Keep track of the mount point associated with a special device 2012-03-28 20:49:11 +00:00
ffs_vnops.c Add a third flags argument to ffs_syncvnode to avoid a possible conflict 2012-03-25 00:02:37 +00:00
fs.h
softdep.h