Convert the primary ACL allocator from malloc(9) to a UMA zone.
Also introduce an aclinit function, which is used to create the UMA zone
for use by file systems at system startup.
Original commit messages:
Modified files:
sys/ufs/ufs ufs_lookup.c
Log:
Close a race in the ufs_lookup() code that handles the ISDOTDOT
case by saving the value of dp->i_ino before unlocking the vnode
for the current directory and passing the saved value to VFS_VGET().
Without this change, another thread can overwrite dp->i_ino after
the current directory is unlocked, causing ufs_lookup() to lock
and return the wrong vnode in place of the vnode for its parent
directory. A deadlock can occur if dp->i_ino was changed to a
subdirectory of the current directory because the root to leaf vnode
lock ordering will be violated. A vnode lock can be leaked if
dp->i_ino was changed to point to the current directory, which
causes the current vnode lock for the current directory to be
recursed, which confuses lookup() into calling vrele() when it
should be calling vput().
The probability of this bug being triggered seems to be quite low
unless the sysctl variable debug.vfscache is set to 0.
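
The save-before-unlock pattern described above can be sketched in userland C. This is only an illustration, not the ufs_lookup() code: struct toy_dir, toy_vget() and toy_other_thread() are hypothetical stand-ins, and the "other thread" is simulated by a direct call made while the toy lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a locked directory inode. */
struct toy_dir {
	bool locked;		/* models the vnode lock */
	uint32_t ino;		/* plays the role of dp->i_ino */
};

/* Simulates another thread overwriting the field once the lock is gone. */
static void
toy_other_thread(struct toy_dir *dp)
{
	dp->ino = 99;
}

/* Hypothetical stand-in for VFS_VGET(): reports which number it was asked for. */
static uint32_t
toy_vget(uint32_t ino)
{
	return (ino);
}

/* Copy the inode number while locked and use only the copy afterwards. */
static uint32_t
toy_lookup_dotdot(struct toy_dir *dp)
{
	uint32_t saved_ino;
	uint32_t found;

	assert(dp->locked);
	saved_ino = dp->ino;		/* snapshot taken under the lock */
	dp->locked = false;		/* unlock the current directory */
	toy_other_thread(dp);		/* dp->ino changes underneath us */
	found = toy_vget(saved_ino);	/* the saved value is unaffected */
	dp->locked = true;		/* relock before returning */
	return (found);
}

/* Runs the scenario with a parent inode number of 2. */
static uint32_t
toy_demo(void)
{
	struct toy_dir dir = { true, 2 };

	return (toy_lookup_dotdot(&dir));
}
```

In this sketch toy_demo() still fetches inode 2 even though the field was overwritten while the lock was dropped. Passing dp->ino directly to toy_vget() would fetch 99 instead.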
Reviewed by: jhb
MFC after: 2 weeks
Revision Changes Path
1.78 +3 -1 src/sys/ufs/ufs/ufs_lookup.c
Modified files:
sys/ufs/ufs ufs_lookup.c
Log:
Correct the type of the temporary variable used by ufs_lookup.c:1.78
to fix the race condition in the ufs_lookup() ISDOTDOT code.
Noticed by: bde
MFC after: 12 days
Revision Changes Path
1.79 +1 -1 src/sys/ufs/ufs/ufs_lookup.c
Approved by: re (scottl)
When performing a VOP_LOOKUP() as part of UFS1 extended attribute
auto-start, set cnp.cn_lkflags to LK_EXCLUSIVE. This flag must now
be set so that lockmgr knows what kind of lock to acquire, and it
will panic if not specified. This resulted in a panic when using
extended attributes on UFS1 as of locking work present in the 6.x
branch.
This is a RELENG_6_0 merge candidate.
Reported by: lofi
Approved by: re (kensmith)
MFC after: 1 day
Original commit message:
FreeBSD src repository
Modified files:
sys/ufs/ffs ffs_alloc.c
Log:
Initialize the inode i_flag field in ffs_valloc() to clean up any
stale flag bits left over from before the inode was recycled.
Without this change, a leftover IN_SPACECOUNTED flag could prevent
softdep_freefile() and softdep_releasefile() from incrementing
fs_pendinginodes. Because handle_workitem_freefile() unconditionally
decrements fs_pendinginodes, a negative value could be reported at
file system unmount time with a message like:
unmount pending error: blocks 0 files -3
The pending block count in fs_pendingblocks could also be negative
for similar reasons. These errors can cause the data returned by
statfs() to be slightly incorrect. Some other cleanup code in
softdep_releasefile() could also be incorrectly bypassed.
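
The accounting problem can be modelled with a small userland sketch. It is a simplification under stated assumptions, not the soft updates code: the toy_ types and functions are hypothetical, TOY_SPACECOUNTED stands in for IN_SPACECOUNTED, and a single release step stands in for softdep_freefile() and softdep_releasefile().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SPACECOUNTED 0x1	/* hypothetical stand-in for IN_SPACECOUNTED */

struct toy_fs {
	long pending_inodes;	/* plays the role of fs_pendinginodes */
};

struct toy_inode {
	unsigned flags;		/* plays the role of i_flag */
};

/* Allocation: with the fix, stale flag bits are cleared on reuse. */
static void
toy_valloc(struct toy_inode *ip, bool clear_flags)
{
	if (clear_flags)
		ip->flags = 0;
}

/* Release: counts the inode as pending only if not already counted. */
static void
toy_releasefile(struct toy_fs *fs, struct toy_inode *ip)
{
	if ((ip->flags & TOY_SPACECOUNTED) == 0) {
		ip->flags |= TOY_SPACECOUNTED;
		fs->pending_inodes++;
	}
}

/* Work item: decrements unconditionally, as the log message describes. */
static void
toy_workitem_freefile(struct toy_fs *fs)
{
	fs->pending_inodes--;
}

/* Recycle the same in-memory inode twice and report the final count. */
static long
toy_recycle_twice(bool clear_flags)
{
	struct toy_fs fs = { 0 };
	struct toy_inode ip = { 0 };
	int i;

	for (i = 0; i < 2; i++) {
		toy_valloc(&ip, clear_flags);
		toy_releasefile(&fs, &ip);
		toy_workitem_freefile(&fs);
	}
	return (fs.pending_inodes);
}
```

In this model toy_recycle_twice(true) ends at 0, while toy_recycle_twice(false) ends at -1 because the second release skips the increment but the decrement still happens.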
Reviewed by: tegge
Approved by: re (scottl)
src/sys/kern/vfs_bio.c 1.495, 1.496
src/sys/kern/vfs_subr.c 1.648
src/sys/sys/buf.h 1.190, 1.191
src/sys/sys/proc.h 1.436
src/sys/ufs/ffs/ffs_snapshot.c 1.104, 1.105, 1.106
Original commit messages:
Log:
Un-staticize runningbufwakeup() and staticize updateproc.
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.
Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than
checking the proc pointer vs. the known proc pointers for these two
threads. A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.
Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk. This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk. The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.
Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.
Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.
See <http://www.holm.cc/stress/log/cons143.html>
Revision Changes Path
1.495 +3 -3 src/sys/kern/vfs_bio.c
1.648 +2 -1 src/sys/kern/vfs_subr.c
1.190 +1 -0 src/sys/sys/buf.h
1.436 +1 -1 src/sys/sys/proc.h
1.104 +16 -4 src/sys/ufs/ffs/ffs_snapshot.c
Log:
Un-staticize waitrunningbufspace() and call it before returning from
ffs_copyonwrite() if any async writes were launched.
Restore the thread's previous TDP_NORUNNINGBUF state before returning
from ffs_copyonwrite().
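
A minimal sketch of the save-and-restore idea, assuming a plain bit flag in a per-thread flags word. struct toy_thread, toy_copyonwrite() and the bit value of TOY_NORUNNINGBUF are hypothetical; the real flag and its value live in sys/proc.h.

```c
#include <assert.h>

#define TOY_NORUNNINGBUF 0x4	/* hypothetical bit value, for illustration */

struct toy_thread {
	unsigned pflags;	/* plays the role of the private thread flags */
};

/* Set the flag for the duration of the call, then restore its prior state. */
static void
toy_copyonwrite(struct toy_thread *td)
{
	unsigned saved;

	saved = td->pflags & TOY_NORUNNINGBUF;	/* remember the old state */
	td->pflags |= TOY_NORUNNINGBUF;

	/* ... writes issued here would skip the runningbufspace wait ... */
	assert((td->pflags & TOY_NORUNNINGBUF) != 0);

	/* Clear the bit, then put back whatever the caller had. */
	td->pflags = (td->pflags & ~TOY_NORUNNINGBUF) | saved;
}

/* Helper: run the call on a thread with the given initial flags. */
static unsigned
toy_flags_after_cow(unsigned initial)
{
	struct toy_thread td = { initial };

	toy_copyonwrite(&td);
	return (td.pflags);
}
```

Restoring the saved bit rather than clearing it unconditionally matters for callers that already had the flag set, such as the bufdaemon and syncer threads mentioned in the earlier log entry.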
Revision Changes Path
1.496 +1 -1 src/sys/kern/vfs_bio.c
1.191 +1 -0 src/sys/sys/buf.h
1.105 +13 -1 src/sys/ufs/ffs/ffs_snapshot.c
Log:
Correct previous commit to fix the sense of the TDP_NORUNNINGBUF
check in ffs_copyonwrite() that is a precondition for calling
waitrunningbufspace().
Pointed out by: tegge
Pointy hat to: truckman
MFC after: 3 days
Revision Changes Path
1.106 +1 -1 src/sys/ufs/ffs/ffs_snapshot.c
Approved by: re (scottl)
Original commit message:
truckman 2005-09-29 21:50:26 UTC
Modified files:
sys/ufs/ffs ffs_softdep.c
Log:
After a rmdir()ed directory has been truncated, force an update of
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count. This will force the update of
the parent directory's actual link count to be scheduled. Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely. ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.
If a background fsck is run before the update of the parent directory's
actual link count has been performed, or at least scheduled by
putting the dirrem on the leaf directory's inodedep id_bufwait list,
fsck will corrupt the file system by decrementing the parent
directory's effective link count, which was previously correct
because it already took the removal of the leaf directory into
account, and setting the actual link count to the same value as the
effective link count after the dangling, removed, leaf directory
has been removed. This happens because fsck acts based on the
actual link count, which will be too high when fsck creates the
file system snapshot that it references.
This change has the fortunate side effect of more quickly cleaning
up the large number of dirrem structures that linger for an extended
time after the removal of a large directory tree. It also fixes a
potential problem with the shutdown of the syncer thread timing out
if the system is rebooted immediately after removing a large directory
tree.
Submitted by: tegge
MFC after: 3 days
Revision Changes Path
1.185 +2 -0 src/sys/ufs/ffs/ffs_softdep.c
Submitted by: tegge
Approved by: re (scottl)
the RDONLY option, so subsequent call of UFS_TRUNCATE (ffs_truncate)
would not panic the system. This fixes a panic that can happen
when mounting a corrupted filesystem read-only, and reading data
from it.
Reviewed by: mckusick
Approved by: re (scottl)
Don't free a struct inodedep if another process is allocating saved
inode memory for the same struct inodedep in
initiate_write_inodeblock_ufs[12]().
Handle disappearing dependencies in softdep_disk_io_initiation().
Approved by: re (scottl)
Set the mountpoint path in the superblock (fs_fsmnt) at mount-time
so that it appears in the various messages (not cleanly unmounted,
filesystem full, etc). This has been broken since rev 1.261.
Approved by: re (scottl)
Delay freeing disk space for file system blocks until all
dirty buffers are safely released. This fixes softdep
problems on truncation (deletion) of files with dirty
buffers.
Approved by: re (kensmith)
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()
My benchmarks have not revealed any performance degradation.
Reviewed by: jeff, bde
Approved by: rwatson, jmg (kqueue changes), grehan (mentor)
file's access time should be updated when it gets executed. A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.
A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called. The underlying
filesystem is expected to handle it based on its own semantics; some
filesystems don't support access time at all. Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc. In particular vn_start_write() has not been called. The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point. The actual time update will happen later
during a sync (which handles all the necessary locking).
Got me into this: cperciva
Discussed with: a lot with bde, a little with kan
Showed patches to: phk, jeffr, standards@, arch@
Minor discussion on: arch@
are subtle differences in the read and write completion path. Instead,
grab an extra write ref so the write path can drop it when we recursively
call bufdone(). I believe this may be the source of the wrong bufobj
panics.
Reported by: pho, kkenn
occur on a filesystem running with soft updates after a crash and
before a background fsck has been run. To prevent discrepancies
from arising in a background fsck that may already be running,
the directory is removed but its inode is not freed and is left
with the residual reference count. When encountered by the
background fsck it will be reclaimed.
export. This was happening anyway since this file manually sets DEBUG.
- Add a sysctl for the number of items on the worklist.
- Use a more canonical loop restart in softdep_fsync_mountdev; it saves
some code at the expense of a goto and makes me worry less about
modifying a variable that should be private to the TAILQ_FOREACH_SAFE
macro.
- Don't intermingle direct calls to lockmgr and indirect calls through
VOPs. This will be important in the future.
- Don't lock the devvp's interlock just to release it on the next line by
passing LK_INTERLOCK to lockmgr.
- Restructure ffs_snapshot_unmount so we don't call free() with the
devvp's interlock locked.
because it may change identities while we're sleeping on the lock.
Otherwise we may bail out of ffs_sync() early due to an error from
deadfs.
- Collapse a VOP_UNLOCK, vrele into a single vput().
two bugs.
- ffs_disk_prewrite was pulling the vp from the buf and checking for
COPYONWRITE, when really it wanted the vp from the bufobj that we're
writing to, which is the devvp. This led to us skipping the copy on
write to all file data, which significantly broke snapshots for the
last few months.
- When the SOFTUPDATES option was not included in the kernel config we
would also skip the copy on write check, which would effectively disable
snapshots.
- Remove an invalid mp_fixme().
Debugging tips from: mckusick
Reported by: iedowse, others
Discussed with: phk
rely on ufs to always leave the parent locked except in the ISDOTDOT
case. Adjust asserts to deal with these changes.
Sponsored by: Isilon Systems, Inc.
add more work are forced to process two worklist items first.
However, processing an item may generate additional work, causing the
unlucky thread to recursively process the worklist. Add a per-thread
flag to detect this situation and avoid the recursion. This should
fix the stack overflows that could occur while removing large
directory trees.
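
The recursion guard can be illustrated with a small userland model. This is a sketch under stated assumptions, not the soft updates worklist code: the toy_ names are hypothetical, the worklist is reduced to a counter, and each processed item spawns one more item until a budget runs out.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_thread {
	bool in_worklist;	/* the per-thread "already processing" flag */
	int depth;		/* current nesting of toy_process() */
	int max_depth;		/* deepest nesting observed */
};

struct toy_worklist {
	int items;		/* queued work items */
	int spawn_budget;	/* how many items will generate more work */
};

static void toy_add_work(struct toy_thread *, struct toy_worklist *, bool);

/* Drain the worklist; with the guard, a nested call returns immediately. */
static void
toy_process(struct toy_thread *td, struct toy_worklist *wl, bool use_guard)
{
	if (use_guard && td->in_worklist)
		return;
	td->in_worklist = true;
	if (++td->depth > td->max_depth)
		td->max_depth = td->depth;
	while (wl->items > 0) {
		wl->items--;
		if (wl->spawn_budget > 0) {
			wl->spawn_budget--;
			/* Processing an item generates additional work. */
			toy_add_work(td, wl, use_guard);
		}
	}
	td->depth--;
	if (td->depth == 0)
		td->in_worklist = false;
}

/* Threads that add work are made to help process the worklist. */
static void
toy_add_work(struct toy_thread *td, struct toy_worklist *wl, bool use_guard)
{
	wl->items++;
	toy_process(td, wl, use_guard);
}

/* Returns the deepest nesting reached for the given spawn budget. */
static int
toy_max_depth(int budget, bool use_guard)
{
	struct toy_thread td = { false, 0, 0 };
	struct toy_worklist wl = { 0, budget };

	toy_add_work(&td, &wl, use_guard);
	assert(wl.items == 0);
	return (td.max_depth);
}
```

With the guard the nesting depth stays at 1 regardless of the budget. Without it the depth grows with the amount of generated work, which is the stack growth the log message describes.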
Tested by: kris
Reviewed by: mckusick