Commit Graph

218 Commits

Author SHA1 Message Date
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Coleman Kane
6c62df7e49 Replace the non-MPSAFE timeout(9) API in ffs_softdep.c with the MPSAFE
callout_* API (e.g. callout_init_mtx(9)). This was one of the numerous
items on the http://wiki.freebsd.org/SMPTODO list.

Reviewed by:	imp, obrien, jhb
MFC after:	1 week
2008-03-13 20:15:48 +00:00
Ed Maste
3eb8098d2b Remove include of opt_quota.h; as of revision 1.205 there is no longer
any #ifdef QUOTA conditional code.
2008-03-10 18:44:07 +00:00
Attilio Rao
628f51d275 Introduce some functions in the vnode locks namespace and in the ffs
namespace in order to handle lockmgr fields in a controlled way instead
than spreading all around bogus stubs:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode

In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock

Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.
2008-02-24 16:38:58 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Ruslan Ermilov
5b4ab4a032 Fix build without INVARIANTS and update a comment to match
a change made in previous revision.
2007-11-09 11:04:36 +00:00
David E. O'Brien
1102b89baa Turn most ffs 'DIAGNOSTIC's into INVARIANTS. 2007-11-08 17:21:51 +00:00
Julian Elischer
3745c395ec Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
Konstantin Belousov
d66ba37013 Fix livelock that could occur when snapshoting UFS with quotas, where
some quota limit was exceeded. Sequence of UFS_VALLOC()/UFS_VFREE()
call there could cause inodeblock to have both freefile and inodedep
dependencies without any inode in the block being marked for write.
Then, softdep_check_suspend() would return EAGAIN forewer.

Force write of inodeblock with allocated freefile softdependency by
setting IN_MODIFIED flag in softdep_freefile and unconditionally calling
UFS_UPDATE() in ufs_reclaim.

Reported by:	kris
Debug help and tested by: 	Peter Holm
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-06-22 13:22:37 +00:00
Andrew Thompson
832eef31d1 Add a newline to the printf message. 2007-05-03 22:39:52 +00:00
Konstantin Belousov
9724167c2a Recalculate the NEWBLOCK flag for pagedep structure after the softdep
lock is dropped, since pagedep may be already processed and deallocated.

Found and tested by:	kris
MFC after:		2 weeks
2007-04-10 09:30:41 +00:00
Konstantin Belousov
23743f6a11 When LK_NOWAIT is passed as argument to process_worklist_item(), this
does not prevent handle_workitem_remove() from recursing into a blocking
version. Add the dirrem to worklist instead of processing it now if this
is the case.

Reported and tested by:	kris
Submitted by:		tegge
MFC after:		2 weeks
2007-04-10 09:28:17 +00:00
Xin LI
04533fc68e Use *_EMPTY macros when appropriate. 2007-04-04 07:29:53 +00:00
Konstantin Belousov
06f0c8dc4d Revert rev. 1.205. Replace unconditional acquision of Giant when QUOTAS are
defined with VFS_LOCK_GIANT(NULL) call.
This shall fix softdep operation when mpsafe_vfs = 0.

Reported and tested by:	kris
Submitted by:	tegge
MFC after:	1 week
2007-03-29 08:26:04 +00:00
Konstantin Belousov
36d4667907 Mark UFS as being MP-Safe in "options QUOTA" case too. Remove no more
neccessary Giant acquisions in softdepend processing code.

Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	re (kensmith)
2007-03-20 10:51:45 +00:00
Brian Somers
98fff6b57c Account for di_blocks allocations when IN_SPACECOUNTED is set in an
inode's i_flag.

It's possible that after ufs_infactive() calls softdep_releasefile(),
i_nlink stays >0 for a considerable amount of time (> 60 seconds here).
During this period, any ffs allocation routines that alter di_blocks
must also account for the blocks in the filesystem's fs_pendingblocks
value.

This change fixes an eventual df/du discrepency that will happen as
the result of fs_pendingblocks being reduced to <0.

The only manifestation of this that people may recognise is the
following message on boot:

    /somefs: update error: blocks -N files M

at which point the negative pending block count is adjusted to zero.

Reviewed by:	tegge
MFC after:	3 weeks
2007-02-23 20:23:35 +00:00
Konstantin Belousov
2276d0814f Aquire Giant in the softdep_flush for clear_remove() and clear_inodedeps()
processing when QUOTA is set.

Reported and tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	3 days
2006-11-01 13:48:44 +00:00
Tor Egge
e60c361218 Reduce fluctuations of mnt_flag to allow unlocked readers to get a
slightly more consistent view.
2006-09-26 04:20:09 +00:00
Tor Egge
55b4ff0d9f Increase mnt_noasync once in softdep_mount() to disallow async io,
closing a window where a file system using softupdates could be async
for a short while if both MNT_UPDATE and MNT_ASYNC were passed as flags
to nmount().  Add MNTK_SOFTDEP flag to ensure that softdep_mount()
doesn't increase mnt_noasync multiple times.
2006-09-26 04:17:17 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Konstantin Belousov
28de2218ec Fix the glitch introduced in rev. 1.93. In softdep_sync_metadata(),
switch by worklist type contains two for() loops, for D_INDIRDEP and
D_PAGEDEP. On error, these loops are exited by break, where the switch
actually shall be leaved. Use goto instead of break to reach the error
handling code.

Reported by:	Peter Holm
Reviewed by:	tegge
Approved by:	pjd (mentor)
MFC after:	2 weeks
2006-09-20 07:49:28 +00:00
Tom Rhodes
e45269bf51 Provide a less cryptic panic message in place of just "found inode." 2006-05-16 18:51:22 +00:00
Tor Egge
43e07fffb6 ffs_syncvnode() might skip some of the blocks due to them being locked,
assuming them to be inflight write buffers.  This is not always the case.
bufdaemon might hold the buffer lock and give up writing the buffer due to it
having dependencies, the file system being suspended or the vnode lock being
held by another thread.  When bufdaemon decides to write the buffer there is
still a window before bufobj_wref() has been called, allowing other threads to
believe that the vnode has no dirty buffers or inflight writes.

Try harder to flush first block of new subdirectory to get rid of MKDIR_BODY
dependency.
2006-05-06 20:51:31 +00:00
Ken Smith
39fac37953 Fix panic() message to give the right function name. 2006-04-17 07:43:56 +00:00
Tor Egge
68e8466655 Eliminate softdep_flush() livelock by accounting for number of worklist items
marked as being in progress.
2006-04-03 22:23:23 +00:00
Jeff Roberson
2eedeb7e60 - Remove the call to softdep_waitidle after suspending the filesystem.
This does not do what I wanted as all dirty buffers must be flushed
   by the call to ffs_sync and any remaining dependency work would mean
   that this failed.

Pointed out by:	tegge
2006-03-12 05:24:14 +00:00
Tor Egge
791dd2fade Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
2006-03-08 23:43:39 +00:00
Jeff Roberson
b9b12498fd - Acquire lk in softdep_slowdown so that it's owned when we call
softdep_speedup().
 - Assert that lk is held in softdep_speedup() rather than acquiring it.
   This avoids a potential lock recursion.
2006-03-02 08:52:53 +00:00
Jeff Roberson
eb2ea10590 - Move softdep from using a global worklist to per-mount worklists. This
has many positive effects including improved smp locking, reducing
   interdependencies between mounts that can lead to deadlocks, etc.
 - Add the softdep worklist and various counters to the ufsmnt structure.
 - Add a mount pointer to the workitem and remove mount pointers from the
   various structures derived from the workitem as they are now redundant.
 - Remove the poor-man's semaphore protecting softdep_process_worklist and
   softdep_flushworklist.  Several threads may now process the list
   simultaneously.
 - Add softdep_waitidle() to block the thread until all pending
   dependencies being operated on by other threads have been flushed.
 - Use softdep_waitidle() in unmount and snapshots to block either
   operation until the fs is stable.
 - Remove softdep worklist processing from the syncer and move it into the
   softdep_flush() thread.  This thread processes all softdep mounts
   once each second and when it is called via the new softdep_speedup()
   when there is a resource shortage.  This removes the softdep hook
   from the kernel and various hacks in header files to support it.

Reviewed by/Discussed with:	tegge, truckman, mckusick
Tested by:	kris
2006-03-02 05:50:23 +00:00
Tor Egge
6c62b2acd0 If the lock passed to getdirtybuf() is the softdep lock then the background
write completed wakeup could be missed.  Close the race by grabbing the lock
normally used for protection of bp->b_xflags.

Reviewed by:	truckman
2006-01-09 19:32:21 +00:00
Tor Egge
c8c7711d66 Broaden scope of softdep_worklist_busy rwlock protection of softdep processing
to avoid some dependencies being missed by softdep_flushworklist().

Reviewed by:	truckman
2006-01-09 19:16:56 +00:00
Xin LI
cd34c8b6a2 Typo. 2005-12-23 15:50:57 +00:00
Don Lewis
445193b887 After a rmdir()ed directory has been truncated, force an update of
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count.  This will force the update of
the parent directory's actual link to actually be scheduled.  Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely.  ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.

If a background fsck is run before the update of the parent directory's
actual link count has been performed, or at least scheduled by
putting the dirrem on the leaf directory's inodedep id_bufwait list,
fsck will corrupt the file system by decrementing the parent
directory's effective link count, which was previously correct
because it already took the removal of the leaf directory into
account, and setting the actual link count to the same value as the
effective link count after the dangling, removed, leaf directory
has been removed.  This happens because fsck acts based on the
actual link count, which will be too high when fsck creates the
file system snapshot that it references.

This change has the fortunate side effect of more quickly cleaning
up the large number dirrem structures that linger for an extended
time after the removal of a large directory tree.  It also fixes a
potential problem with the shutdown of the syncer thread timing out
if the system is rebooted immediately after removing a large directory
tree.

Submitted by:	tegge
MFC after:	3 days
2005-09-29 21:50:26 +00:00
Tor Egge
d536ff2edb Retain generation count when writing zeroes instead of an inode to disk.
Don't free a struct inodedep if another process is allocating saved inode
memory for the same struct inodedep in initiate_write_inodeblock_ufs[12]().

Handle disappearing dependencies in softdep_disk_io_initiation().

Reviewed by:	mckusick
2005-09-05 22:14:33 +00:00
Tor Egge
15da51f73c Don't set the COMPLETE flag in an inodedep structure before the related
inode has been written.
2005-08-21 18:19:06 +00:00
Stephan Uphoff
ed8938e082 Delay freeing disk space for file system blocks until all dirty buffers
are safely released. This fixes softdep problems on truncation (deletion)
of files with dirty buffers.

Reviewed by:	jeff@, mckusick@, ps@, tegge@
Tested by: 	glebius@, ps@
MFC after:	3 weeks
2005-07-31 20:24:14 +00:00
Jeff Roberson
6c71a2208d - Don't restrict the softdep stats to DEBUG kernels, they cost nothing to
export.  This was happening anyway since this file manually sets DEBUG.
 - Add a sysctl for the number of items on the worklist.
 - Use a more canonical loop restart in softdep_fsync_mountdev, it saves
   some code at the expense of a goto and makes me worry less about
   modifying a variable that should be private to the TAILQ_FOREACH_SAFE
   macro.
2005-05-03 11:03:29 +00:00
Jeff Roberson
153910e0f5 - Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix
two bugs.
 - ffs_disk_prewrite was pulling the vp from the buf and checking for
   COPYONWRITE, when really it wanted the vp from the bufobj that we're
   writing to, which is the devvp.  This lead to us skipping the copy on
   write to all file data, which significantly broke snapshots for the
   last few months.
 - When the SOFTUPDATES option was not included in the kernel config we
   would also skip the copy on write check, which would effectively disable
   snapshots.
 - Remove an invalid mp_fixme().

Debugging tips from:	mckusick
Reported by:		iedowse, others
Discussed with:		phk
2005-04-03 10:29:55 +00:00
David Schultz
188f6433f6 When the softupdates worklist gets too long, threads that attempt to
add more work are forced to process two worklist items first.
However, processing an item may generate additional work, causing the
unlucky thread to recursively process the worklist.  Add a per-thread
flag to detect this situation and avoid the recursion.  This should
fix the stack overflows that could occur while removing large
directory trees.

Tested by:	kris
Reviewed by:	mckusick
2005-03-25 17:30:31 +00:00
Jeff Roberson
fdcc82276e - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:01:50 +00:00
Jeff Roberson
1a4a9672f1 - Add VOP locking asserts in several functions that have been implicated in
recent deadlocks.
2005-02-22 23:56:42 +00:00
Xin LI
a16baf37b9 The recomputation of file system summary at mount time can be a
very slow process, especially for large file systems that is just
recovered from a crash.

Since the summary is already re-sync'ed every 30 second, we will
not lag behind too much after a crash.  With this consideration
in mind, it is more reasonable to transfer the responsibility to
background fsck, to reduce the delay after a crash.

Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to
control this behavior.  When set to nonzero, we will get the
"old" behavior, that the summary is computed immediately at mount
time.

Add five new sysctl variables to adjust ndir, nbfree, nifree,
nffree and numclusters respectively.  Teach fsck_ffs about these
API, however, intentionally not to check the existence, since
kernels without these sysctls must have recomputed the summary
and hence no adjustments are necessary.

This change has eliminated the usual tens of minutes of delay of
mounting large dirty volumes.

Reviewed by:	mckusick
MFC After:	1 week
2005-02-20 08:02:15 +00:00
Poul-Henning Kamp
1121c39497 Make non-SOFTUPDATES kernels compile again.
Integrate the stubfile into the main file now that license issues have been
long resolved.
2005-02-11 08:13:31 +00:00
Poul-Henning Kamp
365b18aa89 style polishing. 2005-02-09 12:22:16 +00:00
Poul-Henning Kamp
44787ceb0b Use ffs_truncate() directly instead of UFS_TRUNCATE() 2005-02-08 20:51:00 +00:00
Poul-Henning Kamp
dd19a799b8 Background writes are entirely an FFS/Softupdates thing.
Give FFS vnodes a specific bufwrite method which contains all the
background write stuff and then calls into the default bufwrite()
for the rest of the job.

Remove all the background write related stuff from the normal bufwrite.

This drags the softdep_move_dependencies() back into FFS.

Long term, it is worth looking at simply copying the data into
allocated memory and issuing the bio directly and not create the
"shadow buf" in the first place (just like copy-on-write is done
in snapshots for instance).  I don't think we really gain anything
but complexity from doing this with a buf.
2005-02-08 20:29:10 +00:00
Poul-Henning Kamp
88e5b12a20 Drag another softupdates tentacle back into FFS: Now that FFS's
vop_fsync is separate from the internal use we can do the full job
there.
2005-02-08 18:09:11 +00:00
Poul-Henning Kamp
efd6d9808c Don't use the UFS_* and VFS_* functions where a direct call is possble.
The UFS_ functions are for UFS to call back into VFS.  The VFS functions
are external entry points into the filesystem.
2005-02-08 17:40:01 +00:00
Poul-Henning Kamp
40854ff546 For snapshots we need all VOP_LOCKs to be exclusive.
The "business class upgrade" was implemented in UFS's VOP_LOCK
implementation ufs_lock() which is the wrong layer, so move it to
ffs_lock().

Also, as long as we have not abandonned advanced vfs-stacking we
should not preclude it from happening: instead of implementing a
copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at
vop_stdlock() at the bottom.
2005-02-08 16:25:50 +00:00