Commit Graph

1555 Commits

Author SHA1 Message Date
Tor Egge
4613aa0e99 Bring the call to softdep_releasefile() within the region protected by
vn_start_secondary_write() since it might cause file system write activity
(e.g. ffs_snapremove()).
2006-05-09 22:33:43 +00:00
Tor Egge
43e07fffb6 ffs_syncvnode() might skip some of the blocks due to them being locked,
assuming them to be inflight write buffers.  This is not always the case.
bufdaemon might hold the buffer lock and give up writing the buffer due to it
having dependencies, the file system being suspended or the vnode lock being
held by another thread.  When bufdaemon decides to write the buffer there is
still a window before bufobj_wref() has been called, allowing other threads to
believe that the vnode has no dirty buffers or inflight writes.

Try harder to flush first block of new subdirectory to get rid of MKDIR_BODY
dependency.
2006-05-06 20:51:31 +00:00
Tor Egge
b673e7b7eb Return error if vnode was reclaimed while it was temporarily unlocked.
Add missing calls to vn_finished_write() in error handling.
2006-05-05 21:27:31 +00:00
Tor Egge
0911ecffe7 Turn off disk quotas for snapshot files. 2006-05-05 20:10:04 +00:00
Tor Egge
c7793f61dc Avoid locking overhead when snapshots are disabled. 2006-05-05 19:58:36 +00:00
Pawel Jakub Dawidek
5b139b2d75 - Set bio_done directly to NULL to indicate that we want to wait for the bio.
- Use biowait() instead of copying the code.

MFC after:	1 month
2006-05-05 10:06:22 +00:00
Tor Egge
d81daf63bc Detect the snapshot file being prematurely unlinked. 2006-05-03 00:29:22 +00:00
Tor Egge
868bb88ff2 Temporarily undo clusters contribution to global runningbufspace while
handling copy on write for the buffers taking part in the cluster.
2006-05-03 00:10:29 +00:00
Tor Egge
5515ad4282 A side effect of calling runningbufwakeup() is that bp->b_runningbufspace is
cleared.  Save old value and restore bp->b_runningbufspace before returning
from ffs_copyonwrite().
2006-05-03 00:04:38 +00:00
Tor Egge
6d94935d36 Close a race when VOP_LOCK() on a snapshot file is attempted at the
same time as it is changed back into a normal file.  The locker would
get the shared "snaplk" lock which would no longer be the correct lock
for the vnode.
2006-05-02 23:52:43 +00:00
Scott Long
cbd6fedbf2 Fix a typo. 2006-04-28 04:39:50 +00:00
Jeff Roberson
6ca9fcc586 - Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child
buffers to go on the buf daemon's DIRTYGIANT queue.
 - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler
   runs in the context of the buf daemon and may require Giant.
2006-04-28 01:05:31 +00:00
Tom Rhodes
7b3f1bbd61 Revert previous to this file before an actual request is made. 2006-04-22 04:22:15 +00:00
Tom Rhodes
8fc22c9d2e Remove what I believe are two useless ifdefs. If a user or administrator
enables multilabel, or any option for that matter, most likely they have
a reason.  This will allow users to see that mulilabel is enabled via an
issued "mount" command and remove an annoying warning - printed only when
a MAC kernel is not installed - on boot up.

Discussed with:	green, brueffer, Samy Al Bahra.
Probably ran past:	csjp (though I can't remember).
2006-04-21 07:14:25 +00:00
Ken Smith
39fac37953 Fix panic() message to give the right function name. 2006-04-17 07:43:56 +00:00
Tor Egge
68e8466655 Eliminate softdep_flush() livelock by accounting for number of worklist items
marked as being in progress.
2006-04-03 22:23:23 +00:00
Jeff Roberson
3bbd6d8ae6 - Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs().
Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:54:20 +00:00
Tor Egge
700118c72f Allow compilation when not using softupdates. 2006-03-19 22:16:44 +00:00
Tor Egge
7de3839d0d Let snapshots make a copy of old contents for all buffers taking part in a
cluster instead of just the first buffer.

Delay buf_start() calls until snapshots have a copy of old content.

PR:		kern/93942
2006-03-19 21:43:36 +00:00
Tor Egge
30b3a49fab Add kludge to avoid deadlock when unlinking snapshot. 2006-03-19 21:29:20 +00:00
Tor Egge
95e7a3c3ac Reduce probability of unmount failing after having unmounted snapshots. 2006-03-19 21:09:19 +00:00
Tor Egge
8c86028f11 Ensure that vnode for directory isn't reclaimed before ffs_snapshot() has
completed expunging unlinked files.  It could come back at another memory
location causing a lock order reversal.
2006-03-19 21:05:10 +00:00
Jeff Roberson
8db357205c - Remove the call to softdep_waitidle after suspending the filesystem.
This does not do what I wanted as all dirty buffers must be flushed
   by the call to ffs_sync and any remaining dependency work would mean
   that this failed.

Pointed out by: tegge
2006-03-12 05:26:12 +00:00
Jeff Roberson
2eedeb7e60 - Remove the call to softdep_waitidle after suspending the filesystem.
This does not do what I wanted as all dirty buffers must be flushed
   by the call to ffs_sync and any remaining dependency work would mean
   that this failed.

Pointed out by:	tegge
2006-03-12 05:24:14 +00:00
Tor Egge
ca2fa80767 Block secondary writes while expunging active unlinked files.
Fix detection of active unlinked files by checking VI_OWEINACT and
VI_DOINGINACT in addition to v_usecount.

Defer inactive handling for unlinked files if the file system is mostly
suspended (secondary writes being blocked).

Perform deferred inactive handling after the file system is resumed.
2006-03-11 01:08:37 +00:00
Tor Egge
1e70cd7fc7 Remove unneeded (and broken) usage of MNT_REF()/MNT_REL(). 2006-03-10 02:31:12 +00:00
Tor Egge
791dd2fade Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
2006-03-08 23:43:39 +00:00
Tor Egge
a695d54404 Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended
file systems.  This could cause deadlocks when creating snapshots.

Reviewed by:	jeff
2006-03-08 02:14:39 +00:00
Tor Egge
3b582b4e72 Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must
be called without any vnode locks held.  Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
2006-03-02 22:13:28 +00:00
Jeff Roberson
b9b12498fd - Acquire lk in softdep_slowdown so that it's owned when we call
softdep_speedup().
 - Assert that lk is held in softdep_speedup() rather than acquiring it.
   This avoids a potential lock recursion.
2006-03-02 08:52:53 +00:00
Jeff Roberson
eb2ea10590 - Move softdep from using a global worklist to per-mount worklists. This
has many positive effects including improved smp locking, reducing
   interdependencies between mounts that can lead to deadlocks, etc.
 - Add the softdep worklist and various counters to the ufsmnt structure.
 - Add a mount pointer to the workitem and remove mount pointers from the
   various structures derived from the workitem as they are now redundant.
 - Remove the poor-man's semaphore protecting softdep_process_worklist and
   softdep_flushworklist.  Several threads may now process the list
   simultaneously.
 - Add softdep_waitidle() to block the thread until all pending
   dependencies being operated on by other threads have been flushed.
 - Use softdep_waitidle() in unmount and snapshots to block either
   operation until the fs is stable.
 - Remove softdep worklist processing from the syncer and move it into the
   softdep_flush() thread.  This thread processes all softdep mounts
   once each second and when it is called via the new softdep_speedup()
   when there is a resource shortage.  This removes the softdep hook
   from the kernel and various hacks in header files to support it.

Reviewed by/Discussed with:	tegge, truckman, mckusick
Tested by:	kris
2006-03-02 05:50:23 +00:00
Jeff Roberson
f5a4db791d - Using LK_NOWAIT in qsync() can get us into infinite loop situations that
lead to deadlocks.  Remove it.

MFC After:	1 week
2006-02-22 06:12:53 +00:00
Robert Watson
5652c15c24 In quotaoff(), lock the vnode instead of asserting it when manipulating
v_vflags.

MFC after:	1 week
Submitted by:	Antoine Brodin <antoine at brodin at laposte dot net>
2006-02-12 13:20:06 +00:00
Robert Watson
4a99d6f90a Instead of asserting the vnode lock before manipulating v_vflag, acquire
it and drop it afterwards.

Found by:	kris
MFC after:	1 week
2006-02-11 21:09:27 +00:00
Jeff Roberson
89b0e10910 - Reorder calls to vrele() after calls to vput() when the vrele is a
directory.  vrele() may lock the passed vnode, which in these cases would
   give an invalid lock order of child -> parent.  These situations are
   deadlock prone although do not typically deadlock because the vrele
   is typically not releasing the last reference to the vnode.  Users of
   vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 	1 week
Sponsored by:	Isilon Systems, Inc.
Tested by:	kkenn
2006-02-01 00:25:26 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Tor Egge
6c62b2acd0 If the lock passed to getdirtybuf() is the softdep lock then the background
write completed wakeup could be missed.  Close the race by grabbing the lock
normally used for protection of bp->b_xflags.

Reviewed by:	truckman
2006-01-09 19:32:21 +00:00
Tor Egge
c8c7711d66 Broaden scope of softdep_worklist_busy rwlock protection of softdep processing
to avoid some dependencies being missed by softdep_flushworklist().

Reviewed by:	truckman
2006-01-09 19:16:56 +00:00
Warner Losh
5c65ae3a88 New option: NO_FFS_SNAPSHOT. I did this in p4 about the same time
that NetBSD implemented it independently of them (don't know which one
was actually first).  This saves about 24k for those times you don't
need snapshot support (like when running off a ram disk, or in an
embedded environment where size matters).
2006-01-06 04:44:09 +00:00
Xin LI
cd34c8b6a2 Typo. 2005-12-23 15:50:57 +00:00
Dag-Erling Smørgrav
0430a5e289 Eradicate caddr_t from the VFS API. 2005-12-14 00:49:52 +00:00
Craig Rodrigues
b6bd025c35 Fix parsing of atime, clusterr, clusterw, exec, suid, symfollow
mount options.

Noticed by:	Amir Shalem < amir at boom dot org dot il>
2005-11-24 15:06:40 +00:00
Craig Rodrigues
cea903627f If export mount flag is not passed in, set default parameters
for export structure and pass that to vfs_export().
Currently in userland mount(8), an export structure is unconditionally
passed in, only for UFS.  This is an attempt to move that UFS-specific
behavior out of mount(8) and into the UFS filesystem code.
2005-11-20 17:04:50 +00:00
Craig Rodrigues
359d438885 Add more options to ffs_opts, so that vfs_filteropts() will not
complain when we pass these options to a UFS filesystem as strings
via nmount():  noexec, nosuid, nosymfollow, sync, suiddir
2005-11-19 23:28:19 +00:00
Craig Rodrigues
26f59b6455 - Add parsing for the following existing UFS/FFS mount options in the nmount()
callpath via vfs_getopt(), and set the appropriate MNT_* flag:
  -> acls, async, force, multilabel, noasync, noatime,
  -> noclusterr, noclusterw, snapshot, update

- Allow errmsg as a valid mount option via vfs_getopt(),
  so we can later add a hook to propagate mount errors back
  to userspace via vfs_mount_error().
2005-11-18 06:06:10 +00:00
Xin LI
fad951e3a0 Slightly reorganize to reduce duplicated code.
Reviewed by:	rwatson
2005-11-07 18:25:23 +00:00
Paul Saab
e1cef62715 Rate limit filesystem full and out of inodes messages to once a
second.
2005-10-31 20:33:28 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Xin LI
eb2893ec18 Remove an unneeded "a" from comment. 2005-10-25 19:46:15 +00:00
Nate Lawson
8680d6985f Adjust maxfilesize for UFS1 and old 4.4 FFS. For UFS1, increase the limit
to (max block - 1) * bsize.  For DEV_BSIZE, this doubles the limit from
0.5 TB to 1 TB.  For the old 4.4 FFS case, decrease the limit from 0.5 TB
to 2 GB - 1.  Older systems had a 32 bit off_t so they couldn't access the
larger files anyway.

Collaboration with:	bde
2005-10-21 01:54:00 +00:00
Don Lewis
875e108755 Correct the type of the temporary variable used by ufs_lookup.c:1.78
to fix the race condition in the ufs_lookup() ISDOTDOT code.

Noticed by:	bde
MFC after:	12 days
2005-10-16 21:31:46 +00:00
Don Lewis
12d360453c Close a race in the ufs_lookup() code that handles the ISDOTDOT
case by saving the value of dp->i_ino before unlocking the vnode
for the current directory and passing the saved value to VFS_VGET().

Without this change, another thread can overwrite dp->i_ino after
the current directory is unlocked, causing  ufs_lookup() to lock
and return the wrong vnode in place of the vnode for its parent
directory.  A deadlock can occur if dp->i_ino was changed to a
subdirectory of the current directory because the root to leaf vnode
lock ordering will be violated.  A vnode lock can be leaked if
dp->i_ino was changed to point to the current directory, which
causes the current vnode lock for the current directory to be
recursed, which confuses lookup() into calling vrele() when it
should be calling vput().

The probability of this bug being triggered seems to be quite low
unless the sysctl variable debug.vfscache is set to 0.

Reviewed by:	jhb
MFC after:	2 weeks
2005-10-14 22:13:33 +00:00
Robert Watson
606dcf085f When performing a VOP_LOOKUP() as part of UFS1 extended attribute
auto-start, set cnp.cn_lkflags to LK_EXCLUSIVE.  This flag must now
be set so that lockmgr knows what kind of lock to acquire, and it
will panic if not specified.  This resulted in a panic when using
extended attributes on UFS1 as of locking work present in the 6.x
branch.

This is a RELENG_6_0 merge candidate.

Reported by:	lofi
MFC after:	3 days
2005-10-12 14:18:58 +00:00
Diomidis Spinellis
9f5c1d1955 Move execve's access time update functionality into a new
vfs_mark_atime() function, and use the new function for
performing efficient atime updates in mmap().

Reviewed by:	bde
MFC after:	2 weeks
2005-10-12 06:56:00 +00:00
Tor Egge
48c2ac4539 Avoid unintended VMIO on directories and symlinks due to leftover object
not having been destroyed.
2005-10-10 19:02:04 +00:00
Tor Egge
4e0cd00988 Adjust totread argument passed to cluster_read() to account for offset not
being block aligned.
2005-10-09 21:11:25 +00:00
Tor Egge
9248a8271c Don't pretend that a failed sync write was succesful. 2005-10-09 20:49:01 +00:00
Tor Egge
021869b542 Reduce probability for a deadlock that can occur when a snapshot inode is
updated by a process holding the snapshot lock.  Another process updating a
different inode in the same inodeblock will do copy on write checks and lock in
the opposite direction.

The snapshot code force a copy on write of these blocks manually (cf. start of
expunge_ufs[12]) and these inode blocks are later put on snapblklist.

This partial fix is to 'drain' the relevant ffs_copyonwrite() operation after
installing new snapblklist.  This is not a 100% solution since a failed block
allocation can cause implicit fsync() which might deadlock before the new
snapblklist has been installed.
2005-10-09 20:15:15 +00:00
Tor Egge
d4d530da96 Eliminate a deadlock that can occur when a dirty block belonging to a snapshot
file is flushed by a process not holding snaplk (e.g. bufdaemon).  Another
process might hold snaplk and try to access the block due to ffs_copyonwrite
processing.
2005-10-09 20:07:51 +00:00
Tor Egge
45f91051da Eliminate a deadlock that can occur during the cgaccount() processing due to
the cg map buffer being held when writing indirect blocks.  The process ends up
in ffs_copyonwrite(), attempting to get snaplk while holding the cg map buffer
lock.

Another process might be in ffs_copyonwrite(), trying to allocate a new block
for a copy.  It would hold snaplk while trying to get the cg map buffer lock.

Release the cg map buffer early and use the copy for most of the cgaccount
processing to avoid this deadlock.
2005-10-09 20:00:16 +00:00
Tor Egge
17026ff61a Reduce the probability of low block numbers passed to ffs_snapblkfree() by
skipping the call from ffs_snapremove() if the block number is zero.

Simplify snapshot locking in ffs_copyonwrite() and ffs_snapblkfree() by using
the same locking protocol for low block numbers as for larger block numbers.
This removes a lock leak that could happen if vn_lock() succeeded after
lockmgr() failed in ffs_snapblkfree().

Check if snapshot is gone before retrying a lock in ffs_copyonwrite().
2005-10-09 19:45:01 +00:00
Tor Egge
c73e9e9c7b Reinitialize v_type and v_op fields in case vnode has been reused without
reclamation.  If the vnode previously was a fifo then v_op would point to
ffs_fifoops[12] instead of the expected ffs_vnodeops[12], causing a panic at
the end of ffsext_strategy.
2005-10-09 19:06:34 +00:00
Don Lewis
448434c3c9 Initialize the inode i_flag field in ffs_valloc() to clean up any
stale flag bits left over from before the inode was recycled.

Without this change, a leftover IN_SPACECOUNTED flag could prevent
softdep_freefile() and softdep_releasefile() from incrementing
fs_pendinginodes.  Because handle_workitem_freefile() unconditionally
decrements fs_pendinginodes, a negative value could be reported at
file system unmount time with a message like:
        unmount pending error: blocks 0 files -3
The pending block count in fs_pendingblocks could also be negative
for similar reasons.  These errors can cause the data returned by
statfs() to be slightly incorrect.  Some other cleanup code in
softdep_releasefile() could also be incorrectly bypassed.

MFC after:	3 days
2005-10-03 21:57:43 +00:00
Don Lewis
460858e9ef Correct previous commit to fix the sense of the TDP_NORUNNINGBUF
check in ffs_copyonwrite() that is a precondition for calling
waitrunningbufspace().

Pointed out by:	tegge
Pointy hat to:	truckman
MFC after:	3 days
2005-10-01 19:10:48 +00:00
Don Lewis
bd3c2d867d Un-staticize waitrunningbufspace() and call it before returning from
ffs_copyonwrite() if any async writes were launched.

Restore the threads previous TDP_NORUNNINGBUF state before returning
from ffs_copyonwrite().
2005-09-30 18:07:41 +00:00
Don Lewis
6c8b634f1d Un-staticize runningbufwakeup() and staticize updateproc.
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.

Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than than
checking the proc pointer vs. the known proc pointers for these two
threads.  A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.

Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk.  This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk.  The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.

Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.

Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.

See <http://www.holm.cc/stress/log/cons143.html>
2005-09-30 01:30:01 +00:00
Don Lewis
445193b887 After a rmdir()ed directory has been truncated, force an update of
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count.  This will force the update of
the parent directory's actual link to actually be scheduled.  Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely.  ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.

If a background fsck is run before the update of the parent directory's
actual link count has been performed, or at least scheduled by
putting the dirrem on the leaf directory's inodedep id_bufwait list,
fsck will corrupt the file system by decrementing the parent
directory's effective link count, which was previously correct
because it already took the removal of the leaf directory into
account, and setting the actual link count to the same value as the
effective link count after the dangling, removed, leaf directory
has been removed.  This happens because fsck acts based on the
actual link count, which will be too high when fsck creates the
file system snapshot that it references.

This change has the fortunate side effect of more quickly cleaning
up the large number dirrem structures that linger for an extended
time after the removal of a large directory tree.  It also fixes a
potential problem with the shutdown of the syncer thread timing out
if the system is rebooted immediately after removing a large directory
tree.

Submitted by:	tegge
MFC after:	3 days
2005-09-29 21:50:26 +00:00
Robert Watson
5f419982c2 Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by:	bde
2005-09-28 07:03:03 +00:00
John Baldwin
7e9e371f2d Use the refcount API to manage the reference count for user credentials
rather than using pool mutexes.

Tested on:	i386, alpha, sparc64
2005-09-27 18:09:42 +00:00
Xin LI
0e4f6eecd7 Restore a historical ufs_inactive behavior that has been changed
in rev. 1.40 of ufs_inode.c, which allows an inode being truncated
even when the filesystem itself is marked RDONLY.  A subsequent
call of UFS_TRUNCATE (ffs_truncate) would panic the system as it
asserts that it can only be called when the filesystem is mounted
read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c).

Because ffs_mount() already takes care of sync'ing the filesystem
to disk before being downgraded to readonly, it appears to be more
desirable that we should not permit this sort of writes to disk.

This change would fix a panic that occours when read-only mounted
a corrupted filesystem and doing some file operations.

MT6/5/4 candidate

Reviewed by:	mckusick
2005-09-23 20:49:57 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
Tor Egge
2f0ffabcf4 Giant is no longer needed here. 2005-09-12 01:21:42 +00:00
Christian S.J. Peron
d1dfd92177 Convert the primary ACL allocator from malloc(9) to using a UMA zone instead.
Also introduce an aclinit function which will be used to create the UMA zone
for use by file systems at system start up.

MFC after:	1 month
Discussed with:	rwatson
2005-09-06 00:06:30 +00:00
Tor Egge
d536ff2edb Retain generation count when writing zeroes instead of an inode to disk.
Don't free a struct inodedep if another process is allocating saved inode
memory for the same struct inodedep in initiate_write_inodeblock_ufs[12]().

Handle disappearing dependencies in softdep_disk_io_initiation().

Reviewed by:	mckusick
2005-09-05 22:14:33 +00:00
Suleiman Souhlal
fdedad764a ffs_mountfs() needs devvp to be locked, so lock it.
Glanced at by:	phk
Tested by:	pjd
MFC after:	3 days
2005-09-02 13:52:55 +00:00
Suleiman Souhlal
93373c422f Set the mountpoint path in the superblock (fs_fsmnt) at mount-time
so that it appears in the various messages (not cleanly unmounted,
filesystem full, etc). This has been broken since rev 1.261.
2005-08-21 22:06:41 +00:00
Tor Egge
15da51f73c Don't set the COMPLETE flag in an inodedep structure before the related
inode has been written.
2005-08-21 18:19:06 +00:00
Ian Dowse
65ed954554 In the ufsdirhash_build() failure case for corrupted directories
or unreadable blocks, make sure to destroy the mutex we created.
Also fix an unrelated typo in a comment.

Found by:	Peter Holm's stress tests
Reviewed by:	dwmalone
MFC after:	3 days
2005-08-17 08:48:42 +00:00
Stephan Uphoff
ed8938e082 Delay freeing disk space for file system blocks until all dirty buffers
are safely released. This fixes softdep problems on truncation (deletion)
of files with dirty buffers.

Reviewed by:	jeff@, mckusick@, ps@, tegge@
Tested by: 	glebius@, ps@
MFC after:	3 weeks
2005-07-31 20:24:14 +00:00
Alan Cox
ec9c9e7363 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00
Suleiman Souhlal
679985d03a Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
Ken Smith
6341095e0d This patch addresses a standards violation issue. The standards say a
file's access time should be updated when it gets executed.  A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.

A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called.  The underlying
filesystem is expected to handle it based on its own semantics, some
filesystems don't support access time at all.  Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc.  In particular vn_start_write() has not been called.  The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point.  The actual time update will happen later
during a sync (which handles all the necessary locking).

Got me into this:	cperciva
Discussed with:		a lot with bde, a little with kan
Showed patches to:	phk, jeffr, standards@, arch@
Minor discussion on:	arch@
2005-05-31 19:39:52 +00:00
Jeff Roberson
204ec66d38 - Don't set our bio op to be a READ when we've just completed a write. There
are subtle differences in the read and write completion path.  Instead,
   grab an extra write ref so the write path can drop it when we recursively
   call bufdone().  I believe this may be the source of the wrong bufobj
   panics.

Reported by:	pho, kkenn
2005-05-30 07:04:15 +00:00
Kirk McKusick
6a52a06851 Allow removal of empty directories with high link counts. These can
occur on a filesystem running with soft updates after a crash and
before a background fsck has been run. To prevent discrepancies
from arising in a background fsck that may already be running,
the directory is removed but its inode is not freed and is left
with the residual reference count. When encountered by the
background fsck it will be reclaimed.
2005-05-18 22:18:21 +00:00
Jeff Roberson
6c71a2208d - Don't restrict the softdep stats to DEBUG kernels, they cost nothing to
export.  This was happening anyway since this file manually sets DEBUG.
 - Add a sysctl for the number of items on the worklist.
 - Use a more canonical loop restart in softdep_fsync_mountdev, it saves
   some code at the expense of a goto and makes me worry less about
   modifying a variable that should be private to the TAILQ_FOREACH_SAFE
   macro.
2005-05-03 11:03:29 +00:00
Jeff Roberson
2524c26de8 - Use bdone() directly instead of calling it indirectly through
ffs_rawreaddone().

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:28:19 +00:00
Pawel Jakub Dawidek
231b1be179 - Plug memory leak.
- Fix two style nits.

Found by:	Coverity Prevent analysis tool
Reviewed by:	rwatson
MFC after:	1 week
2005-04-16 10:57:49 +00:00
Jeff Roberson
4585e3ac5a - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
Jeff Roberson
ece9473efa - Consistently call 'vp' vp rather than ovp sometimes in ffs_truncate().
Do the same for oip.

Pointed out by:	glebius
2005-04-05 08:49:41 +00:00
Jeff Roberson
bcc8f66c8b - Use M_ZERO rather than explicitly calling bzero().
- Don't intermingle direct calls to lockmgr and indirect calls through
   VOPs.  This will be important in the future.
 - Dont lock the devvp's interlock just to release it on the next line by
   passing LK_INTERLOCK to lockmgr.
 - Restructure ffs_snapshot_unmount so we don't call free() with the
   devvp's interlock locked.
2005-04-03 12:03:44 +00:00
Jeff Roberson
41d4783d49 - In ffs_sync we need to pass LK_SLEEPFAIL in when we lock the vnode
because it may change identities while we're sleeping on the lock.
   Otherwise we may bail out of ffs_sync() early due to an error from
   deadfs.
 - Collapse a VOP_UNLOCK, vrele into a single vput().
2005-04-03 10:38:18 +00:00
Jeff Roberson
153910e0f5 - Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix
two bugs.
 - ffs_disk_prewrite was pulling the vp from the buf and checking for
   COPYONWRITE, when really it wanted the vp from the bufobj that we're
   writing to, which is the devvp.  This lead to us skipping the copy on
   write to all file data, which significantly broke snapshots for the
   last few months.
 - When the SOFTUPDATES option was not included in the kernel config we
   would also skip the copy on write check, which would effectively disable
   snapshots.
 - Remove an invalid mp_fixme().

Debugging tips from:	mckusick
Reported by:		iedowse, others
Discussed with:		phk
2005-04-03 10:29:55 +00:00
Jeff Roberson
278c5a6efa - Fix botched LK_NOWAIT removal. I mistakenly thought this compiled as
part of GENERIC.
2005-03-31 05:58:14 +00:00
Jeff Roberson
aa7ba42796 - FFS supports shared locks, clear LK_NOSHARE from our vnode locks.
Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:23:20 +00:00
Jeff Roberson
ec3db02a3e - Set LK_NOSHARE for snapshot locks. snapshots require exclusive only
access.
 - Remove the hack from ffs_lock() to implement LK_NOSHARE in a ffs
   specific way.

Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:21:17 +00:00
Jeff Roberson
f247a5240d - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
Jeff Roberson
52f6886551 - Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a modifying op without
   LOCKPARENT or WANTPARENT.  It wasn't even properly used in the CREATE
   or DELETE cases.
2005-03-29 13:16:38 +00:00
Jeff Roberson
d6919865fa - Upgrade a shared lock request to exclusive in ffs_vget() if we have
to create the vnode.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:10:51 +00:00
Jeff Roberson
a69c43548d - Honor the cn_lkflags passed from namei() when locking the leaf.
Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:10:01 +00:00
Jeff Roberson
e19881ff08 - UFS no longer uses PDIRUNLOCK to track the parent state. Instead, we now
rely on ufs to always leave the parent locked except in the ISDOTDOT
   case.  Adjust asserts to deal with these changes.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:35:58 +00:00
Jeff Roberson
eddcb03d02 - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:   Isilon Systems, Inc.
2005-03-28 09:34:36 +00:00
David Schultz
188f6433f6 When the softupdates worklist gets too long, threads that attempt to
add more work are forced to process two worklist items first.
However, processing an item may generate additional work, causing the
unlucky thread to recursively process the worklist.  Add a per-thread
flag to detect this situation and avoid the recursion.  This should
fix the stack overflows that could occur while removing large
directory trees.

Tested by:	kris
Reviewed by:	mckusick
2005-03-25 17:30:31 +00:00
Jeff Roberson
080c061ad0 - Call VFS_ROOT() with LK_EXCLUSIVE.
Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:33:45 +00:00
Jeff Roberson
469ec10c1e - Update the ufs_root() prototype.
- Pass the ufs_root() flags argument to VFS_VGET() to allow callers to
   specify shared locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:32:50 +00:00
Jeff Roberson
23d15e852d - Lock the clearing of v_data in ufs_reclaim() to prevent a pagefault
in ffs_lock() when it acesses v_data without the vnlock.

Sponsored by:	Isilon Systems, Inc.
2005-03-17 11:58:43 +00:00
Poul-Henning Kamp
51f5ce0c8c Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.
2005-03-16 11:20:51 +00:00
Poul-Henning Kamp
de68347b1b Don't hold a reference on the disk vnode for each inode. 2005-03-15 20:50:58 +00:00
Poul-Henning Kamp
45c26fa2b6 Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.
2005-03-15 20:00:03 +00:00
Poul-Henning Kamp
e82ef95c11 Simplify the vfs_hash calling convention. 2005-03-15 08:07:07 +00:00
Jeff Roberson
4483fe9227 - Destroy the vnode object earlier in VOP_RECLAIM as we need more of
the vnode valid before the vm flushes pages.
 - Get rid of some extraneous uses of the vnode interlock.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 01:42:58 +00:00
Poul-Henning Kamp
14bc0685ac Use vfs_hash instead of home-rolled. 2005-03-14 10:21:16 +00:00
Jeff Roberson
9cbe5da9d5 - It is not legal to access v_data without the vnode lock or interlock
held.  Grab the vnode interlock if LK_INTERLOCK has not been passed in
   so that we can inspect v_data in ffs_lock().

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:04:12 +00:00
Jeff Roberson
fe68abe291 - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.
 - Shorten ffs_reload by one step.  The old check for an inactive vnode
   was slightly racey, and the code which deals with still active vnodes
   is not much more expensive.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:03:14 +00:00
Jeff Roberson
fdcc82276e - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:01:50 +00:00
Jeff Roberson
b5411d4fcb - Fix an assert now that the XLOCK no longer exists.
Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:00:41 +00:00
Jeff Roberson
b6ee8476d3 - In ufs_mknod(), hold the lock across the call to vgone() as that is now
required.
 - In ufs_close(), don't do the EAGAIN vrele hack, the top layer now calls
   vn_start_write before the lock is acquired as it should.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:59:14 +00:00
Jeff Roberson
38d504db44 - Don't drop the lock in ufs_inactive().
- Also in ufs_inactive, don't acquire the vnode interlock where it isn't
   strictly needed.  Also owning the vnode interlock while calling vprint()
   will cause locking assertions to trip.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:57:39 +00:00
Jeff Roberson
41766826eb - Fix anoter dyslexic moment; an atomic_set_int should've become ACTIVESET,
not ACTIVECLEAR.

Submitted by:	iedowse
2005-03-01 07:38:45 +00:00
Poul-Henning Kamp
7ce296cf04 Remove debug printout of major/minor numbers, print name instead. 2005-02-27 21:16:26 +00:00
Sam Leffler
d5bbad8372 use uiomove return value instead of always returning 0 when doing a
readlink of a fast link

Noticed by:	Coverity Prevent analysis tool
Reviewed by:	phk
2005-02-27 18:58:31 +00:00
Jeff Roberson
1a4a9672f1 - Add VOP locking asserts in several functions that have been implicated in
recent deadlocks.
2005-02-22 23:56:42 +00:00
Xin LI
a16baf37b9 The recomputation of file system summary at mount time can be a
very slow process, especially for large file systems that is just
recovered from a crash.

Since the summary is already re-sync'ed every 30 second, we will
not lag behind too much after a crash.  With this consideration
in mind, it is more reasonable to transfer the responsibility to
background fsck, to reduce the delay after a crash.

Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to
control this behavior.  When set to nonzero, we will get the
"old" behavior, that the summary is computed immediately at mount
time.

Add five new sysctl variables to adjust ndir, nbfree, nifree,
nffree and numclusters respectively.  Teach fsck_ffs about these
API, however, intentionally not to check the existence, since
kernels without these sysctls must have recomputed the summary
and hence no adjustments are necessary.

This change has eliminated the usual tens of minutes of delay of
mounting large dirty volumes.

Reviewed by:	mckusick
MFC After:	1 week
2005-02-20 08:02:15 +00:00
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
Xin LI
d5128ab2af When clearing a fragment, it's possible that the length is zero.
Reviewed by:	mckusick
MFC After:	1 week
2005-02-19 07:31:33 +00:00
Jeff Roberson
a8127ebb5d - Remove the unused and unsafe ufs_ihashlookup. This function returned a
vnode pointer that could not be used since no locks were held.

Sponsored by:	Isilon Systems, Inc.
2005-02-14 20:51:39 +00:00
Poul-Henning Kamp
1121c39497 Make non-SOFTUPDATES kernels compile again.
Integrate the stubfile into the main file now that license issues have been
long resolved.
2005-02-11 08:13:31 +00:00
Poul-Henning Kamp
adf4157738 Make a some SYSCTL_NODEs and some of FFS's VFS_ methods static. 2005-02-10 12:20:08 +00:00
Jeff Roberson
a3caf16e99 - In the softupdates case for ffs_truncate() we use vinvalbuf() to
invalidate pending io and dependencies.  However, vinvalbuf() rightfully
   does not call vnode_pager_setsize() for us.  We must do this here.  This
   could potentially have caused numerous kinds of bugs, but it was
   specifically causing msync() deadlocks because msync() was writing
   flushing pages that should not have been valid.

Sponsored by:	Isilon Systems, Inc.
Reported by:	kkenn
2005-02-09 23:05:20 +00:00
Poul-Henning Kamp
365b18aa89 style polishing. 2005-02-09 12:22:16 +00:00
Colin Percival
79653046d8 Add a new sysctl, "security.jail.chflags_allowed", which controls the
behaviour of chflags within a jail.  If set to 0 (the default), then a
jailed root user is treated as an unprivileged user; if set to 1, then
a jailed root user is treated the same as an unjailed root user.

This is necessary to allow "make installworld" to work inside a jail,
since it attempts to manipulate the system immutable flag on certain
files.

Discussed with:	csjp, rwatson
MFC after:	2 weeks
2005-02-08 21:31:11 +00:00
Poul-Henning Kamp
02f2c6a9d8 Split the vop_vector for ffs1 and ffs2, this is mostly for the different
EXTATTR support.
2005-02-08 21:03:52 +00:00
Poul-Henning Kamp
44787ceb0b Use ffs_truncate() directly instead of UFS_TRUNCATE() 2005-02-08 20:51:00 +00:00
Poul-Henning Kamp
dd19a799b8 Background writes are entirely an FFS/Softupdates thing.
Give FFS vnodes a specific bufwrite method which contains all the
background write stuff and then calls into the default bufwrite()
for the rest of the job.

Remove all the background write related stuff from the normal bufwrite.

This drags the softdep_move_dependencies() back into FFS.

Long term, it is worth looking at simply copying the data into
allocated memory and issuing the bio directly and not create the
"shadow buf" in the first place (just like copy-on-write is done
in snapshots for instance).  I don't think we really gain anything
but complexity from doing this with a buf.
2005-02-08 20:29:10 +00:00
Poul-Henning Kamp
88e5b12a20 Drag another softupdates tentacle back into FFS: Now that FFS's
vop_fsync is separate from the internal use we can do the full job
there.
2005-02-08 18:09:11 +00:00
Poul-Henning Kamp
efd6d9808c Don't use the UFS_* and VFS_* functions where a direct call is possble.
The UFS_ functions are for UFS to call back into VFS.  The VFS functions
are external entry points into the filesystem.
2005-02-08 17:40:01 +00:00
Robert Watson
45faa442c3 Don't use VOP_LEASE() with operations on extended attribute backing
files.

Pointed out by:	phk
2005-02-08 17:05:38 +00:00
Poul-Henning Kamp
40854ff546 For snapshots we need all VOP_LOCKs to be exclusive.
The "business class upgrade" was implemented in UFS's VOP_LOCK
implementation ufs_lock() which is the wrong layer, so move it to
ffs_lock().

Also, as long as we have not abandonned advanced vfs-stacking we
should not preclude it from happening: instead of implementing a
copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at
vop_stdlock() at the bottom.
2005-02-08 16:25:50 +00:00
Poul-Henning Kamp
d6f622cc2f For snapshots we need all VOP_LOCKs to be exclusive.
The "business class upgrade" was implemented in UFS's VOP_LOCK
implementation ufs_lock() which is the wrong layer, so move it to
ffs_lock().

Also, as long as we have not abandonned advanced vfs-stacking we
should not preclude it from happening: instead of implementing a
copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at
vop_stdlock() at the bottom.
2005-02-08 15:54:30 +00:00
Poul-Henning Kamp
32a870da8a Use VOP_STRATEGY_APV() instead of direct dereference, this is more
correct.
2005-02-08 15:40:11 +00:00
Jeff Roberson
9087d86e66 - Use a seperate malloc tag for saved inode contents to help in debugging
memory modified after free errors.

Sponsored by:	Isilon Systems, Inc.
2005-02-02 20:30:47 +00:00
Ken Smith
87c29bf93e Back out previous commit, bde@ provided an example of something this
breaks.
2005-02-02 14:21:01 +00:00
Ken Smith
0fac1537a2 It was noticed that we do not change a file's access time when it gets
executed.  This appears to violate most of the UNIX-ish standards.
One example quote from:

  http://www.opengroup.org/onlinepubs/009695399/functions/exec.html

    Upon successful completion, the exec functions shall mark for update
    the st_atime field of the file. If an exec function failed but was
    able to locate the process image file, whether the st_atime field is
    marked for update is unspecified. Should the exec function succeed,
    the process image file shall be considered to have been opened with
    open().

This appears to take care of it for ufs filesystems, doing the necessary
sanity checks (read-only filesystem, etc) without violating any other
standards (setting atime for any open appears to be allowed in any standards
I could find).

Noticed by:	cperciva
Reviewed by:	kan, rwatson
2005-02-02 00:21:38 +00:00
Warner Losh
1f0ce611b3 nit in /*- 2005-01-31 08:16:45 +00:00
Peter Edwards
e697161fa2 Tell vnode_create_vobject() how big an object to create, rather
than having it work it out via the more expensive VOP_GETATTR

Reviewed by: phk@
2005-01-29 14:23:09 +00:00
Poul-Henning Kamp
a369f34d76 Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().
2005-01-28 14:42:17 +00:00
Poul-Henning Kamp
d4eb29ba71 Remove unused argument to vrecycle() 2005-01-28 13:08:21 +00:00
Poul-Henning Kamp
84a6975215 Introduce and use g_vfs_close(). 2005-01-25 15:52:04 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Poul-Henning Kamp
f74b3b1f6c Create a vnode object when the file is opened. Trust that we did so. 2005-01-24 23:04:33 +00:00
Poul-Henning Kamp
ce12d37e7b Don't create vnode_pager objects for the disk device.
geom_vfs will do that.
2005-01-24 22:41:59 +00:00
Poul-Henning Kamp
625d4bc03a Create a vp->v_object in VFS_FHTOVP() if we want to be exportable
with NFS.

We are moving responsibility for creating the vnode_pager object into
the filesystems which own the vnode, and this is one of the places
we have to cover.

We call vnode_create_vobject() directly because we own the vnode.

If we can get the size easily, pass it as an argument to save the
call to VOP_GETATTR() in vnode_create_vobject()
2005-01-24 21:51:19 +00:00
Poul-Henning Kamp
091710ab22 Polish style. 2005-01-24 12:19:28 +00:00
Jeff Roberson
08023360a0 - Convert the global LK lock to a mutex.
- Expand the scope of lk to cover not only interrupt races, but also
   top-half races, which includes many new uses over global top-half
   only data.
 - Get rid of interlocked_sleep() and use msleep or BUF_LOCK where
   appropriate.
 - Use the lk mutex in place of the various hand rolled semaphores.
 - Stop dropping the lk lock before we panic.
 - Fix getdirtybuf() callers so that they reacquire access to whatever
   softdep datastructure they were inxpecting in the failure/retry
   case.  Previously, sleeps in getdirtybuf() could leave us with
   pointers to bad memory.
 - Update handling of ffs to be compatible with ffs locking changes.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:18:31 +00:00
Jeff Roberson
3ba649d792 - Initialize and destroy the per-filesystem ufs lock where appropriate.
- Use the buffer lock on the superblock buf to serialize calls to
   sbupdate.
 - Set the MNTK_MPSAFE flag when QUOTA is not defined in the kernel.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:12:28 +00:00
Jeff Roberson
dec351f69e - Remove GIANT_REQUIRED where giant is no longer required.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:10:47 +00:00
Jeff Roberson
5cef9d6add - Use the ufs lock to protect fs_active.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:10:11 +00:00
Jeff Roberson
353255885c - Acquire the ufs lock around several ffs_alloc functions that require
it.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:09:10 +00:00
Jeff Roberson
8e37fbad3a - Don't use atomic operations to deal with the active array, instead
it is now quite naturally protected by the ufsmount mutex.
 - Use the ufs lock to protect various fields in struct fs, primarily the
   cg summary needs protection to avoid allocation races.  Several
   functions have been slightly re-arranged to reduce the number of
   lock operations.
 - Adjust several functions (blkfree, freefile, etc.) to accept a
   ufsmount as an argument so that we may access the ufs lock.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:08:35 +00:00
Jeff Roberson
5c77b03eff - Acquire the ufs lock when manipulating some fields of struct fs.
- Change arguments to various ffs functions to match their new
   prototypes.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:04:22 +00:00
Jeff Roberson
f2aa1113a3 - Mark the struct fs members that require the ufsmount mutex.
- Define some macros for manipulating the fs_active bitmap.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:03:17 +00:00
Jeff Roberson
aaee366929 - Change some function parameters so that the ufsmount structure is
accessable in places where the ufs lock will be needed.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:02:11 +00:00
Jeff Roberson
751d0d9fc9 - Add a mutex to the ufsmount structure. This mutex is used to protect
any per-instance global data that is not already protected by a
   buf or vnode lock.  Presently, only fields in ffs's struct fs utilize
   this lock.
 - Sort some ufsmount members so that fields used for quotas are grouped
   together.  This is in anticipation of quota locking.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:01:10 +00:00
Pawel Jakub Dawidek
39cfb23935 Fix ACLs handling for the root file system.
Without this fix, when ACLs are set via tunefs(8) on the root file system,
they are removed on boot when 'mount -a' is called, because mount(8)
called for the root file system always add MNT_UPDATE flag and MNT_UPDATE
flag isn't perfect.
Now, one cannot remove ACLs stored in superblock (configured with tunefs(8))
via 'mount -a' nor 'mount -u -o noacls <file system>', but it is still
possible to mount file system which doesn't have ACLs in superblock via
'mount -o acls <file system>' or /etc/fstab's 'acls' option.

Reported by:	Lech Lorens/pl.comp.os.bsd
Discussed with:	phk, rwatson
Reviewed by:	rwatson
MFC after:	2 weeks
2005-01-15 17:09:53 +00:00
Poul-Henning Kamp
7c0745eeae Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
Poul-Henning Kamp
e39db32ab0 Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.
2005-01-13 12:25:19 +00:00
Poul-Henning Kamp
6ef8480a88 Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.
2005-01-11 10:43:08 +00:00
Poul-Henning Kamp
0391e5a151 Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE() 2005-01-11 09:10:46 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Poul-Henning Kamp
a7e8286f28 white space 2004-12-14 21:35:00 +00:00
Poul-Henning Kamp
59d42685ad Implement simpler panics for VOP_{read,write} on fifos. 2004-12-14 21:30:45 +00:00
Warner Losh
7a7e867742 LINT defines things which compile in code that as referring to the old
a_desc element.  change this to the new a_gen.a_desc to reflect
changes to vnode_if.h generation.

Noticed by: tinderbox, phk
2004-12-13 17:53:20 +00:00
Poul-Henning Kamp
4a18054d7b With the introduction of UFS2 we started looking for superblocks in
four different locations on a prospective filesystem.

If we found none, we forgot to invalidate the four buffers, thus the
following sequence would fails:

	(md0 = blank disk)
	mount /dev/md0 /mnt
	(fails, no superblocks)
	newfs /dev/md0
	(writes using physio which does not go through buffercache).
	mount /dev/md0 /mnt
	(still fails, the four cached buffers still contain no superblocks)

Found by:	ru
2004-12-12 14:19:11 +00:00
Marcel Moolenaar
9effe51e45 Revert previous commit. The null-pointer function call (a dereference
on ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code has
been changed.

Requested by: phk
2004-12-11 23:05:30 +00:00
Kirk McKusick
364ed814e7 Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.

Submitted by:	Henry Whincup <henry@jot.to>
MFC after:	1 week
2004-12-09 21:24:00 +00:00
Poul-Henning Kamp
8f25bad356 Fix snapshot creation. 2004-12-08 11:54:06 +00:00
Poul-Henning Kamp
f21cc2cafc Fix nfs exports (for now). The real fix is to teach mountd about
nmount.
2004-12-07 15:09:30 +00:00
Poul-Henning Kamp
20a92a18f1 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
Poul-Henning Kamp
743312367a VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't.  Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way:  Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
2004-12-05 22:41:02 +00:00
Marcel Moolenaar
061f5ec825 Fix null-pointer indirect function calls introduced in the previous
commit. In the new world order, the transitive closure on the vector
operations is not precomputed. As such, it's unsafe to actually use
any of the function pointers in an indirect function call. They can
be null, and we need to use the default vector in that case.
This is mostly a quick fix for the four function pointers that are
ed explicitly. A more generic or scalable solution is likely to see
the light of day.

No pathos on: current@
2004-12-05 22:30:28 +00:00
Poul-Henning Kamp
93e0b506e3 typo in comment. 2004-12-03 20:36:55 +00:00
Poul-Henning Kamp
aec0fb7b40 Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
Poul-Henning Kamp
6fde64c778 Mechanically change prototypes for vnode operations to use the new typedefs. 2004-12-01 12:24:41 +00:00
Poul-Henning Kamp
964ebefd8d Use system wide no-op vfs_start function. 2004-11-25 09:11:27 +00:00
Jeff Roberson
b646893f0f - Eliminate the acquisition and release of the bqlock in bremfree() by
setting the B_REMFREE flag in the buf.  This is done to prevent lock order
   reversals with code that must call bremfree() with a local lock held.
   This also reduces overhead by removing two lock operations per buf for
   fsync() and similar.
 - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock
   has been acquired so that we may remove ourself from the free-list.
 - Provide a bremfreef() function to immediately remove a buf from a
   free-list for use only by NFS.  This is done because the nfsclient code
   overloads the b_freelist queue for its own async. io queue.
 - Simplify the numfreebuffers accounting by removing a switch statement
   that executed the same code in every possible case.
 - getnewbuf() can encounter locked bufs on free-lists once Giant is removed.
   Remove a panic associated with this condition and delay asserts that
   inspect the buf until after it is locked.

Reviewed by:	phk
Sponsored by:	Isilon Systems, Inc.
2004-11-18 08:44:09 +00:00
Poul-Henning Kamp
9c83534dd8 Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)
2004-11-15 09:18:27 +00:00
Poul-Henning Kamp
51ac12ab28 Be prepared to accept NULL mountargs as part of root-mounting. 2004-11-13 13:04:31 +00:00
Poul-Henning Kamp
cf5e414960 Put back the vfs_object_create() calls, they do make a difference when
my test-setup does what I want it to instead of what I ask it to.

Pointed out by:	tegge
2004-11-12 10:27:14 +00:00
Poul-Henning Kamp
40ce27cb57 fix some comments 2004-11-10 06:53:31 +00:00
Poul-Henning Kamp
2e6649198a Use mount flags instead of NULL path to detect root filesystem mount. 2004-11-09 23:38:10 +00:00
Poul-Henning Kamp
5e2ccaff7a Stop pretending to have a vm_object backing the underlying disk vnode:
it isn't used for anything anywhere and the vnode_pager would explode
if we attempted to.
2004-11-09 23:12:45 +00:00
Poul-Henning Kamp
5349c79d75 Properly implement a default version of VOP_GETWRITEMOUNT.
Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.
2004-11-06 11:41:22 +00:00
Poul-Henning Kamp
40c340aa5d Don't grab the exclusive bit on a root filesystem until we are willing
to mount it.  Doing so prevented fsck to be run after a refused mount.
2004-11-04 09:11:22 +00:00
Poul-Henning Kamp
4392001125 Move UFS from DEVFS backing to GEOM backing.
This eliminates a bunch of vnode overhead (approx 1-2 % speed
improvement) and gives us more control over the access to the storage
device.

Access counts on the underlying device are not correctly tracked and
therefore it is possible to read-only mount the same disk device multiple
times:
	syv# mount -p
	/dev/md0        /var    ufs rw  2 2
	/dev/ad0        /mnt    ufs ro  1 1
	/dev/ad0        /mnt2   ufs ro  1 1
	/dev/ad0        /mnt3   ufs ro  1 1

Since UFS/FFS is not a synchrousely consistent filesystem (ie: it caches
things in RAM) this is not possible with read-write mounts, and the system
will correctly reject this.

Details:

	Add a geom consumer and a bufobj pointer to ufsmount.

	Eliminate the vnode argument from softdep_disk_prewrite().
	Pick the vnode out of bp->b_vp for now.  Eventually we
	should find it through bp->b_bufobj->b_private.

	In the mountcode, use g_vfs_open() once we have used
	VOP_ACCESS() to check permissions.

	When upgrading and downgrading between r/o and r/w do the
	right thing with GEOM access counts.  Remove all the
	workarounds for not being able to do this with VOP_OPEN().

	If we are the root mount, drop the exclusive access count
	until we upgrade to r/w.  This allows fsck of the root
	filesystem and the MNT_RELOAD to work correctly.

	Set bo_private to the GEOM consumer on the device bufobj.

	Change the ffs_ops->strategy function to call g_vfs_strategy()

	In ufs_strategy() directly call the strategy on the disk
	bufobj.  Same in rawread.

	In ffs_fsync() we will no longer see VCHR device nodes, so
	remove code which synced the filesystem mounted on it, in
	case we came there.  I'm not sure this code made sense in
	the first place since we would have taken the specfs route
	on such a vnode.

	Redo the highly bogus readblock() function in the snapshot
	code to something slightly less bogus: Constructing an uio
	and using physio was really quite a detour.  Instead just
	fill in a bio and ship it down.
2004-10-29 10:15:56 +00:00
Poul-Henning Kamp
570a7ddaa3 We only support backing UFS/FFS with disks. 2004-10-28 06:19:28 +00:00
Poul-Henning Kamp
a40a512387 Eliminate unnecessary KASSERTS. 2004-10-27 06:45:06 +00:00
Poul-Henning Kamp
93d244fb1a KASSERT that we only get to prewrite() on writes. 2004-10-26 20:13:49 +00:00
Poul-Henning Kamp
8dd5650594 White space changes. Add missing static. 2004-10-26 20:13:21 +00:00
Poul-Henning Kamp
53389dd64a Replace single case switch() with if(). 2004-10-26 20:12:25 +00:00
Poul-Henning Kamp
b6e2606155 Vertically align comment. 2004-10-26 20:12:00 +00:00