Commit Graph

328 Commits

Author SHA1 Message Date
Xin LI
04533fc68e Use *_EMPTY macros when appropriate. 2007-04-04 07:29:53 +00:00
Konstantin Belousov
36d4667907 Mark UFS as being MP-Safe in "options QUOTA" case too. Remove no more
neccessary Giant acquisions in softdepend processing code.

Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	re (kensmith)
2007-03-20 10:51:45 +00:00
Konstantin Belousov
088ffd2086 Implement fine-grained locking for UFS quotas.
Each struct dquot gets dq_lock mutex to protect dq_flags and to interlock
with DQ_LOCK. qhash, dqfreelist and dq.dq_cnt are protected by global
dqhlock mutex.

i_dquot array for inode is protected by lockmgr' vnode lock, corresponding
assert added to the dqget(). Access to struct ufsmount quota-related fields
(um_quotas and um_qflags) is protected by um_lock.

Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	re (kensmith)

This work were not possible without enormous amount of help given by
Tor Egge and Peter Holm. Tor reviewed each version of patch, pointed out
numerous errors and provided invaluable suggestions. Peter did tireless
testing of the patch as it was developed.
2007-03-14 08:54:08 +00:00
Tor Egge
61b9d89ff0 Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
Pawel Jakub Dawidek
10bcafe9ab Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
Konstantin Belousov
2cc7d26f7f Cylinder group bitmaps and blocks containing inode for a snapshot
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquision.

Avoid dealing with buffers in bdwrite() that are from other side of
snaplock divisor in the lock order then the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.

Reviewed by:	tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by:	Peter Holm
X-MFC after:	3 weeks (if ever: it changes ABI)
2007-01-23 10:01:19 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Pawel Jakub Dawidek
1a60c7fc8e Add gjournal specific code to the UFS file system:
- Add FS_GJOURNAL flag which enables gjournal support on a file system.
- Add cg_unrefs field to the cylinder group structure which holds
  number of unreferenced (orphaned) inodes in the given cylinder group.
- Add fs_unrefs field to the super block structure which holds
  total number of unreferenced (orphaned) inodes.
- When file or a directory is orphaned (last reference is removed, but
  object is still open), increase fs_unrefs and cg_unrefs fields,
  which is a hint for fsck in which cylinder groups looks for such
  (orphaned) objects.
- When file is last closed, decrease {fs,cg}_unrefs fields.
- Add VV_DELETED vnode flag which points at orphaned objects.

Sponsored by:	home.pl
2006-10-31 21:48:54 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Tor Egge
8d0547c68b Protect change to bo_flag by holding the bufobj mutex. 2006-09-26 04:21:20 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Pawel Jakub Dawidek
5fe6d2beb4 Declare UFS module version. 2006-07-09 14:11:09 +00:00
Pawel Jakub Dawidek
946478fca6 Change fs->fs_fsmnt to mp->mnt_stat.f_mntonname in warnings about missing
MAC and ACLs support in the kernel. If it is a first mount, fs->fs_fsmnt
is empty.

MFC after:	1 week
2006-07-09 14:10:35 +00:00
Craig Rodrigues
71ac2d7c7c Check the sectorsize of the underlying disk before trying to
bread() the UFS superblock.  Should eliminate crashes when trying
to do: mount -t ufs on an audio CD.

PR:		kern/85893
Reported by:	Russell Francis <rfrancis at ev dot net>
MFC after:	1 week
2006-06-03 21:20:37 +00:00
Craig Rodrigues
ee98eb825b Remove "update" from ffs_opts. It has been moved to global_opts
in vfs_mount.c.
2006-05-26 12:44:12 +00:00
Craig Rodrigues
5eb304a91a Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 00:32:21 +00:00
Craig Rodrigues
4ba8c2a5d3 Take errmsg out of ffs_opts. It is already part of global_opts
in vfs_mount.c.
2006-05-24 00:12:21 +00:00
Tor Egge
868bb88ff2 Temporarily undo clusters contribution to global runningbufspace while
handling copy on write for the buffers taking part in the cluster.
2006-05-03 00:10:29 +00:00
Scott Long
cbd6fedbf2 Fix a typo. 2006-04-28 04:39:50 +00:00
Jeff Roberson
6ca9fcc586 - Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child
buffers to go on the buf daemon's DIRTYGIANT queue.
 - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler
   runs in the context of the buf daemon and may require Giant.
2006-04-28 01:05:31 +00:00
Tom Rhodes
7b3f1bbd61 Revert previous to this file before an actual request is made. 2006-04-22 04:22:15 +00:00
Tom Rhodes
8fc22c9d2e Remove what I believe are two useless ifdefs. If a user or administrator
enables multilabel, or any option for that matter, most likely they have
a reason.  This will allow users to see that mulilabel is enabled via an
issued "mount" command and remove an annoying warning - printed only when
a MAC kernel is not installed - on boot up.

Discussed with:	green, brueffer, Samy Al Bahra.
Probably ran past:	csjp (though I can't remember).
2006-04-21 07:14:25 +00:00
Jeff Roberson
3bbd6d8ae6 - Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs().
Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:54:20 +00:00
Tor Egge
700118c72f Allow compilation when not using softupdates. 2006-03-19 22:16:44 +00:00
Tor Egge
7de3839d0d Let snapshots make a copy of old contents for all buffers taking part in a
cluster instead of just the first buffer.

Delay buf_start() calls until snapshots have a copy of old content.

PR:		kern/93942
2006-03-19 21:43:36 +00:00
Tor Egge
95e7a3c3ac Reduce probability of unmount failing after having unmounted snapshots. 2006-03-19 21:09:19 +00:00
Tor Egge
ca2fa80767 Block secondary writes while expunging active unlinked files.
Fix detection of active unlinked files by checking VI_OWEINACT and
VI_DOINGINACT in addition to v_usecount.

Defer inactive handling for unlinked files if the file system is mostly
suspended (secondary writes being blocked).

Perform deferred inactive handling after the file system is resumed.
2006-03-11 01:08:37 +00:00
Tor Egge
1e70cd7fc7 Remove unneeded (and broken) usage of MNT_REF()/MNT_REL(). 2006-03-10 02:31:12 +00:00
Tor Egge
791dd2fade Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
2006-03-08 23:43:39 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Craig Rodrigues
b6bd025c35 Fix parsing of atime, clusterr, clusterw, exec, suid, symfollow
mount options.

Noticed by:	Amir Shalem < amir at boom dot org dot il>
2005-11-24 15:06:40 +00:00
Craig Rodrigues
cea903627f If export mount flag is not passed in, set default parameters
for export structure and pass that to vfs_export().
Currently in userland mount(8), an export structure is unconditionally
passed in, only for UFS.  This is an attempt to move that UFS-specific
behavior out of mount(8) and into the UFS filesystem code.
2005-11-20 17:04:50 +00:00
Craig Rodrigues
359d438885 Add more options to ffs_opts, so that vfs_filteropts() will not
complain when we pass these options to a UFS filesystem as strings
via nmount():  noexec, nosuid, nosymfollow, sync, suiddir
2005-11-19 23:28:19 +00:00
Craig Rodrigues
26f59b6455 - Add parsing for the following existing UFS/FFS mount options in the nmount()
callpath via vfs_getopt(), and set the appropriate MNT_* flag:
  -> acls, async, force, multilabel, noasync, noatime,
  -> noclusterr, noclusterw, snapshot, update

- Allow errmsg as a valid mount option via vfs_getopt(),
  so we can later add a hook to propagate mount errors back
  to userspace via vfs_mount_error().
2005-11-18 06:06:10 +00:00
Nate Lawson
8680d6985f Adjust maxfilesize for UFS1 and old 4.4 FFS. For UFS1, increase the limit
to (max block - 1) * bsize.  For DEV_BSIZE, this doubles the limit from
0.5 TB to 1 TB.  For the old 4.4 FFS case, decrease the limit from 0.5 TB
to 2 GB - 1.  Older systems had a 32 bit off_t so they couldn't access the
larger files anyway.

Collaboration with:	bde
2005-10-21 01:54:00 +00:00
Tor Egge
9248a8271c Don't pretend that a failed sync write was succesful. 2005-10-09 20:49:01 +00:00
Suleiman Souhlal
fdedad764a ffs_mountfs() needs devvp to be locked, so lock it.
Glanced at by:	phk
Tested by:	pjd
MFC after:	3 days
2005-09-02 13:52:55 +00:00
Suleiman Souhlal
93373c422f Set the mountpoint path in the superblock (fs_fsmnt) at mount-time
so that it appears in the various messages (not cleanly unmounted,
filesystem full, etc). This has been broken since rev 1.261.
2005-08-21 22:06:41 +00:00
Alan Cox
ec9c9e7363 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00
Jeff Roberson
204ec66d38 - Don't set our bio op to be a READ when we've just completed a write. There
are subtle differences in the read and write completion path.  Instead,
   grab an extra write ref so the write path can drop it when we recursively
   call bufdone().  I believe this may be the source of the wrong bufobj
   panics.

Reported by:	pho, kkenn
2005-05-30 07:04:15 +00:00
Jeff Roberson
41d4783d49 - In ffs_sync we need to pass LK_SLEEPFAIL in when we lock the vnode
because it may change identities while we're sleeping on the lock.
   Otherwise we may bail out of ffs_sync() early due to an error from
   deadfs.
 - Collapse a VOP_UNLOCK, vrele into a single vput().
2005-04-03 10:38:18 +00:00
Jeff Roberson
153910e0f5 - Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix
two bugs.
 - ffs_disk_prewrite was pulling the vp from the buf and checking for
   COPYONWRITE, when really it wanted the vp from the bufobj that we're
   writing to, which is the devvp.  This lead to us skipping the copy on
   write to all file data, which significantly broke snapshots for the
   last few months.
 - When the SOFTUPDATES option was not included in the kernel config we
   would also skip the copy on write check, which would effectively disable
   snapshots.
 - Remove an invalid mp_fixme().

Debugging tips from:	mckusick
Reported by:		iedowse, others
Discussed with:		phk
2005-04-03 10:29:55 +00:00
Jeff Roberson
aa7ba42796 - FFS supports shared locks, clear LK_NOSHARE from our vnode locks.
Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:23:20 +00:00
Jeff Roberson
d6919865fa - Upgrade a shared lock request to exclusive in ffs_vget() if we have
to create the vnode.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:10:51 +00:00
Poul-Henning Kamp
51f5ce0c8c Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.
2005-03-16 11:20:51 +00:00
Poul-Henning Kamp
de68347b1b Don't hold a reference on the disk vnode for each inode. 2005-03-15 20:50:58 +00:00
Poul-Henning Kamp
45c26fa2b6 Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.
2005-03-15 20:00:03 +00:00
Poul-Henning Kamp
e82ef95c11 Simplify the vfs_hash calling convention. 2005-03-15 08:07:07 +00:00
Poul-Henning Kamp
14bc0685ac Use vfs_hash instead of home-rolled. 2005-03-14 10:21:16 +00:00
Jeff Roberson
fe68abe291 - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.
 - Shorten ffs_reload by one step.  The old check for an inactive vnode
   was slightly racey, and the code which deals with still active vnodes
   is not much more expensive.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:03:14 +00:00