An initial tidyup of the mount() syscall and VFS mount code.
This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.
* the guts of the mount work has been moved into vfs_mount().
* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.
* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.
* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)
* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since its a pointer into userland.
* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.
* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
return value: LK_EXCLOTHER, when the lock is held exclusively by another
process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
locked/unlocked.
This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.
Discussed with: grog, mch, peter, phk
Reviewed by: peter
fixed (many due to changing semantics in other parts of the kernel and not
the original author's fault), including one critical one: unionfs could
cause UFS corruption in the fronting store due to calling VOP_OPEN for
writing without turning on vmio for the UFS vnode.
Most of the bugs were related to semantics changes in VOP calls, lock
ordering problems (causing deadlocks), improper handling of a read-only
backing store (such as an NFS mount), improper referencing and locking
of vnodes, not using real struct locks for vnode locking, not using
recursive locks when accessing the fronting store, and things like that.
New functionality has been added: unionfs now has mmap() support, but
only partially tested, and rename has been enhanced considerably.
There are still some things that unionfs cannot do. You cannot
rename a directory without confusing unionfs, and there are issues
with softlinks, hardlinks, and special files. unionfs mostly doesn't
understand them (and never did).
There are probably still panic situations, but hopefully no where near
as many as before this commit.
The unionfs in this commit has been tested overlayed on /usr/src
(backing /usr/src being a read-only NFS mount, fronting /usr/src being
a local filesystem). kernel builds have been tested, buildworld is
undergoing testing. More testing is necessary.
reasonable defaults.
This avoids confusing and ugly casting to eopnotsupp or making dummy functions.
Bogus casting of filesystem sysctls to eopnotsupp() have been removed.
This should make *_vfsops.c more readable and reduce bloat.
Reviewed by: msmith, eivind
Approved by: phk
Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
format errors exposed by this. It has nothing to do with diagnostics
since it does little more than control tracing of normal operation.
Actual diagnostics for the union file system are still controlled by
the DIAGNOSTIC option.
references to them.
The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
Reverse the VFS_VRELE patch. Reference counting of vnodes does not need
to be done per-fs. I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.
The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.
Submitted by: Michael Hancock <michaelh@cet.co.jp>
a complement to all ops that return a vpp, VFS_VRELE. This is
initially only for file systems that implement the following ops
that do a WILLRELE:
vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
vop_rename, vop_mkdir, vop_rmdir, vop_symlink
This is initial DNA that doesn't do anything yet. VFS_VRELE is
implemented but not called.
A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.
VFS_VRELE implementations were made for the following file systems:
Standard (vfs_vrele)
ffs mfs nfs msdosfs devfs ext2fs
Custom
union umapfs
Just EOPNOTSUPP
fdesc procfs kernfs portal cd9660
These implementations may change as VOP changes are implemented.
In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers. vput will be replaced by unlock in these cases. Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer. This will have minimal impact on the structure of the
existing code.
This will only be done for vnode arguments that are released by the various
fs vop implementations.
Wider use of VFS_VRELE will likely require restructuring of the code.
Reviewed by: phk, dyson, terry et. al.
Submitted by: Michael Hancock <michaelh@cet.co.jp>
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
and D_CLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.
Reviewed by: bde
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
# mount -t union (or null) dir1 dir2
# mount -t union (or null) dir2 dir1
The function namei in union_mount calls union_root. The upper vnode
has been already locked and vn_lock in union_root causes above panic.
Add printf's included in `#ifdef DIAGNOSTIC' for EDEADLK cases.
same directory pair.
If we do:
mount -t union a b
mount -t union a b
then, (1) namei tries to lock fs which has been already locked by
first union mount and (2) union_root() tries to lock locked fs. To
avoid first deadlock condition, unlock vnode if lowerrootvp is union
node, and to avoid second case, union_mount returns EDEADLK when multi
union mount is detected.
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
VFCF_NETWORK (this FS goes over the net)
VFCF_READONLY (read-write mounts do not make any sense)
VFCF_SYNTHETIC (data in this FS is not real)
VFCF_LOOPBACK (this FS aliases something else)
cd9660 is readonly; nullfs, umapfs, and union are loopback; NFS is netowkr;
procfs, kernfs, and fdesc are synthetic.
Find enclosed a short bugfix to get the union filesystem up and running
in FreeBSD-current. We don't think we've got all the problems yet but
these fixes sort out the major ones (which mostly concert bad locking
of vnodes), no doubt we'll post others as necessary. Known problems
include the inability of the umount command (not the system call) to unmount
unions in certain circumstances (this is due the way "realpath" works),
and the failure of direntries to always get all available files in
unioned subdirectories. We are, as they say, working on it.
Submitted by: tim@cs.city.ac.uk (Tim Wilkinson)
- Make a number of filesystems work again when they are statically compiled
(blush)
- FIFOs are no longer optional; ``options FIFO'' removed from distributed
config files.