LK_INTERLOCK. The interlock will never be held on return from these
functions even when there is an error. Errors typically only occur when
the XLOCK is held which means this isn't the vnode we want anyway. Almost
all users of these interfaces expected this behavior even though it was
not provided before.
with interlock held in error conditions when the caller did not specify
LK_INTERLOCK.
- Add several comments to vn_lock() describing the rational behind the code
flow since it was not immediately obvious.
released. vcanrecycle() failed to unlock interlock under this condition.
- Remove an extra VOP_UNLOCK from a failure case in vcanrecycle().
Pointed out by: rwatson
- Use the new VI asserts in place of the old mtx_assert checks.
- Add the VI asserts to the automated lock checking in the VOP calls. The
interlock should not be held across vops with a few exceptions.
- Add the vop_(un)lock_{pre,post} functions to assert that interlock is held
when LK_INTERLOCK is set.
- Make getvfsbyname() take a struct xvfsconf *.
- Convert several consumers of getvfsbyname() to use struct xvfsconf.
- Correct the getvfsbyname.3 manpage.
- Create a new vfs.conflist sysctl to dump all the struct xvfsconf in the
kernel, and rewrite getvfsbyname() to use this instead of the weird
existing API.
- Convert some {set,get,end}vfsent() consumers to use the new vfs.conflist
sysctl.
- Convert a vfsload() call in nfsiod.c to kldload() and remove the useless
vfsisloadable() and endvfsent() calls.
- Add a warning printf() in vfs_sysctl() to tell people they are using
an old userland.
After these changes, it's possible to modify struct vfsconf without
breaking the binary compatibility. Please note that these changes don't
break this compatibility either.
When bp will have updated mount_smbfs(8) with the patch I sent him, there
will be no more consumers of the {set,get,end}vfsent(), vfsisloadable()
and vfsload() API, and I will promptly delete it.
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.
Idea stolen from: BSD/OS
kernel access control.
Invoke the necessary MAC entry points to maintain labels on vnodes.
In particular, initialize the label when the vnode is allocated or
reused, and destroy the label when the vnode is going to be released,
or reused. Wow, an object where there really is exactly one place
where it's allocated, and one other where it's freed. Amazing.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
SYSCTL_OUT() from blocking while locks are held. This should
only be done when it would be inconvenient to make a temporary copy of
the data and defer calling SYSCTL_OUT() until after the locks are
released.
As this code is not actually used by any of the existing
interfaces, it seems unlikely to break anything (famous
last words).
The internal kernel interface to manipulate these attributes
is invoked using two new IO_ flags: IO_NORMAL and IO_EXT.
These flags may be specified in the ioflags word of VOP_READ,
VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that
you want to do I/O to the normal data part of the file and
IO_EXT means that you want to do I/O to the extended attributes
part of the file. IO_NORMAL and IO_EXT are mutually exclusive
for VOP_READ and VOP_WRITE, but may be specified individually
or together in the case of VOP_TRUNCATE. For example, when
removing a file, VOP_TRUNCATE is called with both IO_NORMAL
and IO_EXT set. For backward compatibility, if neither IO_NORMAL
nor IO_EXT is set, then IO_NORMAL is assumed.
Note that the BA_ and IO_ flags have been `merged' so that they
may both be used in the same flags word. This merger is possible
by assigning the IO_ flags to the low sixteen bits and the BA_
flags the high sixteen bits. This works because the high sixteen
bits of the IO_ word is reserved for read-ahead and help with
write clustering so will never be used for flags. This merge
lets us get away from code of the form:
if (ioflags & IO_SYNC)
flags |= BA_SYNC;
For the future, I have considered adding a new field to the
vattr structure, va_extsize. This addition could then be
exported through the stat structure to allow applications to
find out the size of the extended attribute storage and also
would provide a more standard interface for truncating them
(via VOP_SETATTR rather than VOP_TRUNCATE).
I am also contemplating adding a pathconf parameter (for
concreteness, lets call it _PC_MAX_EXTSIZE) which would
let an application determine the maximum size of the extended
atribute storage.
Sponsored by: DARPA & NAI Labs.
support creation times such as UFS2) to the value of the
modification time if the value of the modification time is older
than the current creation time. See utimes(2) for further details.
Sponsored by: DARPA & NAI Labs.
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on. Extensive testing has appeared to have shown no
increase in overhead.
Disadvantages
Dirties more cache lines during lookups.
Not as fast as a hash table lookup (but still N log N and optimal
when there is locality of reference).
Advantages
vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
syncer operate more efficiently.
I get to rip out all the old hacks (some of which were mine) that tried
to keep the v_dirtyblkhd tailq sorted.
The per-vnode splay tree should be easier to lock / SMPng pushdown on
vnodes will be easier.
This commit along with another that Alan is working on for the VM page
global hash table will allow me to implement ranged fsync(), optimize
server-side nfs commit rpcs, and implement partial syncs by the
filesystem syncer (aka filesystem syncer would detect that someone is
trying to get the vnode lock, remembers its place, and skip to the
next vnode).
Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.
Suggested by: alc
Tell vop_strategy_pre() to use this instead.
- Ignore B_CLUSTER bufs. Their components are locked but they don't really
exist so they don't have to be. This isn't ideal but it is safe.
- Cache a pointer to the vnode's object in the buf.
- Hold a reference to that object in addition to the vnode's reference just
to be consistent.
- Cleanup code that got the object indirectly through the vp and VOP calls.
This fixes at least one case where we were calling GETVOBJECT without a lock.
It also avoids an expensive layered call at the cost of another pointer in
struct buf.
- Disable original vop_strategy lock specification.
- Switch to the new vop_strategy_pre for lock validation.
VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated. There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and help reducing
the size of vfs_syscalls.c, which is still an enormous file.
Reviewed by: rwatson
Repo-copy by: peter
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.
Reviewed by: mckusick
- Add vfs_badlock_print to control whether or not we print lock violations
- Add vfs_badlock_panic to control whether we panic on lock violations
Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.
a linked list. This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.
There are no functional differences in this commit.
Reviewed by: phk
Don't try to create a vm object before the file system has a chance to finish
initializing it. This is incorrect for a number of reasons. Firstly, that
VOP requires a lock which the file system may not have initialized yet. Also,
open and others will create a vm object if it is necessary later.
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
new vfs_getopt()/vfs_copyopt() API. This is intended to be used
later, when there will be filesystems implementing the VFS_NMOUNT
operation. The mount(2) system call will disappear when all
filesystems will be converted to the new API. Documentation will
be committed in a while.
Reviewed by: phk