function null_destroy_proto() from null_insmntque_dtr(). Also
apply null_destroy_proto() in null_nodeget() when we raced and a vnode
is found in the hash, so the currently allocated protonode shall be
destroyed.
Lock the vnode interlock around reassigning the v_vnlock.
In fact, this path will not be exercised after several later commits,
since null_nodeget() cannot take shared-locked lowervp at all due to
insmntque() requirements.
Reported by: rea
Tested by: pho
MFC after: 1 week
a new jail parameter node with the following parameters:
allow.mount.devfs:
allow mounting the devfs filesystem inside a jail
allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail
Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.
Reviewed by: jamie
Suggested by: pjd
MFC after: 2 weeks
Several callers of null_nodeget() did the cleanup itself, but several
missed it, most prominent being null_bypass(). Remove the cleanup from
the callers, now null_nodeget() handles lowervp free itself.
Reported and tested by: pho
MFC after: 1 week
nullfs. The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.
Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.
Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.
Tested by: pho
Reviewed by: mckusick
MFC after: 3 weeks (subject of re approval)
failure (the getnewvnode cannot return an error). In this case, the
null_insmntque_dtr() already unlocked the reclaimed vnode, so VOP_UNLOCK()
in the nullfs_mount() after null_nodeget() failure is wrong.
Tested by: pho
MFC after: 1 week
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.
Reviewed by: kib
of the lower level vnode is incremented to greater than 1 when
the upper level vnode's v_usecount is greater than one. This
is necessary for the NFS clients, so that they will do a silly
rename of the file instead of actually removing it when the
file is still in use. It is "racy", since the v_usecount is
incremented in many places in the kernel with
minimal synchronization, but an extraneous silly rename is
preferred to not doing a silly rename when it is required.
The only other file systems that currently check the value
of v_usecount in their VOP_REMOVE() functions are nwfs and
smbfs. These file systems choose to fail a remove when the
v_usecount is greater than 1 and I believe will function
more correctly with this patch, as well.
Tested by: to.my.trociny at gmail.com
Submitted by: to.my.trociny at gmail.com (earlier version)
Reviewed by: kib
MFC after: 2 weeks
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.
This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.
Reviewed by: rwatson
v_data is not-null before calling NULLVPTOLOWERVP(), and dropping the
interlock allows for reclaim to clean v_data and free the memory.
While there, remove unneeded semicolons and convert the infinite loops
to panics. I have a will to remove null_checkvp() altogether, or leave
it as a trivial stub, but not now.
Reported and tested by: pho
This should not really matter for correctness, since vp->v_lock is
not locked before the call, and null_lock() holds the interlock,
but makes the control flow for reclaim more clear.
Tested by: pho
permissions, such as VWRITE_ACL. For a filsystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.
Reviewed by: rwatson@
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.
VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.
implementation instead. The bypass does not assume that returned vnode
is only held.
Reported by: Paul B. Mahol <onemda gmail com>, pluknet <pluknet gmail com>
Reviewed by: jhb
Tested by: pho, pluknet <pluknet gmail com>
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should
only acquire curthread as argument; this will lead in axing the additional
argument from both functions, making the code cleaner.
Reviewed by: jeff, kib
- Remove the "thread" argument from the lockmgr() function as it is
always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
present
Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h
KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.
Tested by: matteo
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
function calls are no more generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.
late stages of unmount). On failure, the vnode is recycled.
Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.
Change getnewvnode() to no longer call insmntque(). Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.
Change vfs_hash_insert() to no longer lock the vnode. The caller now
has that responsibility.
Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup. Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.
Approved by: re (kensmith)
Reviewed by: kib
In collaboration with: kib
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.
Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.
VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.
Approved by: mckusick
Discussed with: many (on IRC)
Tested with: ufs, msdosfs, cd9660, nullfs and zfs
will allow the NFS server to call vfs_stdcheckexp() on the exported nullfs
filesystem, not the underlying filesystem being nullfs mounted.
If the lower filesystem was not NFS exported, then the NFS exported
null filesystem would not work.
Pointed out by: scottl
PR: kern/87906
MFC after: 1 week
vop_lock_post do not trigger.
- Rearrange null_inactive to null_hashrem earlier so there is no chance
of finding the null node on the hash list after the locks have been
switched.
- We should never have a NULL lowervp in null_reclaim() so there is
no need to handle this situation. panic instead.
MFC After: 1 week
- Simplify the logic dealing with recycled vnodes in null_hashget() and
null_hashins(). Since we hold the lower node locked in both cases
the null node can not be undergoing recycling unless reclaim somehow
called null_nodeget(). The logic that was in place was not safe and
was essentially dead code.
MFC After: 1 week
- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some
memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.
I benchmarked this by simultaneously extracting 4 large tarballs (basically
world images) on a 4-processor AMD64 system, in a malloc-backed md.
With this patch, system time was reduced by 43%, and wall clock time by 33%.
Submitted by: jeff
MFC after: 1 week
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.
Sponsored by: Isilon Systems, Inc.
that they set v->v_vnlock. This is true for all filesystems in the
tree.
- Remove all uses of LK_THISLAYER. If the lower layer is locked, the
null layer is locked. We only use vget() to get a reference now.
null essentially does no locking. This fixes LOOKUP_SHARED with
nullfs.
- Remove the special LK_DRAIN considerations, I do not believe this is
needed now as LK_DRAIN doesn't destroy the lower vnode's lock, and
it's hardly used anymore.
- Add one well commented hack to prevent the lowervp from going away
while we're in it's VOP_LOCK routine. This can only happen if we're
forcibly unmounted while some callers are waiting in the lock. In
this case the lowervp could be recycled after we drop our last ref
in null_reclaim(). Prevent this with a vhold().
the filesystem. Check that rather than VI_XLOCK.
- VOP_INACTIVE should no longer drop the vnode lock.
- The vnode lock is required around calls to vrecycle() and vgone().
Sponsored by: Isilon Systems, Inc.
and KASSERT coverage.
After this check there is only one "nasty" cast in this code but there
is a KASSERT to protect against the wrong argument structure behind
that cast.
Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical
kernel with no change in performance.
We also now run the checking and tracing on VOP's which have been layered
by nullfs, umapfs, deadfs or unionfs.
Add new (non-inline) VOP_FOO_AP() functions which take a "struct
foo_args" argument and does everything the VOP_FOO() macros
used to do with checks and debugging code.
Add KASSERT to VOP_FOO_AP() check for argument type being
correct.
Slim down VOP_FOO() inline functions to just stuff arguments
into the struct foo_args and call VOP_FOO_AP().
Put function pointer to VOP_FOO_AP() into vop_foo_desc structure
and make VCALL() use it instead of the current offsetoff() hack.
Retire vcall() which implemented the offsetoff()
Make deadfs and unionfs use VOP_FOO_AP() calls instead of
VCALL(), we know which specific call we want already.
Remove unneeded arguments to VCALL() in nullfs and umapfs bypass
functions.
Remove unused vdesc_offset and VOFFSET().
Generally improve style/readability of the generated code.
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.
If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to the
cached mount credential, or a credential cached along with
any delayed write data.
Discussed with: rwatson
doesn't. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.
Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
initializations but we did have lofty goals and big ideals.
Adjust to more contemporary circumstances and gain type checking.
Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to belive this would ever be feasible in the the
first place.
Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.
Give coda correct prototypes and function definitions for
all vop_()s.
Generate a bit more data from the vnode_if.src file: a
struct vop_vector and protype typedefs for all vop methods.
Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.
Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.
Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.
Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.
Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)
Null_open() was only here to handle MNT_NODEV, but since that does
not affect any filesystems anymore, it could only have any effect
if you nullfs mounted a devfs but didn't want devices to show up.
If you need that, there are easier ways.
and refuse initializing filesystems with a wrong version. This will
aid maintenance activites on the 5-stable branch.
s/vfs_mount/vfs_omount/
s/vfs_nmount/vfs_mount/
Name our filesystems mount function consistently.
Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.
Reorganize the root filesystem selection code.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
- Emulate lock draining (LK_DRAIN) in null_lock() to avoid deadlocks
when the vnode is being recycled.
- Don't allow null_nodeget() to return a nullfs vnode from the wrong
mount when multiple nullfs's are mounted. It's unclear why these checks
were removed in null_subr.c 1.35, but they are definitely necessary.
Without the checks, trying to unmount a nullfs mount will erroneously
return EBUSY, and forcibly unmounting with -f will cause a panic.
- Bump LOG2_SIZEVNODE up to 8, since vnodes are >256 bytes now. The old
value (7) didn't cause any problems, but made the hash algorithm
suboptimal.
These changes fix nullfs enough that a parallel buildworld succeeds.
Submitted by: tegge (partially; LK_DRAIN)
Tested by: kris