significantly. Upon investigation this was caused by name cache
misses for lookups of "..". For name cache entries for non-".."
directories, the cache entry serves double duty. It maps both the
named directory plus ".." for the parent of the directory. As such,
two ctime values (one for each of the directory and its parent) need
to be saved in the name cache entry.
This patch adds an entry for ctime of the parent directory to the
name cache. It also adds an additional uma zone for large entries
with this time value, in order to minimize memory wastage.
As well, it fixes a couple of cases where the mtime of the parent
directory was being saved instead of ctime for positive name cache
entries. With this patch, Lookup RPC counts return to values similar
to pre-r230394 kernels.
Reported by: bde
Discussed with: kib
Reviewed by: jhb
MFC after: 2 weeks
appropriate timestamps. Restore the assertions which verify that
NCF_TS is set when timestamp is asked for.
Reviewed by: jhb (previous version)
MFC after: 2 weeks
consistently, creating some namecache entries without NCF_TS flag.
This causes panic due to failed assertion.
As a temporal relief, remove the assert. Return epoch timestamp for
the entries without timestamp if asked.
While there, consolidate the code which returns timestamps, into a
helper cache_out_ts().
Discussed with: jhb
MFC after: 2 weeks
provide struct namecache_ts which is the old struct namecache. Only
allocate struct namecache_ts if non-null struct timespec *tsp was
passed to cache_enter_time, otherwise use struct namecache.
Change struct namecache allocation and deallocation macros into static
functions, since logic becomes somewhat twisty. Provide accessor for
the nc_name member of struct namecache to hide difference between
struct namecache and namecache_ts.
The aim of the change is to not waste 20 bytes per small namecache
entry.
Reviewed by: jhb
MFC after: 2 weeks
X-MFC-note: after r230394
entries on one client when a directory was renamed on another client. The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries. However, if there are multiple entries for a single vnode,
they all share a single timestamp. To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry. The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode. Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache. The latter is subject to races with other
lookups updating the attribute cache concurrently. Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
cache does not store name cache entries for ".", so we cannot get a
useful timestamp. It didn't really make much sense to recheck the
attributes on the the directory to validate the namecache hit for "."
anyway.
- ABI compat shims for the name cache routines are present in this commit
so that it is safe to MFC.
MFC after: 2 weeks
This function updates path string to vnode's full global path and checks
the size of the new path string against the pathlen argument.
In vfs_domount(), sys_unmount() and kern_jail_set() this new function
is used to update the supplied path argument to the respective global path.
Unbreaks jailed zfs(8) with enforce_statfs set to 1.
Reviewed by: kib
MFC after: 1 month
nullfs. The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.
Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.
Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.
Tested by: pho
Reviewed by: mckusick
MFC after: 3 weeks (subject of re approval)
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.
Reviewed by: rwatson
Approved by: re (bz)
Move debug.ncnegfactor to vfs.ncnegfactor [1].
Provide some descriptions for the namecache related sysctls [1].
Based on the submission by: Rogier R. Mulhuijzen <drwilco drwilco net> [1]
MFC after: 2 weeks
X-MFC-note: remove debug.ncnegfactor in HEAD after MFC
use '-' in probe names, matching the probe names in Solaris.[1]
Add userland SDT probes definitions to sys/sdt.h.
Sponsored by: The FreeBSD Foundation
Discussed with: rwaston [1]
Assert this.
In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporary unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.
Verify that dvp is not reclaimed before calling cache_enter().
Reported and tested by: pho
Reviewed by: kan
MFC after: 2 weeks
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.
This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.
Reviewed by: rwatson
and calls to vn_vptocnp() by moving more of the common code to
vn_vptocnp(). Rename vn_vptocnp() to vn_vptocnp_locked() to signify that
cache is locked around the call.
Do not track buffer position by both the pointer and offset, use only
buflen to record the start of the free space.
Export vn_vptocnp() for external consumers as a wrapper around
vn_vptocnp_locked() that locks the cache and handles hold counts.
Tested by: pho
not populated in parent directory if negative entry was being
created, yet entry itself was added to the nc_neg list. It was
possible for parent vnode to get discarded later, leaving negative
entry pointing to now unused memory block.
Reported by: dho
Revewed by: kib
Check the condition and return ENOENT then.
In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused
by dvp reclaim.
Reported and tested by: pho
the size and cost of name cache entries, but make adding debugging
and tracing easier.
Add SDT DTrace probes for various namecache events:
vfs:namecache:enter:done - new entry in the name cache, passed parent
directory vnode pointer, name added to the cache, and child vnode
pointer.
vfs:namecache:enter_negative:done - new negative entry in the name cache,
passed parent vnode pointer, name added to the cache.
vfs:namecache:fullpath:enter - call to vn_fullpath1() is made, passed
the vnode to resolve to a name.
vfs:namecache:fullpath:hit - vn_fullpath1() successfully resolved a
search for the parent of an object using the namecache, passed the
discovered parent directory vnode pointer, name, and child vnode
pointer.
vfs:namecache:fullpath:miss - vn_fullpath1() failed to resolve a search
for the parent of an object using the namecache, passed the child
vnode pointer.
vfs:namecache:fullpath:return - vn_fullpath1() has completed, passed the
error number, and if that is zero, the vnode to resolve, and the
returned path.
vfs:namecache:lookup:hit - postive name cache entry hit, passed the
parent directory vnode pointer, name, and child vnode pointer.
vfs:namecache:lookup:hit_negative - negative name cache entry hit,
passed the parent directory vnode pointer and name.
vfs:namecache:lookup:miss - name cache miss, passed the parent directory
pointer and the full remaining component name (not terminated after the
cache miss component).
vfs:namecache:purge:done - name cache purge for a vnode, passed the vnode
pointer to purge.
vfs:namecache:purge_negative:done - name cache purge of negative entries
for children of a vnode, passed the vnode pointer to purge.
vfs:namecache:purgevfs - name cache purge for a mountpoint, passed the
mount pointer. Separate probes will also be invoked for each cache
entry zapped.
vfs:namecache:zap:done - name cache entry zapped, passed the parent
directory vnode pointer, name, and child vnode pointer.
vfs:namecache:zap_negative:done - negative name cache entry zapped,
passed the parent directory vnode pointer and name.
For any probes involving an extant name cache entry (enter, hit, zapp),
we use the nul-terminated string for the name component. For misses,
the remainder of the path, including later components, is provided as
an argument instead since there is no handy nul-terminated version of
the string around. This is arguably a bug.
MFC after: 1 month
Sponsored by: Google, Inc.
Reviewed by: jhb, kan, kib (earlier version)
in directory vnodes. Allow namecache dotdot entry to be created pointing
from child vnode to parent vnode if no existing links in opposite
direction exist. Use direct link from parent to child for dotdot lookups
otherwise.
This restores more efficient dotdot caching in NFS filesystems which
was lost when vnodes stoppped being type stable.
Reviewed by: kib
debug.hashstat.rawnchash sysctl in particular as taking 7 milliseconds on
a 3GHz Intel Xeon (4x2) running 7.1. It accounted for almost a quarter of
the total runtime of 'sysctl -a'. It also performs lots of copyout's while
holding the namecache lock (this does not attempt to fix that).
MFC after: 2 weeks
stale entries, we save a copy of the directory's modification time when
the first negative cache entry was added in the directory's NFS node.
When a negative cache entry is hit during a pathname lookup, the parent
directory's modification time is checked. If it has changed, all of the
negative cache entries for that parent are purged and the lookup falls
back to using the RPC. This required adding a new cache_purge_negative()
method to the name cache to purge only negative cache entries for a given
directory.
Submitted by: mohans, Rick Macklem, Ricardo Labiaga @ NetApp
Reviewed by: mohans
mutex to a reader/writer lock. Lookup operations first grab a read lock and
perform the lookup. If the operation results in a need to modify the cache,
then it tries to do an upgrade. If that fails, it drops the read lock,
obtains a write lock, and redoes the lookup.
inside the SYSCTL() macros and thus does not need to be done for
all of the nodes scattered across the source tree.
- Mark the name-cache related sysctl's (including debug.hashstat.*) MPSAFE.
- Mark vm.loadavg MPSAFE.
- Remove GIANT_REQUIRED from vmtotal() (everything in this routine already
has sufficient locking) and mark vm.vmtotal MPSAFE.
- Mark the vm.stats.(sys|vm).* sysctls MPSAFE.
In normal operation, the number of cache entries is roughly equal to the
number of active vnodes. However, when most of the recently accessed
vnodes have many hard links, the number of cache entries can be 32000
times as large, exhausting kernel memory and provoking a panic in
kmem_malloc().
MFC after: 2 weeks
did not compared nc_dvp with supplied parent directory vnode pointer.
Add the check and note that now branches for vp != NULL and vp == NULL
are the same, thus can be merged.
Reported and reviewed by: kan
Tested by: pho
MFC after: 2 weeks