Commit Graph

224 Commits

Author SHA1 Message Date
Martin Matuska
9cbe30e1d5 Fix missing in r230129:
kern_jail.c: initialize fullpath_disabled to zero
vfs_cache.c: add missing dot in comment

Reported by:	kib
MFC after:	1 month
2012-01-15 18:08:15 +00:00
Martin Matuska
f6e633a9e1 Introduce vn_path_to_global_path()
This function updates path string to vnode's full global path and checks
the size of the new path string against the pathlen argument.

In vfs_domount(), sys_unmount() and kern_jail_set() this new function
is used to update the supplied path argument to the respective global path.

Unbreaks jailed zfs(8) with enforce_statfs set to 1.

Reviewed by:	kib
MFC after:	1 month
2012-01-15 12:08:20 +00:00
Andriy Gapon
7a7ce668ef put sys/systm.h at its proper place or add it if missing
Reported by:	lstewart, tinderbox
Pointyhat to:	avg, attilio
MFC after:	1 week
MFC with:	r228430
2011-12-12 10:05:13 +00:00
Konstantin Belousov
f82360acf2 Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for
nullfs.  The problem is that resulting vnode is only required to be
held on return from the successful call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or the like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by:	pho
Reviewed by:	mckusick
MFC after:	3 weeks (subject to re approval)
2011-11-19 07:50:49 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Rebecca Cran
8d065a3914 Fix some more style(9) issues. 2010-11-14 16:10:15 +00:00
Rebecca Cran
b389be97db Fix style(9) issues from r215281 and r215282.
MFC after:	1 week
2010-11-14 08:06:29 +00:00
Rebecca Cran
2baa5cddb6 Add some descriptions to sys/kern sysctls.
PR:	kern/148710
Tested by:	Chip Camden <sterling at camdensoftware.com>
MFC after:	1 week
2010-11-14 06:09:50 +00:00
Konstantin Belousov
3a40a00d56 Remove sysctl debug.ncnegfactor, it is renamed to vfs.ncnegfactor.
MFC:	do not
2010-10-30 14:08:26 +00:00
Konstantin Belousov
420cfbb460 Provide vfs.ncsizefactor instead of hard-coding namecache ratio.
Move debug.ncnegfactor to vfs.ncnegfactor [1].
Provide some descriptions for the namecache related sysctls [1].

Based on the submission by:	Rogier R. Mulhuijzen <drwilco drwilco net> [1]
MFC after:	2 weeks
X-MFC-note:	remove debug.ncnegfactor in HEAD after MFC
2010-10-16 09:44:31 +00:00
Rui Paulo
79856499bd Add an extra comment to the SDT probes definition. This allows us to
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwatson [1]
2010-08-22 11:18:57 +00:00
Ed Schouten
60ae52f785 Use ISO C99 integer types in sys/kern where possible.
There are only about 100 occurrences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.
2010-06-21 09:55:56 +00:00
Konstantin Belousov
5673e3cb08 The cache_enter(9) function shall not be called for doomed dvp.
Assert this.

In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporarily unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.

Verify that dvp is not reclaimed before calling cache_enter().

Reported and tested by:	pho
Reviewed by:	kan
MFC after:	2 weeks
2010-04-20 10:19:27 +00:00
Konstantin Belousov
3e22320c43 Fix typo.
MFC after:	3 days
2010-04-15 17:17:02 +00:00
Konstantin Belousov
8f40845151 Correctly handle unlock for !MAKEENTRY case: after a successful attempt of
lock upgrade, the cache shall be unlocked from write.

Reported by:	Lucius Windschuh <lwindschuh googlemail com>
Reviewed by:	kan
Approved by:	re (rwatson)
2009-08-14 10:57:28 +00:00
Konstantin Belousov
c808c9632d Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.

This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.

Reviewed by:	rwatson
2009-06-21 19:21:01 +00:00
Joe Marcus Clarke
8a4444049e Unlock the cache lock before returning when we run out of buffer space
trying to fill in the full path name.

Reported by:	David Naylor <naylor.b.david@gmail.com>
Approved by:	kib
2009-06-05 16:44:42 +00:00
Konstantin Belousov
1358a7957d Unbreak the build. Add missed probes.
Reviewed by:	rwatson
Pointy hat to:	me
2009-05-31 20:16:06 +00:00
Konstantin Belousov
0449e6e1eb Eliminate code duplication in vn_fullpath1() around the cache lookups
and calls to vn_vptocnp() by moving more of the common code to
vn_vptocnp(). Rename vn_vptocnp() to vn_vptocnp_locked() to signify that
cache is locked around the call.

Do not track buffer position by both the pointer and offset, use only
buflen to record the start of the free space.

Export vn_vptocnp() for external consumers as a wrapper around
vn_vptocnp_locked() that locks the cache and handles hold counts.

Tested by:	pho
2009-05-31 14:57:43 +00:00
Alexander Kabaev
348496ad39 More fallout from negative dotdot caching. Negative entries should
be removed from and reinserted to proper ncneg list.

Reported by:  pho
Submitted by: kib
2009-04-17 18:11:11 +00:00
Alexander Kabaev
9cf6772211 Redo previous change using simpler patch that happens to be also
more correct.

Submitted by: tor
2009-04-14 23:56:48 +00:00
Alexander Kabaev
eed8a9edba Fix yet another negative dotdot entry fallout.
Reported by: pho
2009-04-14 23:46:57 +00:00
Alexander Kabaev
9d75482f99 Fix v_cache_dd handling for negative entries. v_cache_dd pointer was
not populated in parent directory if negative entry was being
created, yet entry itself was added to the nc_neg list. It was
possible for parent vnode to get discarded later, leaving negative
entry pointing to now unused memory block.

Reported by:	dho
Reviewed by:	kib
2009-04-11 20:23:08 +00:00
Konstantin Belousov
fd409594c6 When zapping v_cache_dd for !MAKEENTRY case in cache_lookup(), we shall
lock cache as writer.

Reviewed by:	kan
2009-04-11 16:12:20 +00:00
Konstantin Belousov
3f54086eba Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed.
Check the condition and return ENOENT then.

In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused
by dvp reclaim.

Reported and tested by:	pho
2009-04-10 10:22:44 +00:00
Robert Watson
5d5c174869 Nul-terminate strings in the VFS name cache, which negligibly changes
the size and cost of name cache entries, but makes adding debugging
and tracing easier.

Add SDT DTrace probes for various namecache events:

  vfs:namecache:enter:done - new entry in the name cache, passed parent
    directory vnode pointer, name added to the cache, and child vnode
    pointer.

  vfs:namecache:enter_negative:done - new negative entry in the name cache,
    passed parent vnode pointer, name added to the cache.

  vfs:namecache:fullpath:enter - call to vn_fullpath1() is made, passed
    the vnode to resolve to a name.

  vfs:namecache:fullpath:hit - vn_fullpath1() successfully resolved a
    search for the parent of an object using the namecache, passed the
    discovered parent directory vnode pointer, name, and child vnode
    pointer.

  vfs:namecache:fullpath:miss - vn_fullpath1() failed to resolve a search
    for the parent of an object using the namecache, passed the child
    vnode pointer.

  vfs:namecache:fullpath:return - vn_fullpath1() has completed, passed the
    error number, and if that is zero, the vnode to resolve, and the
    returned path.

  vfs:namecache:lookup:hit - positive name cache entry hit, passed the
    parent directory vnode pointer, name, and child vnode pointer.

  vfs:namecache:lookup:hit_negative - negative name cache entry hit,
    passed the parent directory vnode pointer and name.

  vfs:namecache:lookup:miss - name cache miss, passed the parent directory
    pointer and the full remaining component name (not terminated after the
    cache miss component).

  vfs:namecache:purge:done - name cache purge for a vnode, passed the vnode
    pointer to purge.

  vfs:namecache:purge_negative:done - name cache purge of negative entries
    for children of a vnode, passed the vnode pointer to purge.

  vfs:namecache:purgevfs - name cache purge for a mountpoint, passed the
    mount pointer.  Separate probes will also be invoked for each cache
    entry zapped.

  vfs:namecache:zap:done - name cache entry zapped, passed the parent
    directory vnode pointer, name, and child vnode pointer.

  vfs:namecache:zap_negative:done - negative name cache entry zapped,
    passed the parent directory vnode pointer and name.

For any probes involving an extant name cache entry (enter, hit, zap),
we use the nul-terminated string for the name component.  For misses,
the remainder of the path, including later components, is provided as
an argument instead since there is no handy nul-terminated version of
the string around.  This is arguably a bug.

MFC after:      1 month
Sponsored by:   Google, Inc.
Reviewed by:	jhb, kan, kib (earlier version)
2009-04-07 20:58:56 +00:00
Alexander Kabaev
bb6418cbe3 Revert change 190655 temporarily. It breaks many setups where nullfs is
used and needs to be revisited.
2009-04-04 17:48:38 +00:00
Peter Wemm
0e875ecafe vn_vptocnp() unlocks the name cache and forgets to re-lock it before
returning in one error case, and mistakenly unlocks it for the
umount -f case.
2009-04-02 21:16:20 +00:00
Alexander Kabaev
607fc40b04 Replace v_dd vnode pointer with v_cache_dd pointer to struct namecache
in directory vnodes. Allow namecache dotdot entry to be created pointing
from child vnode to parent vnode if no existing links in opposite
direction exist. Use direct link from parent to child for dotdot lookups
otherwise.

This restores more efficient dotdot caching in NFS filesystems which
was lost when vnodes stopped being type stable.

Reviewed by:	kib
2009-03-29 21:25:40 +00:00
John Baldwin
049ce0934f When a file lookup fails due to encountering a doomed vnode from a forced
unmount, consistently return ENOENT rather than EBADF.

Reviewed by:	kib
MFC after:	1 month
2009-03-24 18:16:42 +00:00
Konstantin Belousov
15fb32c07d Do not underflow the buffer and then report the problem. Check for the
condition before the buffer write.
Also, since buflen is unsigned, previous check was ignored.

Reviewed by:	marcus
Tested by:	pho
2009-03-20 11:08:57 +00:00
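The check-before-write idea in 15fb32c07d can be illustrated with a minimal userland sketch. This is not the kernel code: `prepend_component()` and its calling convention are hypothetical, and the only points it tries to show are that an unsigned length cannot be tested for "went negative" after the subtraction, and that the bounds test therefore has to come before the buffer is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not a kernel API).
 * The path is built right-to-left: buf[*buflen..] already holds the tail
 * of the path, and *buflen is the amount of free space left in front of it.
 */
static int
prepend_component(char *buf, size_t *buflen, const char *name, size_t len)
{
	assert(buf != NULL && buflen != NULL && name != NULL);

	/*
	 * Check first.  Subtracting and then testing "*buflen < 0" would
	 * never fire, because an unsigned value wraps around to a huge
	 * positive number instead of going negative.  We need len bytes
	 * for the name plus one byte for the '/' separator.
	 */
	if (len >= *buflen)
		return (ENOMEM);

	*buflen -= len;
	memcpy(buf + *buflen, name, len);
	*buflen -= 1;
	buf[*buflen] = '/';
	return (0);
}
```

A caller would start with `buflen` pointing at a terminating nul byte at the end of the buffer and prepend components from the leaf towards the root; on `ENOMEM` the buffer is left untouched.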
Konstantin Belousov
83817ce3b1 Remove unneeded braces to reduce used vertical screen space.
The location was missed in r190140.
2009-03-20 11:03:55 +00:00
Konstantin Belousov
9194007261 Do not forget to adjust buflen for the first resolution of the path
from namecache.
While there, compare pointers for equality.

Reviewed by:	marcus
Tested by:	pho
2009-03-20 11:00:39 +00:00
Konstantin Belousov
065fc451f8 The nc_nlen member of the struct namecache contains the length of the cached
name, not the length + 1.

PR:	132620, 132542
Reported by:	bf2006a yahoo com
Tested by:	bf2006a, pho
Reviewed by:	marcus
2009-03-20 10:59:06 +00:00
Konstantin Belousov
c4a8c2ee24 When ktracing namei operations, log a result of the __getcwd().
MFC after:	1 week
2009-03-20 10:47:16 +00:00
Konstantin Belousov
bf5c835e1c Remove unneeded braces to reduce used vertical screen space. 2009-03-20 10:04:00 +00:00
John Baldwin
4ab2a9a022 Move the debug.hashstat sysctl tree under DIAGNOSTIC. I measured the
debug.hashstat.rawnchash sysctl in particular as taking 7 milliseconds on
a 3GHz Intel Xeon (4x2) running 7.1.  It accounted for almost a quarter of
the total runtime of 'sysctl -a'.  It also performs lots of copyout's while
holding the namecache lock (this does not attempt to fix that).

MFC after:	2 weeks
2009-03-09 19:04:53 +00:00
John Baldwin
03964c8e09 Enable caching of negative pathname lookups in the NFS client. To avoid
stale entries, we save a copy of the directory's modification time when
the first negative cache entry was added in the directory's NFS node.
When a negative cache entry is hit during a pathname lookup, the parent
directory's modification time is checked.  If it has changed, all of the
negative cache entries for that parent are purged and the lookup falls
back to using the RPC.  This required adding a new cache_purge_negative()
method to the name cache to purge only negative cache entries for a given
directory.

Submitted by:	mohans, Rick Macklem, Ricardo Labiaga @ NetApp
Reviewed by:	mohans
2009-02-19 22:28:48 +00:00
John Baldwin
9078981ab1 Convert the global mutex protecting the directory lookup name cache from a
mutex to a reader/writer lock.  Lookup operations first grab a read lock and
perform the lookup.  If the operation results in a need to modify the cache,
then it tries to do an upgrade.  If that fails, it drops the read lock,
obtains a write lock, and redoes the lookup.
2009-01-28 19:05:18 +00:00
John Baldwin
8a7ef10b71 - Mark all standalone INT/LONG/QUAD sysctl's MPSAFE. This is done
inside the SYSCTL() macros and thus does not need to be done for
  all of the nodes scattered across the source tree.
- Mark the name-cache related sysctl's (including debug.hashstat.*) MPSAFE.
- Mark vm.loadavg MPSAFE.
- Remove GIANT_REQUIRED from vmtotal() (everything in this routine already
  has sufficient locking) and mark vm.vmtotal MPSAFE.
- Mark the vm.stats.(sys|vm).* sysctls MPSAFE.
2009-01-23 22:49:23 +00:00
Stephen McKay
58c1607e03 Add a limit on namecache entries.
In normal operation, the number of cache entries is roughly equal to the
number of active vnodes.  However, when most of the recently accessed
vnodes have many hard links, the number of cache entries can be 32000
times as large, exhausting kernel memory and provoking a panic in
kmem_malloc().

MFC after: 2 weeks
2009-01-20 04:21:21 +00:00
Konstantin Belousov
83e73926ad In r185557, the check for existing negative entry for the given name
did not compare nc_dvp with supplied parent directory vnode pointer.
Add the check and note that now branches for vp != NULL and vp == NULL
are the same, thus can be merged.

Reported and reviewed by:	kan
Tested by:	pho
MFC after:	2 weeks
2008-12-30 12:51:14 +00:00
Joe Marcus Clarke
4769218f4b Do not KASSERT when vp->v_dd is NULL. Only directories which have had ".."
looked up would have v_dd set to a non-NULL value.  This fixes a panic
seen when running installworld on a diskless system with a separate /usr
file system.

Submitted by:	cracauer
Approved by:	kib
2008-12-23 20:43:42 +00:00
Konstantin Belousov
86dcb537c9 Keep the hold on the vnode during VOP_VPTOCNP() call, allowing the vop
implementation to drop vnode lock, if needed.

Reported and tested by:	pho
2008-12-23 20:04:31 +00:00
Joe Marcus Clarke
b9022449b3 Add a new VOP, VOP_VPTOCNP, which translates a vnode to its component name
on a best-effort basis.  Teach vn_fullpath to use this new VOP if a
regular VFS cache lookup fails.  This VOP is designed to supplement the
VFS cache to provide a better chance that a vnode-to-name lookup will
succeed.

Currently, an implementation for devfs is being committed.  The default
implementation is to return ENOENT.

A big thanks to kib for the mentorship on this, and to pho for running it
through his stress test suite.

Reviewed by:	arch
Approved by:	kib
2008-12-12 00:57:38 +00:00
Konstantin Belousov
d6568724e1 Shared lookup makes it possible to create several negative cache
entries for one name. Then, creating inode with that name would remove
one entry, leaving others dormant. Reclaiming the vnode would uncover
negative entries, causing false return of ENOENT from the calls like
stat, that do not create inode.

Prevent creation of the duplicated negative entries.

Reported and debugged with:	pho
Reviewed by:	jhb
X-MFC:	after shared lookup changes
2008-12-02 11:14:16 +00:00
Joe Marcus Clarke
ef61995ebd Move vn_fullpath1() outside of FILEDESC locking. This is being done in
advance of teaching vn_fullpath1() how to query file systems for
vnode-to-name mappings when cache lookups fail.

Thanks to kib for guidance and patience on this process.

Reviewed by:	kib
Approved by:	kib
2008-11-25 15:36:15 +00:00
John Baldwin
d2722d704c Part 1 of making shared lookups more resilient with respect to forced
unmounts.  When we upgrade a vnode lock from shared to exclusive during
a name cache lookup, fail the lookup with EBADF if the vnode is invalidated
while we are waiting for the exclusive lock.

Also, for correctness (though I'm not sure it can occur in practice),
downgrade an exclusively locked vnode if it should be share locked.

Tested by:	pho
2008-09-24 18:51:33 +00:00
John Baldwin
cbb598af66 Sort includes. 2008-09-18 20:04:22 +00:00
John Baldwin
969bf150df Fix a race condition with concurrent LOOKUP namecache operations for a vnode
not in the namecache when shared lookups are enabled (vfs.lookup_shared=1,
it is currently off by default) and the filesystem supports shared lookups
(e.g. NFS client).  Specifically, if multiple concurrent LOOKUPs miss
in the name cache in parallel, each of the lookups may end up adding an
entry to the namecache, resulting in duplicate entries in the namecache
for the same pathname.  A subsequent removal of the mapping of that
pathname to that vnode (via remove or rename) would only evict one of the
entries from the name cache.  As a result, subsequent lookups for that
pathname would still return the old vnode.

This race was observed with shared lookups over NFS where a file was updated
by writing a new file out to a temporary file name and then renaming that
temporary file to the "real" file to effect atomic updates of a file.  Other
processes on the same client that were periodically reading the file would
occasionally receive an ESTALE error from open(2) because the VOP_GETATTR()
in nfs_open() would receive that error when given the stale vnode.

The fix here is to check for duplicates in cache_enter() and just return
if an entry for this same directory and leaf file name for this vnode is
already in the cache.  The check for duplicates is done by walking the
per-vnode list of name cache entries.  It is expected that this list should
be very small in the common case (usually 0 or 1 entries during a
cache_enter() since most files only have 1 "leaf" name).

Reviewed by:	ups, scottl
MFC after:	2 months
2008-08-23 15:13:39 +00:00
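The duplicate check described in 969bf150df can be sketched in userland roughly as follows. The types and the function `toy_cache_enter()` are hypothetical stand-ins, not the kernel's `struct namecache`, `struct vnode`, or `cache_enter()`, and locking is left out entirely; the sketch only shows the walk over the per-vnode list that returns early when the same (directory, leaf name) pair is already present.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins (assumptions for illustration only). */
struct toy_ncentry {
	const void *dvp;		/* identity of the parent directory */
	char *name;			/* nul-terminated leaf name */
	struct toy_ncentry *next;
};

struct toy_vnode {
	struct toy_ncentry *entries;	/* names that resolve to this vnode */
};

/* Returns 1 if an entry was added, 0 if it was a duplicate, -1 on ENOMEM. */
static int
toy_cache_enter(struct toy_vnode *vp, const void *dvp, const char *name)
{
	struct toy_ncentry *e;
	size_t len;

	assert(vp != NULL && name != NULL);

	/* Walk the (normally very short) per-vnode list looking for a match. */
	for (e = vp->entries; e != NULL; e = e->next)
		if (e->dvp == dvp && strcmp(e->name, name) == 0)
			return (0);

	len = strlen(name);
	e = malloc(sizeof(*e));
	if (e == NULL)
		return (-1);
	e->name = malloc(len + 1);
	if (e->name == NULL) {
		free(e);
		return (-1);
	}
	memcpy(e->name, name, len + 1);
	e->dvp = dvp;
	e->next = vp->entries;
	vp->entries = e;
	return (1);
}
```

Because most files have a single leaf name, the list walked here usually holds zero or one entries, which is why the commit message expects the extra check to be cheap.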
Alfred Perlstein
cbd3ba3edf Prevent crashes due to unlocked access to hash buckets in two sysctls.
Use CACHE_LOCK to prevent crashes.

Sysctls fixed: debug.hashstat.nchash and debug.hashstat.rawnchash.

Obtained from: Juniper Networks
MFC After: 1 week
2008-08-16 21:48:10 +00:00
Christian S.J. Peron
dfc714fba1 Currently, BSM audit pathname token generation for chrooted or jailed
processes is not producing absolute pathname tokens.  It is required
that audited pathnames are generated relative to the global root mount
point.  This modification changes our implementation of audit_canon_path(9)
and introduces a new function: vn_fullpath_global(9) which performs a
vnode -> pathname translation relative to the global mount point based
on the contents of the name cache.  Much like vn_fullpath,
vn_fullpath_global is a wrapper function which calls vn_fullpath1.

Further, the string parsing routines have been converted to use the
sbuf(9) framework.  This change also removes the conditional acquisition
of Giant, since the vn_fullpath1 method will not dip into file system
dependent code.

The vnode locking was modified to use vhold()/vdrop() instead of vref()
and vrele().  This will modify the hold count instead of modifying the
user count.  This makes more sense since it's the kernel that requires
the reference to the vnode.  This also makes sure that the vnode does not
get recycled while we hold the reference to it. [1]

Discussed with:	rwatson
Reviewed by:	kib [1]
MFC after:	2 weeks
2008-07-31 16:57:41 +00:00
Pawel Jakub Dawidek
b03d720760 - Use LK_TYPE_MASK where needed. Actually after sys/sys/lockmgr.h:1.69 it is
no longer needed, but for now we still want to be consistent with other
  similar checks in the tree.
- Call ASSERT_VOP_ELOCKED() only when vget() returns 0.

Reviewed by:	jeff
2008-04-09 20:19:55 +00:00
Konstantin Belousov
0a3af16a75 Add the utility function vn_commname() to retrieve the command name
from the vfs namecache, when available.

Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 11:53:03 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Attilio Rao
81c794f998 Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by:	Andrea Barberio <insomniac at slackware dot it>
2008-02-25 18:45:57 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjunction with 'thread' argument passing which is always curthread.
Remove the useless extra argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modification makes the code cleaner and in
particular removes an annoying dependence, helping the next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, it is worth saying that the next commits will address
a similar cleanup of VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Kris Kennaway
e6d64a0f15 Remove remaining Giant acquisition around vn_fullpath1. This was missed
in r1.106 and has not been required for some years now.

Reviewed by:  jeff
MFC After:    1 week
2007-11-22 21:26:25 +00:00
Pawel Jakub Dawidek
b4d7e2983c Fix some locking cases where we ask for exclusively locked vnode, but we get
shared locked vnode instead when vfs.lookup_shared is set to 1.

Discussed with:	kib, kris
Tested by:	kris
Approved by:	re (kensmith)
2007-09-21 10:16:56 +00:00
Pawel Jakub Dawidek
dfe97ff4a5 We only flush entries related to the given file system. Currently there are
no 'invalid' cache entries - file system is responsible for keeping it that
way. The comment should have been updated in rev.1.25.
2007-06-18 09:28:24 +00:00
Pawel Jakub Dawidek
6e042171bd To avoid a deadlock when handling .. directory during a lookup, we unlock
parent vnode and relock it after locking child vnode. The problem was that
we always relock it exclusively, even when it was share-locked.

Discussed with:	jeff
2007-05-25 22:23:38 +00:00
Pawel Jakub Dawidek
b4c85af977 We no longer need to put namecache entries onto temporary mplist.
It was useful in revision 1.86, but should have been removed in 1.89.
2007-05-25 22:19:49 +00:00
Pawel Jakub Dawidek
950afe9972 The cache_leaf_test() function seems to be unused, so remove it. 2007-05-25 22:16:17 +00:00
Pawel Jakub Dawidek
f013ccb768 - Remove redundant initialization.
- Compare pointer with NULL.
2007-05-22 23:05:48 +00:00
Robert Watson
5e3f7694b1 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
Robert Watson
873fbcd776 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
Christian S.J. Peron
4f0840f348 Axe Giant from vn_fullpath(9). The vnode -> pathname lookup should be
filesystem agnostic. We are not touching any file system specific functions
in this code path. Since we have a cache lock, there is really no need to
keep Giant around here.

This eliminates Giant acquisitions for any syscall which is auditing pathnames.

Discussed with:	jeff
2006-06-16 05:09:28 +00:00
John-Mark Gurney
e98b5a89de remove duplicate sizeof vnode entry (debug.sizeof.vnode already existed)...
move ncsize into debug.sizeof and rename to namecache...
2006-04-16 18:38:30 +00:00
Jeff Roberson
2f0bca553a - Don't check v_mount for NULL to determine if a vnode has been recycled.
Use the more appropriate VI_DOOMED flag instead.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:15:27 +00:00
Jeff Roberson
32b6dcd8a4 - Fix a leaked reference to a vnode via v_dd. We rely on cache_purge() and
cache_zap() to clear the v_dd pointers when a directory vnode is forcibly
   discarded.  For this to work, all vnodes with v_dd pointers to a directory
   must also have name cache entries linked via v_cache_dst to that dvp
   otherwise we could not find them at cache_purge() time.  The following
   code snippet could break this guarantee by unlinking a directory before
   fetching its dotdot.  The dotdot lookup would initialize the v_dd field
   of the unlinked directory which could never be cleared.  To fix this
   we don't initialize v_dd for orphaned vnodes.
        printf("rmdir: %d\n", rmdir("../foo")); /* foo is cwd */
        printf("chdir: %d\n", chdir(".."));
        printf("%s\n", getwd(NULL));

Sponsored by:	Isilon Systems, Inc.
Discovered by:	kkenn
Approved by:	re (blanket vfs)
2005-06-17 01:05:13 +00:00
Jeff Roberson
6bd8103d33 - Clear v_dd in cache_zap() instead of cache_purge() as cache_purge() may
not be called in all cases where we free the cnp.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 05:59:59 +00:00
Jeff Roberson
eff2d12635 - Add KTR_VFS messages for various name cache related events.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:03 +00:00
Jeff Roberson
1b2da2d0fa - Assert that we're not adding a doomed vnode to the name cache.
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:47:30 +00:00
Jeff Roberson
4585e3ac5a - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  See vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
David Schultz
7ce7f713ee Eliminate v_id and v_ddid. The name cache now holds references to
vnodes whose names it caches, so we no longer need a `generation
number' to tell us if a referenced vnode is invalid.  Replace the use
of the parent's v_id in the hash function with the address of the
parent vnode.

Tested by:	Peter Holm
Glanced at by:	jeff, phk
2005-03-30 03:01:36 +00:00
David Schultz
dd33f0d92f Merge kern___cwd() and vn_fullpath(), which were virtually identical,
except for places where people forget to update one of them.  We now
collect only one set of stats for both of these routines.  Other
changes in this commit include:

- Start acquiring Giant again in vn_fullpath(), since it is required
  when crossing a mount point.

- Expand the scope of the cache lock to avoid dropping it and
  picking it up again for every pathname component.  This also
  makes it trivial to avoid races in stats collection.

- Assert that nc_dvp == v_dd for directories instead of returning
  an error to userland when this is not true.  AFAIK, it should
  always be true when v_dd is non-null.

- For vn_fullpath(), handle the first (non-directory) vnode
  separately.

Glanced at by:  jeff, phk
2005-03-30 02:59:32 +00:00
Jeff Roberson
5280e61f2f - Move the logic that locks and refs the new vnode from vfs_cache_lookup()
to cache_lookup().  This allows us to acquire the vnode interlock before
   dropping the cache lock.  This protects the vnodes identity until we
   have locked it.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 12:59:06 +00:00
Jeff Roberson
571211c454 - Get rid of the old LOOKUP_SHARED code. namei() now supplies the
proper lock flags via cn_lkflag.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:08:23 +00:00
Jeff Roberson
b75719afea - Invalidate the children's v_dd pointers when we cache_purge() a directory.
Otherwise the stale pointer may be accessed after a vnode is freed.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 09:58:41 +00:00
Jeff Roberson
f7b404d88f - Remove an unused variable.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 13:29:48 +00:00
Jeff Roberson
ee5a0a2d7c - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:26:17 +00:00
Jeff Roberson
fdd6a3ff3c - All of the bugs which led to the complication of the LOOKUP_SHARED
config option have now been fixed.  All filesystems are properly locked
   and checked via DEBUG_VFS_LOCKS.  Remove the workaround code.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:00:45 +00:00
Poul-Henning Kamp
2adc2b87c7 Make a SYSCTL_NODE and a mutex static 2005-02-10 12:16:42 +00:00
Jeff Roberson
799cc2dcee - Simplify the cache locking. The lock order relationship with the
vnode lock is much simpler than I originally thought it would be.
   Now, the cache lock is  always acquired before the vnode lock.
 - Provide some gotos in __getcwd() to simplify the unlocking a bit.
 - Move Giant acquisition down into __getcwd().

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:24:12 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Jeff Roberson
98d7d155c1 - Apply a big giant lock around the namecache. This has been sitting in
my tree since BSDcon.
2003-10-05 07:13:50 +00:00
Dag-Erling Smørgrav
c2935410f6 Make the VFS cache use zones instead of malloc(9). This results in a
small but noticeable increase in performance for name lookup operations.

The code uses two zones, one for short names (less than 32 characters)
and one for long names (up to NAME_MAX).  Since most file names are
fairly short, this saves a considerable amount of space that would
otherwise be wasted if we always allocated NAME_MAX bytes.  The cutoff
value of 32 characters was picked arbitrarily and may benefit from some
tweaking; it could also be made into a tunable.

Submitted by:	hmp
2003-06-13 08:46:13 +00:00
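The two-zone split described in c2935410f6 amounts to choosing an allocation class by name length. The following is only a sketch under stated assumptions: the names `toy_pick_zone()`, `toy_zone_name_bytes()`, and the `TOY_*` constants are hypothetical, `TOY_NAME_MAX` is assumed to be 255, and no real zone allocator is involved. Only the cutoff of 32 characters comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_SHORT_CUTOFF	32	/* cutoff quoted in the commit message */
#define	TOY_NAME_MAX		255	/* assumed value, for illustration */

enum toy_zone { TOY_ZONE_SMALL, TOY_ZONE_LARGE };

/* Names shorter than the cutoff go to the small zone, the rest to the large. */
static enum toy_zone
toy_pick_zone(size_t namelen)
{
	assert(namelen <= TOY_NAME_MAX);
	return (namelen < TOY_SHORT_CUTOFF ? TOY_ZONE_SMALL : TOY_ZONE_LARGE);
}

/* Bytes reserved for the name in an item of the given zone. */
static size_t
toy_zone_name_bytes(enum toy_zone zone)
{
	return (zone == TOY_ZONE_SMALL ? TOY_SHORT_CUTOFF : TOY_NAME_MAX);
}
```

With these assumed sizes, a short name reserves 32 bytes for its name rather than 255, which is the space saving the commit message refers to.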
Dag-Erling Smørgrav
ffe92432e3 Whitespace cleanup. 2003-06-11 07:35:56 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Poul-Henning Kamp
cc34e37e5b Backout the getcwd changes, a more comprehensive effort will be needed. 2003-03-20 10:40:45 +00:00
Poul-Henning Kamp
9eaf5abceb (This commit certainly increases the need for a wash&clean of vfs_cache.c,
but I decided that it was important for this patch to not bit-rot, and
since it is mainly moving code around, the total amount of entropy is
epsilon /phk)

This is a patch to move the common parts of linux_getcwd() back into
kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use its
extended functionality.  The linux syscall linux_getcwd() in
compat/linux/linux_getcwd.c has been rewritten to use it too.  It should
be possible to simplify libc's getcwd() after this.  No doubt this code
needs some cleaning up, since I've left in the sysctl variables I used
for debugging.

PR:	48169
Submitted by:	James Whitwell <abacau@yahoo.com.au>
2003-03-17 12:21:08 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Andrew R. Reiter
1f5a94d5f6 - Update a couple of comments to make sense with what today's code is
doing (stale comments make arr something something ;)).
2003-02-15 23:25:12 +00:00
Andrew R. Reiter
da8f0c8429 - Remove old comment for PURGE() as it no longer exists and implied it
was a comment to cache_zap().
- Add a comment to quickly state what cache_zap() does.

Reviewed by:	phk, mux
2003-02-15 18:58:06 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Ian Dowse
48b52b7a32 Split up __getcwd so that kernel callers of the internal version
can specify whether the buffer is in user or system space.
2002-09-02 22:40:30 +00:00
Jeff Roberson
18c6acee26 - Move a VOP assert to the right place.
Spotted by:	i386 tinderbox
2002-08-05 08:55:53 +00:00