Commit Graph

124 Commits

Author SHA1 Message Date
jhb
6c646c3b72 Fix a race condition with concurrent LOOKUP namecache operations for a vnode
not in the namecache when shared lookups are enabled (vfs.lookup_shared=1,
it is currently off by default) and the filesystem supports shared lookups
(e.g. NFS client).  Specifically, if multiple concurrent LOOKUPs both miss
in the name cache in parallel, each of the lookups may each end up adding an
entry to the namecache resulting in duplicate entries in the namecache
for the same pathname.  A subsequent removal of the mapping of that
pathname to that vnode (via remove or rename) would only evict one of the
entries from the name cache.  As a result, subseqent lookups for that
pathname would still return the old vnode.

This race was observed with shared lookups over NFS where a file was updated
by writing a new file out to a temporary file name and then renaming that
temporary file to the "real" file to effect atomic updates of a file.  Other
processes on the same client that were periodically reading the file would
occasionally receive an ESTALE error from open(2) because the VOP_GETATTR()
in nfs_open() would receive that error when given the stale vnode.

The fix here is to check for duplicates in cache_enter() and just return
if an entry for this same directory and leaf file name for this vnode is
already in the cache.  The check for duplicates is done by walking the
per-vnode list of name cache entries.  It is expected that this list should
be very small in the common case (usually 0 or 1 entries during a
cache_enter() since most files only have 1 "leaf" name).

Reviewed by:	ups, scottl
MFC after:	2 months
2008-08-23 15:13:39 +00:00
alfred
f8f6317629 Prevent crashes due to unlocked access to hash buckets in two sysctls.
Use CACHE_LOCK to prevent crashes.

Sysctls fixed: debug.hashstat.nchash and debug.hashstat.rawnchash.

Obtained from: Juniper Networks
MFC After: 1 week
2008-08-16 21:48:10 +00:00
csjp
743d0edd92 Currently, BSM audit pathname token generation for chrooted or jailed
processes are not producing absolute pathname tokens.  It is required
that audited pathnames are generated relative to the global root mount
point.  This modification changes our implementation of audit_canon_path(9)
and introduces a new function: vn_fullpath_global(9) which performs a
vnode -> pathname translation relative to the global mount point based
on the contents of the name cache.  Much like vn_fullpath,
vn_fullpath_global is a wrapper function which called vn_fullpath1.

Further, the string parsing routines have been converted to use the
sbuf(9) framework.  This change also removes the conditional acquisition
of Giant, since the vn_fullpath1 method will not dip into file system
dependent code.

The vnode locking was modified to use vhold()/vdrop() instead the vref()
and vrele().  This will modify the hold count instead of modifying the
user count.  This makes more sense since it's the kernel that requires
the reference to the vnode.  This also makes sure that the vnode does not
get recycled we hold the reference to it. [1]

Discussed with:	rwatson
Reviewed by:	kib [1]
MFC after:	2 weeks
2008-07-31 16:57:41 +00:00
pjd
d876c8127f - Use LK_TYPE_MASK where needed. Actually after sys/sys/lockmgr.h:1.69 it is
no longer needed, but for now we still want to be consistent with other
  similar checks in the tree.
- Call ASSERT_VOP_ELOCKED() only when vget() returns 0.

Reviewed by:	jeff
2008-04-09 20:19:55 +00:00
kib
fe816f911f Add the utility function vn_commname() to retrieve the command name
from the vfs namecache, when available.

Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 11:53:03 +00:00
rwatson
877d7c65ba In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
attilio
4014b55830 Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by:	Andrea Barberio <insomniac at slackware dot it>
2008-02-25 18:45:57 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
kris
e9fa0b52d6 Remove remaining Giant acquisition around vn_fullpath1. This was missed
in r1.106 and has not been required for some years now.

Reviewed by:  jeff
MFC After:    1 week
2007-11-22 21:26:25 +00:00
pjd
763f7449dc Fix some locking cases where we ask for exclusively locked vnode, but we get
shared locked vnode in instead when vfs.lookup_shared is set to 1.

Discussed with:	kib, kris
Tested by:	kris
Approved by:	re (kensmith)
2007-09-21 10:16:56 +00:00
pjd
a362498f2e We only flush entries related to the given file system. Currently there are
no 'invalid' cache entires - file system is responsible for keeping it that
way. The comment should have been updated in rev.1.25.
2007-06-18 09:28:24 +00:00
pjd
f9fef47d89 To avoid a deadlock when handling .. directory during a lookup, we unlock
parent vnode and relock it after locking child vnode. The problem was that
we always relock it exclusively, even when it was share-locked.

Discussed with:	jeff
2007-05-25 22:23:38 +00:00
pjd
29aee808df We no longer need to put namecache entries onto temporary mplist.
It was useful in revision 1.86, but should have been removed in 1.89.
2007-05-25 22:19:49 +00:00
pjd
0d899d01c1 The cache_leaf_test() function seems to be unused, so remove it. 2007-05-25 22:16:17 +00:00
pjd
641a216a63 - Remove redundant initialization.
- Compare pointer with NULL.
2007-05-22 23:05:48 +00:00
rwatson
765a83fd79 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
rwatson
69938bd196 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
csjp
bff7d924ba Axe Giant from vn_fullpath(9). The vnode -> pathname lookup should be
filesystem agnostic. We are not touching any file system specific functions
in this code path. Since we have a cache lock, there is really no need to
keep Giant around here.

This eliminates Giant acquisitions for any syscall which is auditing pathnames.

Discussed with:	jeff
2006-06-16 05:09:28 +00:00
jmg
4a3ab050f7 remove duplicate sizeof vnode entry (debug.sizeof.vnode already existed)...
move ncsize into debug.sizeof and rename to namecache...
2006-04-16 18:38:30 +00:00
jeff
05c6b40c0d - Don't check v_mount for NULL to determine if a vnode has been recycled.
Use the more appropriate VI_DOOMED flag instead.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:15:27 +00:00
jeff
9e1f35189b - Fix a leaked reference to a vnode via v_dd. We rely on cache_purge() and
cache_zap() to clear the v_dd pointers when a directory vnode is forcibly
   discarded.  For this to work, all vnodes with v_dd pointers to a directory
   must also have name cache entries linked via v_cache_dst to that dvp
   otherwise we could not find them at cache_purge() time.  The following
   code snipit could break this guarantee by unlinking a directory before
   fetching it's dotdot.  The dotdot lookup would initialize the v_dd field
   of the unlinked directory which could never be cleared.  To fix this
   we don't initialize v_dd for orphaned vnodes.
        printf("rmdir: %d\n", rmdir("../foo")); /* foo is cwd */
        printf("chdir: %d\n", chdir(".."));
        printf("%s\n", getwd(NULL));

Sponsored by:	Isilon Systems, Inc.
Discovered by:	kkenn
Approved by:	re (blanket vfs)
2005-06-17 01:05:13 +00:00
jeff
22b5cc07b0 - Clear v_dd in cache_zap() instead of cache_purge() as cache_purge() may
not be called in all cases where we free the cnp.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 05:59:59 +00:00
jeff
bf15cf7167 - Add KTR_VFS messages for various name cache related events.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:03 +00:00
jeff
8a4fe36603 - Assert that we're not adding a doomed vnode to the name cache.
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:47:30 +00:00
jeff
afab3762a0 - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
das
d3e0f098be Eliminate v_id and v_ddid. The name cache now holds references to
vnodes whose names it caches, so we no longer need a `generation
number' to tell us if a referenced vnode is invalid.  Replace the use
of the parent's v_id in the hash function with the address of the
parent vnode.

Tested by:	Peter Holm
Glanced at by:	jeff, phk
2005-03-30 03:01:36 +00:00
das
0fa243e9cd Merge kern___cwd() and vn_fullpath(), which were virtually identical,
except for places where people forget to update one of them.  We now
collect only one set of stats for both of these routines.  Other
changes in this commit include:

- Start acquiring Giant again in vn_fullpath(), since it is required
  when crossing a mount point.

- Expand the scope of the cache lock to avoid dropping it and
  picking it up again for every pathname component.  This also
  makes it trivial to avoid races in stats collection.

- Assert that nc_dvp == v_dd for directories instead of returning
  an error to userland when this is not true.  AFAIK, it should
  always be true when v_dd is non-null.

- For vn_fullpath(), handle the first (non-directory) vnode
  separately.

Glanced at by:  jeff, phk
2005-03-30 02:59:32 +00:00
jeff
39f5b5cc72 - Move the logic that locks and refs the new vnode from vfs_cache_lookup()
to cache_lookup().  This allows us to acquire the vnode interlock before
   dropping the cache lock.  This protects the vnodes identity until we
   have locked it.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 12:59:06 +00:00
jeff
767230ce78 - Get rid of the old LOOKUP_SHARED code. namei() now supplies the
proper lock flags via cn_lkflag.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:08:23 +00:00
jeff
16b7e8d14e - Invalidate the childrens v_dd pointers when we cache_purge() a directory.
Otherwise the stale pointer may be accessed after a vnode is freed.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 09:58:41 +00:00
jeff
c4f7e75896 - Remove an unused variable.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 13:29:48 +00:00
jeff
eb142209f0 - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:26:17 +00:00
jeff
6ef6841ac5 - All of the bugs which lead to the complication of the LOOKUP_SHARED
config option have now been fixed.  All filesystems are properly locked
   and checked via DEBUG_VFS_LOCKS.  Remove the workaround code.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:00:45 +00:00
phk
6d9a6aacc4 Make a SYSCTL_NODE and a mutex static 2005-02-10 12:16:42 +00:00
jeff
dc0d73570e - Simplify the cache locking. The lock order relationship with the
vnode lock is much simpler than I originally thought it would be.
   Now, the cache lock is  always acquired before the vnode lock.
 - Provide some gotos in __getcwd() to simplify the unlocking a bit.
 - Move Giant acquisition down into __getcwd().

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:24:12 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
imp
74cf37bd00 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
jeff
d22d3e1bfb - Apply a big giant lock around the namecache. This has been sitting in
my tree since BSDcon.
2003-10-05 07:13:50 +00:00
des
6e6f4e8270 Make the VFS cache use zones instead of malloc(9). This results in a
small but noticeable increase in performance for name lookup operations.

The code uses two zones, one for short names (less than 32 characters)
and one for long names (up to NAME_MAX).  Since most file names are
fairly short, this saves a considerable amount of space that would
otherwise be wasted if we always allocated NAME_MAX bytes.  The cutoff
value of 32 characters was picked arbitrarily and may benefit from some
tweaking; it could also be made into a tunable.

Submitted by:	hmp
2003-06-13 08:46:13 +00:00
des
f27cbd8cfb Whitespace cleanup. 2003-06-11 07:35:56 +00:00
obrien
3b8fff9e4c Use __FBSDID(). 2003-06-11 00:56:59 +00:00
phk
afab808bc4 Backout the getcwd changes, a more comprehensive effort will be needed. 2003-03-20 10:40:45 +00:00
phk
90f136c592 (This commit certainly increases the need for a wash&clean of vfs_cache.c,
but I decided that it was important for this patch to not bit-rot, and
since it is mainly moving code around, the total amount of entropy is
epsilon /phk)

This is a patch to move the common parts of linux_getcwd() back into
kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use it's
extended functionality.  The linux syscall linux_getcwd() in
compat/linux/linux_getcwd.c has been rewritten to use it too.  It should
be possible to simplify libc's getcwd() after this.  No doubt this code
needs some cleaning up, since I've left in the sysctl variables I used
for debugging.

PR:	48169
Submitted by:	James Whitwell <abacau@yahoo.com.au>
2003-03-17 12:21:08 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
arr
8e7322af29 - Update a couple of comments to make sense with what today's code is
doing (stale comments make arr something something ;)).
2003-02-15 23:25:12 +00:00
arr
e1e48e3d34 - Remove old comment for PURGE() as it no longer exists and implied it
was a comment to cache_zap().
- Add a comment to quickly state what cache_zap() does.

Reviewed by:	phk, mux
2003-02-15 18:58:06 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
iedowse
a62f952615 Split up __getcwd so that kernel callers of the internal version
can specify whether the buffer is in user or system space.
2002-09-02 22:40:30 +00:00
jeff
fbb42c58e2 - Move a VOP assert to the right place.
Spotted by:	i386 tinderbox
2002-08-05 08:55:53 +00:00