660 Commits

Author SHA1 Message Date
des
dc22b24953 MFC: expose vdropl() 2007-05-24 16:09:38 +00:00
jhb
cb722c46dc MFC: Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flushing dirty pages in the data invalidation functions
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.
2007-02-12 19:08:29 +00:00
pjd
2b6ef22ead MFC: sys/kern/vfs_subr.c 1.682
Add 'show vnode <addr>' DDB command.

Requested by:	kib
2006-12-04 08:45:52 +00:00
tegge
690853d66d MFC: Use mount interlock to protect all changes to mnt_flag and
mnt_kern_flag. This eliminates a race where the MNT_UPDATE flag could be
lost when nmount() raced against sync(), sync_fsync() or quotactl().

Approved by:	re (kensmith)
2006-10-09 19:47:17 +00:00
kib
5c65b0cfbb MFC rev. 1.685:
Correct the comment: numvnodes is decreased on vdestroying the vnode.

Approved by:	re (hrs), pjd (mentor)
2006-10-09 14:17:30 +00:00
tegge
3d40c50014 MFC rev 1.667: vfs_busy() holds a reference on the mount until
vfs_unbusy() is called.
vfs_getvfs() returns a referenced mount.

Approved by:	re (kensmith)
2006-09-27 00:36:10 +00:00
pjd
c60bd1613d MFC: sys/kern/vfs_subr.c 1.680,1.681
Add a bandaid to avoid a deadlock when we are trying to suspend
a file system but need to obtain a vnode. We may not be able to do so, because
all vnodes could already be in use and other processes cannot release them,
because they are waiting in the "suspfs" state.

In such a situation, we allow the vnode to be allocated anyway.

This is a temporary fix - there is no backpressure to free vnodes allocated in
those circumstances.

Reviewed by:	tegge
2006-09-04 09:58:25 +00:00
kib
6b3c448bc0 MFC rev. 1.637:
Fix the LOR that occurs when MAC is compiled into the kernel
and a vnode is destroyed.

LOR:		189
Approved by:	kan (mentor)
2006-07-05 16:34:16 +00:00
pjd
0b852a3c37 MFC: sys/kern/vfs_subr.c 1.671
vn_start_write()/vn_finished_write() is not needed here, because
vn_start_write() is always called earlier in the code path and calling
the function recursively may lead to a deadlock.

Confirmed by:	tegge
2006-05-13 14:01:35 +00:00
scottl
0f81ec8494 MFC rev 1.669. This is done only because the change has been tested for a
month, and then only because it has been heavily reviewed and recommended.

Approved by: re
2006-05-04 07:42:52 +00:00
jeff
15991fa641 MFC Revs 1.664, 1.661, 1.660, 1.659, 1.658, 1.657
VFS SMP fixes, stack api, softupdates fixes.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (scottl)
2006-03-13 03:06:34 +00:00
tegge
7d50ddd92e MFC: Eliminate a deadlock when creating snapshots. Blocking
vn_start_write() must be called without any vnode locks held.
Remove calls to vn_start_write() and vn_finished_write() in
vnode_pager_putpages() and add these calls before the vnode lock
is obtained to most of the callers that don't already have them.

Approved by:	re (mux)
2006-03-09 00:18:45 +00:00
tegge
ca00351165 MFC: Don't try to show marker nodes.
Approved by:	re (mux)
2006-03-09 00:04:27 +00:00
tegge
81ceadf72a MFC: Add marker vnodes to ensure that all vnodes associated with the mount
point are iterated over when using MNT_VNODE_FOREACH.
2006-01-14 01:18:03 +00:00
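
A marker node is one common way to keep an iterator's place in a list that can change while each element is being processed. The following is a minimal userland sketch of that idea, not the kernel's MNT_VNODE_FOREACH implementation: `sk_node`, `sk_foreach()` and `sk_unlink()` are hypothetical names, and the `TAILQ` macros from `<sys/queue.h>` stand in for the mount's vnode list.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode on a mount's list. */
struct sk_node {
	int id;
	int is_marker;			/* set only on iterator markers */
	TAILQ_ENTRY(sk_node) link;
};
TAILQ_HEAD(sk_nodelist, sk_node);

/* Example visitor: unlinks the node it is handed. */
static void
sk_unlink(struct sk_nodelist *head, struct sk_node *np)
{
	TAILQ_REMOVE(head, np, link);
}

/*
 * Visit every non-marker node.  A marker is parked after the current node
 * before the visitor runs, so the walk resumes from the marker even if the
 * visitor removed the current node from the list.
 */
static int
sk_foreach(struct sk_nodelist *head,
    void (*visit)(struct sk_nodelist *, struct sk_node *))
{
	struct sk_node marker = { .id = -1, .is_marker = 1 };
	struct sk_node *np = TAILQ_FIRST(head);
	int visited = 0;

	while (np != NULL) {
		if (np->is_marker) {		/* skip another walker's marker */
			np = TAILQ_NEXT(np, link);
			continue;
		}
		TAILQ_INSERT_AFTER(head, np, &marker, link);
		visit(head, np);		/* np may be unlinked after this */
		visited++;
		np = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return (visited);
}
```

Because the walk continues from the marker rather than from the node just visited, removing that node does not cut the iteration short. Any code that walks the same list has to skip entries with `is_marker` set.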
dds
3459412dad MFC changes from 2005.10.26:
Move execve's access time update functionality into a
new vfs_mark_atime() function, and use the new function
for performing efficient atime updates in mmap().
2005-12-26 13:47:20 +00:00
dwhite
16e12cca71 MFC:
src/sys/fs/devfs/devfs_vnops.c		1.128
src/sys/kern/vfs_subr.c		1.652

This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

PR:		88249
Submitted by:   "Devon H. O'Dell" <dodell@ixsystems.com>
2005-11-12 21:21:27 +00:00
kris
053c63be2b MFC r1.650:
Default to mpsafevfs=1 on sparc64

Approved by:	re (scottl)
2005-10-25 20:42:06 +00:00
truckman
4df51e5f85 MFC snaplk deadlock fix
src/sys/kern/vfs_bio.c          1.495, 1.496
src/sys/kern/vfs_subr.c         1.648
src/sys/sys/buf.h               1.190, 1.191
src/sys/sys/proc.h              1.436
src/sys/ufs/ffs/ffs_snapshot.c  1.104, 1.105, 1.106

Original commit messages:

    Log:
    Un-staticize runningbufwakeup() and staticize updateproc.

    Add a new private thread flag to indicate that the thread should
    not sleep if runningbufspace is too large.

    Set this flag on the bufdaemon and syncer threads so that they skip
    the waitrunningbufspace() call in bufwrite() rather than
    checking the proc pointer vs. the known proc pointers for these two
    threads.  A way of preventing these threads from being starved for
    I/O but still placing limits on their outstanding I/O would be
    desirable.

    Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
    blocking on the runningbufspace check while holding snaplk.  This
    prevents snaplk from being held for an arbitrarily long period of
    time if runningbufspace is high and greatly reduces the contention
    for snaplk.  The disadvantage is that ffs_copyonwrite() can start
    a large amount of I/O if there are a large number of snapshots,
    which could cause a deadlock in other parts of the code.

    Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
    before attempting to grab snaplk so that I/O requests waiting on
    snaplk are not counted in runningbufspace as being in-progress.
    Increment runningbufspace again before actually launching the
    original I/O request.

    Prior to the above two changes, the system could deadlock if enough
    I/O requests were blocked by snaplk to prevent runningbufspace from
    falling below lorunningspace and one of the bawrite() calls in
    ffs_copyonwrite() blocked in waitrunningbufspace() while holding
    snaplk.

    See <http://www.holm.cc/stress/log/cons143.html>

    Revision  Changes    Path
    1.495     +3 -3      src/sys/kern/vfs_bio.c
    1.648     +2 -1      src/sys/kern/vfs_subr.c
    1.190     +1 -0      src/sys/sys/buf.h
    1.436     +1 -1      src/sys/sys/proc.h
    1.104     +16 -4     src/sys/ufs/ffs/ffs_snapshot.c

    Log:
    Un-staticize waitrunningbufspace() and call it before returning from
    ffs_copyonwrite() if any async writes were launched.

    Restore the thread's previous TDP_NORUNNINGBUF state before returning
    from ffs_copyonwrite().

    Revision  Changes    Path
    1.496     +1 -1      src/sys/kern/vfs_bio.c
    1.191     +1 -0      src/sys/sys/buf.h
    1.105     +13 -1     src/sys/ufs/ffs/ffs_snapshot.c

    Log:
    Correct previous commit to fix the sense of the TDP_NORUNNINGBUF
    check in ffs_copyonwrite() that is a precondition for calling
    waitrunningbufspace().

    Pointed out by: tegge
    Pointy hat to:  truckman
    MFC after:      3 days

    Revision  Changes    Path
    1.106     +1 -1      src/sys/ufs/ffs/ffs_snapshot.c

Approved by:	re (scottl)
2005-10-04 04:41:27 +00:00
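
The log above describes setting a private thread flag around a section that holds snaplk and restoring the thread's previous flag state afterwards. Here is a minimal sketch of that save, set and restore pattern in userland C. `SK_NORUNNINGBUF`, `sk_thread`, `sk_may_sleep_on_io()` and `sk_locked_section()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

#define SK_NORUNNINGBUF 0x01	/* hypothetical stand-in for TDP_NORUNNINGBUF */

struct sk_thread {
	unsigned int pflags;	/* private per-thread flags */
};

/* A write path would consult this before sleeping on the I/O limit. */
static int
sk_may_sleep_on_io(const struct sk_thread *td)
{
	return ((td->pflags & SK_NORUNNINGBUF) == 0);
}

/*
 * Run a critical section with the flag forced on, then put the flag back
 * the way the caller had it.  Returns whether writes issued inside the
 * section would have been allowed to sleep.
 */
static int
sk_locked_section(struct sk_thread *td)
{
	unsigned int saved = td->pflags & SK_NORUNNINGBUF;
	int could_sleep;

	td->pflags |= SK_NORUNNINGBUF;
	could_sleep = sk_may_sleep_on_io(td);	/* lock would be held here */
	td->pflags = (td->pflags & ~SK_NORUNNINGBUF) | saved;
	return (could_sleep);
}
```

Saving only the one bit and merging it back leaves every other flag untouched, and a caller that already had the flag set keeps it set on return.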
phk
051ea6d1a8 MFC:
Various fixes for DEVFS, in particular "devfs ruleset already running".

Approved by:	re@ (scottl)
2005-09-18 07:10:57 +00:00
tegge
2adf020846 MFC: Break out of loop if next buffer pointer has become invalid while
flushing current buffer.

Approved by:	re (scottl)
2005-09-17 15:51:12 +00:00
rwatson
7d516f2454 Merge vfs_subr.c:1.646 from HEAD to RELENG_6:
In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported
  kqueue filter type is requested on a vnode.

Approved by:	re (kensmith)
2005-09-15 20:52:53 +00:00
ssouhlal
e729b6acde MFC r1.643:
Fix a typo in vop_rename_pre() where we ended up using vholdl()
  instead of vhold(), even though the vnode interlock is unlocked.

Approved by:	re (scottl)
2005-09-03 19:01:13 +00:00
truckman
5d65465af5 MFC vfs_subr.c 1.636 and 1.642
vfs_subr.c 1.642 fixes a small race condition in vlrureclaim() and depends
on 1.636.

  Modified files:
    sys/kern             vfs_subr.c
  Log:
   - Allow vnlru to drop giant if the filesystem does not require it.  The
     vnlru proc is extremely inefficient, potentially iterating over tens of
     thousands of vnodes without blocking.  Dropping Giant allows other threads
     to preempt us although we should revisit the algorithm to fix the runtime
     problems especially since this may hold up all vnode allocations.
   - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim.  This provides
     a natural blocking point to help alleviate the situation described above
     although it may not technically be desirable.
   - yield after we make a pass on all mount points to prevent us from
     blocking other threads which require Giant.

  MFC after:      2 weeks

  Revision  Changes    Path
  1.636     +11 -2     src/sys/kern/vfs_subr.c

  Modified files:
    sys/kern             vfs_subr.c
  Log:
  Back out the removal of LK_NOWAIT from the VOP_LOCK() call in
  vlrureclaim() in vfs_subr.c 1.636  because waiting for the vnode
  lock aggravates an existing race condition.  It is also undesirable
  according to the commit log for 1.631.

  Fix the tiny race condition that remains by rechecking the vnode
  state after grabbing the vnode lock and grabbing the vnode interlock.

  Fix the problem of other threads being starved (which 1.636 attempted
  to fix by removing LK_NOWAIT) by calling uio_yield() periodically
  in vlrureclaim().  This should be more deterministic than hoping
  that VOP_LOCK() without LK_NOWAIT will block, which may not happen
  in this loop.

  Reviewed by:    kan
  MFC after:      5 days

  Revision  Changes    Path
  1.642     +37 -7     src/sys/kern/vfs_subr.c

Approved by:	re (scottl)
2005-09-03 00:21:55 +00:00
rwatson
bf826e6d59 Merge vfs_subr.c:1.641 from HEAD to RELENG_6:
Silence "busy" warnings when unmounting devfs at system shutdown.  This
  is a workaround for non-symmetric teardown of the file systems at
  shutdown with respect to the mount order at boot.  The proper long term
  fix is to properly detach devfs from the root mount before unmounting
  each, and should be implemented, but since the problem is non-harmful,
  this temporary band-aid will prevent false positive bug reports and
  unnecessary error output for 6.0-RELEASE.

  Tested by:      pav, pjd

Approved by:	re (scottl)
2005-08-23 01:50:19 +00:00
kan
0b90ddcccf MFC r1.639: Do not drop the vnode interlock if vdropl is called on an
already doomed vnode. vdropl callers expect it to return with the interlock
still held.

Approved by:	re (hrs)
2005-08-15 13:50:38 +00:00
ssouhlal
defd686212 MFC:
Holding a vnode doesn't prevent v_mount from disappearing (when the
  vnode is inactivated), possibly leading to a NULL dereference when
  checking if the mount wants knotes to be activated in the VOP hooks.
  So, we add a new vnode flag VV_NOKNOTE that is only set in  getnewvnode(),
  if necessary, and check it when activating knotes.
  Since the flags are not erased when a vnode is being held, we can safely
  read them.

Approved by:	re (kensmith)
2005-08-15 06:01:36 +00:00
pjd
38bf7eadf9 Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp)
below KASSERT()s, which means there was no real problem here, we just
needed better locking for assertions.

OK'ed by:	jeff
Approved by:	re (scottl)
2005-07-05 15:57:55 +00:00
ssouhlal
efe31cd3da Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
2005-07-01 16:28:32 +00:00
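
The commit above describes passing lock, unlock and ownership-check functions to knlist_init(), with NULL meaning the mutex defaults. Below is a rough userland sketch of that callback-with-default pattern. Every name here (`sk_knlist`, `sk_knlist_init()`, the `sk_mtx_*` and `sk_vn_*` helpers) is hypothetical, and the argument order is an assumption rather than the kernel's actual prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock; a kernel would use a mutex or the vnode lock here. */
struct sk_lock {
	int held;
	int custom_calls;	/* bumped only by the custom lock callback */
};

/* Defaults, playing the role of mtx_lock/mtx_unlock/mtx_owned. */
static void sk_mtx_lock(void *arg) { ((struct sk_lock *)arg)->held = 1; }
static void sk_mtx_unlock(void *arg) { ((struct sk_lock *)arg)->held = 0; }
static int sk_mtx_owned(void *arg) { return (((struct sk_lock *)arg)->held); }

/* Custom callbacks, playing the role of vnode-lock wrappers. */
static void
sk_vn_lock(void *arg)
{
	struct sk_lock *l = arg;

	l->held = 1;
	l->custom_calls++;
}
static void sk_vn_unlock(void *arg) { ((struct sk_lock *)arg)->held = 0; }
static int sk_vn_owned(void *arg) { return (((struct sk_lock *)arg)->held); }

struct sk_knlist {
	void *kl_lockarg;
	void (*kl_lock)(void *);
	void (*kl_unlock)(void *);
	int (*kl_owned)(void *);
};

/* NULL callbacks fall back to the mutex-style defaults. */
static void
sk_knlist_init(struct sk_knlist *kl, void *lockarg, void (*lock)(void *),
    void (*unlock)(void *), int (*owned)(void *))
{
	kl->kl_lockarg = lockarg;
	kl->kl_lock = (lock != NULL) ? lock : sk_mtx_lock;
	kl->kl_unlock = (unlock != NULL) ? unlock : sk_mtx_unlock;
	kl->kl_owned = (owned != NULL) ? owned : sk_mtx_owned;
}
```

Routing every lock operation through the stored pointers lets the list code stay unaware of which kind of lock protects it, which is what allows a vnode's list to be guarded by the vnode lock.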
jeff
5970417966 - Try to catch the wrong bufobj panics a little earlier. I believe they
are actually caused by a buf with both VNCLEAN and VNDIRTY set.  In
   the traces it is clear that the buf is removed from the dirty queue while
   it is actually on the clean queue which leaves the tail pointer set.
   Assert that both flags are not set in buf_vlist_add and buf_vlist_remove.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-18 18:17:03 +00:00
jeff
ca07a9f012 - Change holdcnt use around vnode recycling. We now always keep a holdcnt
ref while we're calling vgone().  This prevents transient refs from
   re-adding us to the free list.  Previously, a vfree() triggered via
   vinvalbuf() getting rid of all of a vnode's pages could place a partially
   destructed vnode on the free list where vtryrecycle() could find it.  The
   first call to vtryrecycle would hang up on the vnode lock, but when it
   failed it would place a now dead vnode onto the free list, and another
   call to vtryrecycle() would free an already free vnode.  There were many
   complications of having a zero ref count while freeing which can now go
   away.
 - Change vdropl() to release the interlock before returning.  All callers
   now respect this, so vdropl() directly frees VI_DOOMED vnodes once the
   last ref is dropped.  This means that we'll never have VI_DOOMED vnodes
   on the free list.
 - Separate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and
   v_decr_useonly().  The incr/decr split is so that incr usecount can
   return with the interlock still held while decr drops the interlock so
   it can call vdropl() which will potentially free the vnode.  The calling
   function can't drop the lock of an already free'd node.  v_decr_useonly()
   drops a usecount without dropping the hold count.  This is done so the
   usecount reaches zero in vput() before we recycle, however the holdcount
   is still 1 which prevents any new references from placing the vnode
   back on the free list.
 - Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget().  We
   wouldn't want vnlrureclaim() to bump the usecount since this has
   different semantics.  Also change vnlrureclaim() to do a NOWAIT on the
   vn_lock.  When this function runs we're usually in a desperate situation
   and we wouldn't want to wait for any specific vnode to be released.
 - Fix a bunch of misc comments to reflect the new behavior.
 - Add vhold() and vdrop() to vflush() for the same reasons that we do in
   vlrureclaim().  Previously we held no reference and a vnode could have
   been freed while we were waiting on the lock.
 - Get rid of vlruvp() and vfreehead().  Neither are used.  vlruvp() should
   really be rethought before it's reintroduced.
 - vgonel() always returns with the vnode locked now and never puts the
   vnode back on a free list.  The vnode will be freed as soon as the last
   reference is released.

Sponsored by:	Isilon Systems, Inc.
Debugging help from:	Kris Kennaway, Peter Holm
Approved by:	re (blanket vfs)
2005-06-16 04:41:42 +00:00
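
The split between use counts and hold counts in the commit above can be illustrated with a small model. This is a simplified sketch under stated assumptions, not the kernel code: `sk_vnode` and the `sk_*` functions are hypothetical, there is no locking, and "freeing" is just a flag.

```c
#include <assert.h>

struct sk_vnode {
	int usecount;		/* active users */
	int holdcnt;		/* references that keep the vnode alive */
	int doomed;		/* stand-in for VI_DOOMED */
	int on_freelist;
	int freed;
};

static void
sk_hold(struct sk_vnode *vp)
{
	vp->holdcnt++;
}

/* Drop a hold; the last drop frees a doomed vnode or lists a live one. */
static void
sk_drop(struct sk_vnode *vp)
{
	assert(vp->holdcnt > 0);
	if (--vp->holdcnt > 0)
		return;
	if (vp->doomed)
		vp->freed = 1;		/* freed directly, never listed */
	else
		vp->on_freelist = 1;
}

/* Taking a use reference also takes a hold. */
static void
sk_incr_usecount(struct sk_vnode *vp)
{
	vp->usecount++;
	sk_hold(vp);
}

/* Release both the use reference and its hold. */
static void
sk_decr_usecount(struct sk_vnode *vp)
{
	vp->usecount--;
	sk_drop(vp);
}

/* Release only the use reference; the hold stays with the caller. */
static void
sk_decr_useonly(struct sk_vnode *vp)
{
	vp->usecount--;
}
```

After `sk_decr_useonly()` the use count can reach zero while the hold count is still 1, so nothing can put the vnode back on the free list until the caller finishes and calls `sk_drop()`.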
jeff
909b5b7c58 - In reassignbuf() add many asserts to validate the head and tail pointers
of the clean and dirty lists.  This is in an attempt to catch the wrong
   bufobj problem sooner.
 - In vgonel() don't acquire an extra reference in the active case, the
   vnode lock and VI_DOOMED protect us from recursively cleaning.
 - Also in vgonel() clean up some stale comments.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-14 20:31:53 +00:00
jeff
7a825fb457 - Don't make vgonel() globally visible, we want to change its prototype
anyway and it's not used outside of vfs_subr.c.
 - Change vgonel() to accept a parameter which determines whether or not
   we'll put the vnode on the free list when we're done.
 - Use the new vgonel() parameter rather than VI_DOOMED to signal our
   intentions in vtryrecycle().
 - In vgonel() return if VI_DOOMED is already set, this vnode has already
   been reclaimed.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 06:26:55 +00:00
jeff
2ef7df2a1a - Add KTR_VFS events to vdestroy, vtruncbuf, vinvalbuf, vfreehead.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:37 +00:00
jeff
306b180d66 - Assert that we're not in the name cache anymore in vdestroy().
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:48:09 +00:00
jeff
3625e8746b - Add KTR_VFS tracing to track the life of vnodes. Eventually KTR_VFS
events could be added to cover other interesting details.
 - Add some VNASSERTs to discover places where we access vnodes after
   they have been uma_zfree'd before we try to free them again.
 - Add a few more VNASSERTs to vdestroy() to be certain that the vnode is
   really unused.

Sponsored by:	Isilon Systems, Inc.
2005-06-11 01:16:46 +00:00
ssouhlal
0835f7b4a9 Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
jeff
4a9af33a3f - Clear OWEINACT prior to calling VOP_INACTIVE to remove the possibility
of a vget causing another call to INACTIVE before we're finished.
2005-06-07 22:05:32 +00:00
cperciva
e513415af9 If we are going to
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.

Security: FreeBSD-SA-05:08.kmem
2005-05-06 02:50:00 +00:00
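
The ordering in the commit above (zero first, then copy the string, then copy out) can be sketched in userland C as follows. `export_rec`, `NAME_LEN` and `fill_export_rec()` are hypothetical names chosen for illustration, and the copyout step is left out because it only exists in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 16	/* hypothetical fixed buffer size */

/* Hypothetical record that would be copied out to userland whole. */
struct export_rec {
	char name[NAME_LEN];
};

/*
 * Fill rec->name from a NUL-terminated string without leaving stale
 * bytes after the terminator.
 */
static void
fill_export_rec(struct export_rec *rec, const char *src)
{
	size_t n = strlen(src);

	/* Step 0: zero the entire buffer first. */
	memset(rec, 0, sizeof(*rec));
	/* Step 1: copy the string, truncated so the final NUL survives. */
	if (n > NAME_LEN - 1)
		n = NAME_LEN - 1;
	memcpy(rec->name, src, n);
}
```

Without the `memset()`, the bytes after the terminator would keep whatever was previously in that memory, and copying the whole fixed-length buffer out would disclose them.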
jeff
92f17d1e6a - A vnode may have made its way onto the free list while it was being
vgone'd.  We must remove it from the freelist before returning in
   vtryrecycle() or we may get a duplicate free.

Reported by:	kkenn
2005-05-03 10:56:00 +00:00
csjp
431f1afe8c Since it is not possible for curthread to be NULL in this context,
drop the check+initialization for a straight initialization. Also
assert that curthread will never be NULL just to be sure.

Discussed with:	rwatson, peter
MFC after:	1 week
2005-05-02 02:07:55 +00:00
jeff
dd41538cd8 - All buffers should either be clean or dirty. If neither of these flags
is set when we attempt to remove a buffer from a queue we should panic.
   Hopefully this will catch the source of the wrong bufobj panics.

Sponsored by:	Isilon Systems, Inc.
2005-05-01 12:00:36 +00:00
jeff
7354fc5e28 - In vnlru_free() remove the vnode from the free list before we call
vtryrecycle().  We could sometimes get into situations where two threads
   could try to recycle the same vnode before this.
 - vtryrecycle() is now responsible for returning the vnode to the free list
   if it fails and someone else hasn't done it.
 - Make a new function vfreehead() which moves a vnode to the head of the
   free list and use it in vgone() to clean up that code a bit.

Sponsored by:	Isilon Systems, Inc.
Reported by:	pho, kkenn
2005-04-30 11:22:40 +00:00
jeff
0e56b01ed6 - Don't vgonel() via vgone() or vrecycle() if the vnode is already doomed.
This fixes forced unmounts via nullfs.

Reported by:	kkenn
Sponsored by:	Isilon Systems, Inc.
2005-04-27 10:03:21 +00:00
jeff
a80bbe799e - Stop setting vxthread, we've asserted that it was useless for several
weeks now.
2005-04-27 09:17:33 +00:00
jeff
31cfb7f242 - Disable code which allows getnewvnode() to fail. Many ffs_vget() callers
do not correctly deal with failures.  This presently risks deadlock
   problems if dependency processing is held up by failures to allocate
   a vnode, however, this is better than the situation with the failures.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:57:05 +00:00
phk
4bd811c8dd Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready
earlier.
2005-04-18 21:11:47 +00:00
jeff
5642885b84 - Change vop_lookup_post assertions to reflect recent vfs_lookup changes.
Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:57:53 +00:00
jeff
b391d2675b - Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk
uses it.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 15:17:06 +00:00
jeff
17be4cbfa0 - Change the VOP_LOCK UPGRADE in vput() to do a LK_NOWAIT to avoid a
potential lock order reversal.  Also, don't unlock the vnode if this
   fails, lockmgr has already unlocked it for us.
 - Restructure vget() now that vn_lock() does all of VI_DOOMED checking
   for us and also handles the case where there is no real lock type.
 - If VI_OWEINACT is set, we need to upgrade the lock request to EXCLUSIVE
   so that we can call inactive.  It's not legal to vget a vnode that hasn't
   had INACTIVE called yet.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 09:28:32 +00:00
jeff
60d07eec30 - Assert that the bufobj matches in flushbuflists. I still haven't gotten
to root cause on exactly how this happens.
 - If the assert is disabled, we presently try to handle this case, but the
   BUF_UNLOCK was missing.  Thus, if this condition ever hit we would leak
   a buf lock.

Many thanks to Peter Holm for all his help in finding this bug.  He really
put more effort into it than I did.
2005-04-06 06:49:46 +00:00