Commit Graph

680 Commits

Author SHA1 Message Date
Pawel Jakub Dawidek
04d9e255df getnewvnode() can be called with NULL mp.
Found by:	Coverity Prevent (tm)
Coverity ID:	1521
Confirmed by:	phk
2006-08-10 08:56:03 +00:00
Pawel Jakub Dawidek
13c85d339d Add a bandaid to avoid a deadlock in a situation, when we are trying to suspend
a file system, but need to obtain a vnode. We may not be able to do it, because
all vnodes could be already in use and other processes cannot release them,
because they are waiting in "suspfs" state.

In such situation, we allow to allocate a vnode anyway.

This is a temporary fix - there is no backpressure to free vnodes allocated in
those circumstances.

MFC after:	1 week
Reviewed by:	tegge
2006-08-09 12:47:30 +00:00
Robert Watson
ccdebe46bd Improve commenting of vaccess(), making sure to be clear that the ifdef
capabilities code is there for reference and never actually used.  Slight
style tweak.
2006-08-06 10:43:35 +00:00
Alan Cox
27ea29536c Enable debug.mpsafevfs by default on arm. Since every architecture except
powerpc has debug.mpsafevfs enabled by default, it is shorter to enumerate
the architectures on which debug.mpsafevfs is off.

Tested by: cognet@
2006-07-15 06:44:27 +00:00
Konstantin Belousov
c8d3bc1fa3 Back out my rev. 1.674. The better fix (rev. 1.637) is already in tree.
Approved by:	kan (mentor)
2006-07-05 16:33:25 +00:00
Sergey Babkin
d81175c738 Backed out the change by request from rwatson.
PR:		kern/14584
2006-06-26 22:03:22 +00:00
Sergey Babkin
7a799f1ef0 The common UID/GID space implementation. It has been discussed on -arch
in 1999, and there are changes to the sysctl names compared to PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to PR.

PR:		kern/14584
Submitted by:	babkin
2006-06-25 18:37:44 +00:00
Konstantin Belousov
55aef2632f Fix the LOR that occurs when the MAC compiled into the kernel
and vnode is destroyed.

Reviewed by:	rwatson
LOR:		189
MFC after:	2 weeks
Approved by:	kan (mentor)
2006-06-08 07:55:10 +00:00
Stephan Uphoff
dcf67e65d2 Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flush dirty page in the data invalidation function
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by:	tegge@
MFC after:	7 days
2006-05-25 01:00:35 +00:00
John Baldwin
73dbd3da73 Remove various bits of conditional Alpha code and fixup a few comments. 2006-05-12 05:04:46 +00:00
Pawel Jakub Dawidek
643df192de vn_start_write()/vn_finished_write() is not needed here, because
vn_start_write() is always called earlier in the code path and calling
the function recursively may lead to a deadlock.

Confirmed by:	tegge
MFC after:	2 weeks
2006-04-29 21:57:38 +00:00
Jeff Roberson
6ca9fcc586 - Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child
buffers to go on the buf daemon's DIRTYGIANT queue.
 - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler
   runs in the context of the buf daemon and may require Giant.
2006-04-28 01:05:31 +00:00
Jeff Roberson
b53bf1269c - VFS_LOCK_GIANT when recycling a vnode via getnewvnode. We may be
recycling for an unrelated filesystem.  I really don't like potentially
   acquiring giant in the context of a giantless filesystem but there
   are reasonable objections to removing the recycling from this path.

Sponsored by:	Isilon Systems, Inc.
2006-04-04 06:46:10 +00:00
Jeff Roberson
0af2472199 - Add an assert to vgone. It is illegal to call vgone without a reference
to the vnode.  Without a reference the vnode will never be vdestroy'd
   and the memory will never be reclaimed.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 23:39:26 +00:00
Jeff Roberson
94bc95db3c - Hold a reference from the time vfs_busy starts until vfs_unbusy is
called.
 - vfs_getvfs has to return a reference to prevent the returned mountpoint
   from changing identities.
 - Release references acquired via vfs_getvfs.

Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:53:25 +00:00
Jeff Roberson
084d64ac21 - Add the B_NEEDSGIANT flag which is only set if the vnode that owns a buf
requires Giant.  It is set in bgetvp and cleared in brelvp.
 - Create QUEUE_DIRTY_GIANT for dirty buffers that require giant.
 - In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and
   only if we think there are buffers in that queue.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 02:56:30 +00:00
Jeff Roberson
e44270a781 - Correct an assert in vop_rename_pre. fdvp may be locked if it is either
the target directory or file.  This case should fail in the filesystem
   anyway and perhaps kern_rename() should catch it.

Sponsored by:	Isilon Systems, Inc.
2006-03-19 20:14:46 +00:00
Tor Egge
791dd2fade Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
2006-03-08 23:43:39 +00:00
Tor Egge
3b582b4e72 Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must
be called without any vnode locks held.  Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
2006-03-02 22:13:28 +00:00
Tor Egge
b983aac762 Don't try to show marker nodes. 2006-03-02 21:31:15 +00:00
Jeff Roberson
eb2ea10590 - Move softdep from using a global worklist to per-mount worklists. This
has many positive effects including improved smp locking, reducing
   interdependencies between mounts that can lead to deadlocks, etc.
 - Add the softdep worklist and various counters to the ufsmnt structure.
 - Add a mount pointer to the workitem and remove mount pointers from the
   various structures derived from the workitem as they are now redundant.
 - Remove the poor-man's semaphore protecting softdep_process_worklist and
   softdep_flushworklist.  Several threads may now process the list
   simultaneously.
 - Add softdep_waitidle() to block the thread until all pending
   dependencies being operated on by other threads have been flushed.
 - Use softdep_waitidle() in unmount and snapshots to block either
   operation until the fs is stable.
 - Remove softdep worklist processing from the syncer and move it into the
   softdep_flush() thread.  This thread processes all softdep mounts
   once each second and when it is called via the new softdep_speedup()
   when there is a resource shortage.  This removes the softdep hook
   from the kernel and various hacks in header files to support it.

Reviewed by/Discussed with:	tegge, truckman, mckusick
Tested by:	kris
2006-03-02 05:50:23 +00:00
Jeff Roberson
a1db11fc40 - Release the mount ref once the vnode has been recycled rather than once
the last reference is dropped.  I forgot that vnodes can stick around
   for a very long time until processes discover that they are dead.  This
   means that a vnode reference is not sufficient to keep the mount
   referenced and even more code will be required to ref mount points.

Discovered by:	kris
2006-02-23 05:15:37 +00:00
Jeff Roberson
8a7cd2fdfb - Grab a mnt ref in vfs_busy() before dropping the interlock. This will
prevent the mount point from going away while we're waiting on the lock.
   The ref does not need to persist once we have the lock because the
   lock prevents the mount point from being unmounted.

MFC After:	1 week
2006-02-22 06:20:12 +00:00
Jeff Roberson
04f6d3effa - Add a ref count to the mount structure. Sleep for up to 3 seconds in
vfs_mount_destroy waiting for this ref to hit 0.  We don't print an
   error if we are rebooting as the root mount always retains some refernces
   by init proc.
 - Acquire a mnt ref for every vnode allocated to a mount point.  Drop this
   ref only once vdestroy() has been called and the mount has been freed.
 - No longer NULL the v_mount pointer in delmntque() so that we may release
   the ref after vgone() has been called.  This allows us to guarantee
   that the mount point structure will be valid until the last vnode has
   lost its last ref.
 - Fix a few places that rely on checking v_mount to detect recycling.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:19:50 +00:00
Jeff Roberson
b099db5881 - Solve a race where we could lose a call to VOP_INACTIVE. If vget() waiting
on a lock held the last usecount ref on a vnode and the lock failed we
   would not call INACTIVE.  Solve this by only holding a holdcnt to prevent
   the vnode from disappearing while we wait on vn_lock.  Other callers
   may now VOP_INACTIVE while we are waiting on the lock, however this race
   is acceptable, while losing INACTIVE is not.

Discussed with:	kan, pjd
Tested by:	kkenn
Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-01 00:30:05 +00:00
Kris Kennaway
d5e5528afe Back out r1.653; it turns out that the race (or at least the printf) is
actually not hard to trigger, and it can cause a lot of console spam.

Approved by:	kan
2006-01-28 03:06:35 +00:00
Robert Watson
6be2c41a22 Convert remaining functions in vfs_subr.c from K&R prototypes to ANSI C
prototypes, as the majority of new functions added have been in this
style.  Changing prototype style now results in gcc noticing that the
implementation of vn_pollrecord() has a 'short' argument instead of
'int' as prototyped in vnode.h, so correct that definition.  In practice
this didn't matter as only poll flags in the lower 16 bits are used.

MFC after:	1 week
2006-01-21 19:42:10 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Pawel Jakub Dawidek
e7736557d6 Print a warning when we miss vinactive() call, because of race in vget().
The race is very real, but conditions needed for triggering it are rather
hard to meet now.
When gjournal will be committed (where it is quite easy to trigger) we need
to fix it.

For now, verify if it is really hard to trigger.

Discussed with:	kan
2005-12-29 22:52:09 +00:00
Doug White
16e35dcc39 This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

This issue affects HEAD and RELENG_6 only.

PR:		88249
Submitted by:	"Devon H. O'Dell" <dodell@ixsystems.com>
MFC after:	3 days
2005-11-09 22:03:50 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Kris Kennaway
14cdc36456 mpsafevm has been stable and defaulted to 1 on sparc64 for over 6 months,
so we are ready for mpsafevfs=1 by default on sparc64 too.  I have been
running this on all my sparc64 machines for over 6 months, and have not
encountered MD problems.

MFC after:	1 week
2005-10-14 23:56:13 +00:00
Diomidis Spinellis
9f5c1d1955 Move execve's access time update functionality into a new
vfs_mark_atime() function, and use the new function for
performing efficient atime updates in mmap().

Reviewed by:	bde
MFC after:	2 weeks
2005-10-12 06:56:00 +00:00
Don Lewis
6c8b634f1d Un-staticize runningbufwakeup() and staticize updateproc.
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.

Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than than
checking the proc pointer vs. the known proc pointers for these two
threads.  A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.

Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk.  This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk.  The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.

Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.

Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.

See <http://www.holm.cc/stress/log/cons143.html>
2005-09-30 01:30:01 +00:00
Tor Egge
61ac14dab6 Break out of loop if next buffer pointer has become invalid while flushing
current buffer.

Reviewed by:	kan
2005-09-16 18:28:12 +00:00
Robert Watson
fd1a469ba5 In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported
kqueue filter type is requested on a vnode.

MFC after:	3 days
2005-09-12 19:22:37 +00:00
Jung-uk Kim
9ed448b20c use monotonic time_uptime' instead of time_second'
Approved by:	anholt (mentor)
Discussed on:	arch
2005-09-12 15:31:28 +00:00
Poul-Henning Kamp
2883ba6668 Introduce vfs_read_dirent() which can help VOP_READDIR() implementations
by handling all the cookie stuff.
2005-09-12 08:46:07 +00:00
Suleiman Souhlal
a6c109d658 Fix a typo in vop_rename_pre() where we ended up using vholdl()
instead of vhold(), even though the vnode interlock is unlocked.

MFC after:	3 days
2005-08-28 23:00:11 +00:00
Don Lewis
ad9f180121 Back out the removal of LK_NOWAIT from the VOP_LOCK() call in
vlrureclaim() in vfs_subr.c 1.636  because waiting for the vnode
lock aggravates an existing race condition.  It is also undesirable
according to the commit log for 1.631.

Fix the tiny race condition that remains by rechecking the vnode
state after grabbing the vnode lock and grabbing the vnode interlock.

Fix the problem of other threads being starved (which 1.636 attempted
to fix by removing LK_NOWAIT) by calling uio_yield() periodically
in vlrureclaim().  This should be more deterministic than hoping
that VOP_LOCK() without LK_NOWAIT will block, which may not happen
in this loop.

Reviewed by:	kan
MFC after:	5 days
2005-08-23 03:44:06 +00:00
Robert Watson
6cd8dee3c5 Silence "busy" warnings when unmounting devfs at system shutdown. This
is a workaround for non-symetric teardown of the file systems at
shutdown with respect to the mount order at boot.  The proper long term
fix is to properly detach devfs from the root mount before unmounting
each, and should be implemented, but since the problem is non-harmful,
this temporary band-aid will prevent false positive bug reports and
unnecessary error output for 6.0-RELEASE.

MFC after:	3 days
Tested by:	pav, pjd
2005-08-20 17:12:47 +00:00
Marcel Moolenaar
fd65baf8e2 Make mpsafe_vfs=1 the default on ia64. 2005-08-13 20:07:50 +00:00
Alexander Kabaev
45a0d1ed7a Do not drop the vnode interlock if vdropl is called on already doomed vnode.
vdropl callers expect it to return with interlock still being held.

MFC after:	2 days
2005-08-10 11:46:03 +00:00
Suleiman Souhlal
34cc826ae8 Holding a vnode doesn't prevent v_mount from disappearing (when the
vnode is inactivated), possibly leading to a NULL dereference when
checking if the mount wants knotes to be activated in the VOP hooks.
So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(),
if necessary, and check it when activating knotes.
Since the flags are not erased when a vnode is being held, we can safely
read them.

Reviewed by:	kris@
MFC after:	3 days
2005-08-06 01:42:04 +00:00
Jeff Roberson
40a495853a - Unlock before we call mac_destroy_vnode to prevent a lock order reversal.
Found by:	trhodes
2005-08-03 05:36:50 +00:00
Jeff Roberson
39b2406838 - Allow vnlru to drop giant if the filesystem does not require it. The
vnlru proc is extremely inefficient, potentially iteration over tens of
   thousands of vnodes without blocking.  Droping Giant allows other threads
   to preempt us although we should revisit the algorithm to fix the runtime
   problems especially since this may hold up all vnode allocations.
 - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim.  This provides
   a natural blocking point to help alleviate the situation described above
   although it may not technically be desirable.
 - yield after we make a pass on all mount points to prevent us from
   blocking other threads which require Giant.

MFC after:	2 weeks
2005-07-20 01:43:27 +00:00
Pawel Jakub Dawidek
c23c87bd93 Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp)
below KASSERT()s, which means there was no real problem here, we just
needed better locking for assertions.

OK'ed by:	jeff
Approved by:	re (scottl)
2005-07-05 15:57:55 +00:00
Suleiman Souhlal
571dcd15e2 Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
2005-07-01 16:28:32 +00:00
Jeff Roberson
b770ff6eb2 - Try to catch the wrong bufobj panics a little earlier. I believe they
are actually caused by a buf with both VNCLEAN and VNDIRTY set.  In
   the traces it is clear that the buf is removed from the dirty queue while
   it is actually on the clean queue which leaves the tail pointer set.
   Assert that both flags are not set in buf_vlist_add and buf_vlist_remove.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-18 18:17:03 +00:00
Jeff Roberson
114a1006a8 - Change holdcnt use around vnode recycling. We now always keep a holdcnt
ref while we're calling vgone().  This prevents transient refs from
   re-adding us to the free list.  Previously, a vfree() triggered via
   vinvalbuf() getting rid of all of a vnode's pages could place a partially
   destructed vnode on the free list where vtryrecycle() could find it.  The
   first call to vtryrecycle would hang up on the vnode lock, but when it
   failed it would place a now dead vnode onto the free list, and another
   call to vtryrecycle() would free an already free vnode.  There were many
   complications of having a zero ref count while freeing which can now go
   away.
 - Change vdropl() to release the interlock before returning.  All callers
   now respect this, so vdropl() directly frees VI_DOOMED vnodes once the
   last ref is dropped.  This means that we'll never have VI_DOOMED vnodes
   on the free list.
 - Seperate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and
   v_decr_useonly().  The incr/decr split is so that incr usecount can
   return with the interlock still held while decr drops the interlock so
   it can call vdropl() which will potentially free the vnode.  The calling
   function can't drop the lock of an already free'd node.  v_decr_useonly()
   drops a usecount without droping the hold count.  This is done so the
   usecount reaches zero in vput() before we recycle, however the holdcount
   is still 1 which prevents any new references from placing the vnode
   back on the free list.
 - Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget().  We
   wouldn't want vnlrureclaim() to bump the usecount since this has
   different semantics.  Also change vnlrureclaim() to do a NOWAIT on the
   vn_lock.  When this function runs we're usually in a desperate situation
   and we wouldn't want to wait for any specific vnode to be released.
 - Fix a bunch of misc comments to reflect the new behavior.
 - Add vhold() and vdrop() to vflush() for the same reasons that we do in
   vlrureclaim().  Previously we held no reference and a vnode could have
   been freed while we were waiting on the lock.
 - Get rid of vlruvp() and vfreehead().  Neither are used.  vlruvp() should
   really be rethought before it's reintroduced.
 - vgonel() always returns with the vnode locked now and never puts the
   vnode back on a free list.  The vnode will be freed as soon as the last
   reference is released.

Sponsored by:	Isilon Systems, Inc.
Debugging help from:	Kris Kennaway, Peter Holm
Approved by:	re (blanket vfs)
2005-06-16 04:41:42 +00:00