Commit Graph

702 Commits

Author SHA1 Message Date
jeff
8c749eb801 - Move code that should probably be an assert above the main body of
vrele so that we can decrease the indentation of the real work and
   make things slightly more clear.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 11:18:47 +00:00
jeff
b25a472993 - Adjust asserts in vop_lookup_post() to match the new post PDIRUNLOCK
vfs.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:25:25 +00:00
phk
eac95420b8 Remove another ';' after if().
Also spotted by:	bz
2005-03-27 07:53:13 +00:00
phk
4b5dfbb1ae Remove extra ; at end of if().
Found by:	bz
2005-03-27 07:52:12 +00:00
jeff
6d72a7bd60 - Don't recycle vnodes anymore. Free them once they are dead. getnewvnode
now always allocates a new vnode.
 - Define a new function, vnlru_free, which frees vnodes from the free list.
   It takes as a parameter the number of vnodes to free, which is
   wantfreevnodes - freevnodes when called from vnlru_proc or 1 when
   called from getnewvnode().  For now, getnewvnode() still tries to reclaim
   a free vnode before creating a new one when we are near the limit.
 - Define a function, vdestroy, which handles the actual release of memory
   and teardown of locks, etc.  This could become a uma_dtor() routine.
 - Get rid of minvnodes.  Now wantfreevnodes is 1/4th the max vnodes.  This
   keeps more unreferenced vnodes around so that files which have only
   been stat'd are less likely to be kicked out of the system before we
   have a chance to read them, etc.  These vnodes may still be freed via
   the normal vnlru_proc() routines which may some day become a real lru.
2005-03-25 05:34:39 +00:00
jeff
0210925e42 - Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:31:38 +00:00
jeff
bf2e6f43e8 - If vput() is called with a shared lock it must upgrade to an exclusive
before it can call VOP_INACTIVE().  This must use the EXCLUPGRADE path
   because we may violate some lock order with another locked vnode if
   we drop and reacquire the lock.  If EXCLUPGRADE fails, we mark the
   vnode with VI_OWEINACT.  This case should be very rare.
 - Clear VI_OWEINACT in vinactive() and vbusy().
 - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:08:58 +00:00
jeff
d289cc6b5d - Now that there are no external users of vfree() make it static.
- Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so
   no one else attempts to grow a dependency on them.
 - Now that objects with pages hold the vnode we don't have to do unlocked
   checks for the page count in the vm object in VSHOULDFREE.  These three
   macros could simply check for holdcnt state transitions to determine
   whether the vnode is on the free list already, but the extra safety
   the flag affords us is probably worth the minimal cost.
 - The leafonly sysctl and code have been dead for several years now,
   remove the sysctl and the code that employed it from vtryrecycle().
 - vtryrecycle() also no longer has to check the object's page count as
   the object holds the vnode until it reaches 0.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 14:38:16 +00:00
jeff
2115694bbc - Expose vholdl() so it may be used outside of vfs_subr.c 2005-03-15 13:43:10 +00:00
jeff
3fcb9112fb - Increment the holdcnt once for each usecount reference. This allows us
to use only the holdcnt to determine whether a vnode may be recycled,
   simplifying the V* macros as well as vtryrecycle(), etc.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 09:25:19 +00:00
jeff
2a81e8df21 - We do not have to check the object's ref_count in VSHOULDFREE or
vtryrecycle().  All obj refs also ref the vnode.
 - Consistently use v_incr_usecount() to increment the usecount.  This will
   be more important later.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 08:30:31 +00:00
jeff
bb63517e7e - Slightly rearrange vrele() to move the common case in one indentation
level.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:16:55 +00:00
jeff
a307ec6ef8 - Rework vget() so we drop the usecount in two failure cases that were
missed by my last commit.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:11:19 +00:00
jeff
d29b61a365 - Remove vx_lock, vx_unlock, vx_wait, etc.
- Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do
   writing ops without suspending.  This could suspend the vlruproc which
   should not be a problem under normal circumstances.
 - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance
   where it was used.
 - Acquire a lock before calling vgone() as it now requires it.
 - Move the acquisition of the vnode interlock from vtryrecycle() to
   getnewvnode() so that if it fails we don't drop and reacquire the
   vnode_free_list_mtx.
 - Check for a usecount or holdcount at the end of vtryrecycle() in case
   someone grabbed a ref while we were recycling.  Abort the recycle, and
   on the final ref drop this vnode will be placed on the head of the free
   list.
 - Move the redundant VOP_INACTIVE protection code into the local
   vinactive() routine to avoid code bloat.
 - Keep the vnode lock held across calls to vgone() in several places.
 - vgonel() no longer uses XLOCK, instead callers must hold an exclusive
   vnode lock.  The VI_DOOMED flag is set to allow other threads to detect
   a vnode which is no longer valid.  This flag is set until the last
   reference is gone, and there are no chances for a new ref.  vgonel()
   holds this lock across the entire function, which greatly simplifies
   logic.
 _ Only vfree() in one place in vgone() not three.
 - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock
   in the LK_NOWAIT case.  In other cases, check after we have slept and
   acquired an exlusive lock.  This will simulate the old vx_wait()
   behavior.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:54:28 +00:00
jeff
d2fecffa39 - Enable SMP VFS by default on current. More users are needed to turn up
any remaining bugs.  Anyone inconvenienced by this can still disable it
   in the loader.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 10:05:43 +00:00
jeff
cd66df18cc - Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK
has been set.  Assert that this is the case so that we catch filesystems
   who are using naked VOP_LOCKs in illegal cases.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 00:11:14 +00:00
jeff
0d71606b28 - Add a check for xlock in vop_lock_assert. Presently the xlock is
considered to be as good as an exclusive lock, although there is still a
   possibility of someone acquiring a VOP LOCK while xlock is held.

Sponsored by:	Isilon Systems, Inc.
2005-02-22 23:59:11 +00:00
phk
31dd38da62 Zero the v_un container field to make sure everything is gone. 2005-02-22 18:56:18 +00:00
phk
f1d058e032 Reap more benefits from DEVFS:
List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent.  There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

	rename idestroy_dev() to destroy_devl() for consistency.

	Add LIST_ENTRY de_alias to struct devfs_dirent.
	Remove v_specnext from struct vnode.
	Change si_hlist to si_alist in struct cdev.
	String new devfs vnodes' devfs_dirent on si_alist when
	we create them and take them off in devfs_reclaim().

	Fix devfs_revoke() accordingly.  Also don't clear fields
	devfs_reclaim() will clear when called from vgone();

	Let devfs_reclaim() call dev_rel() instead of vgonel().

	Move the usecount tracking from dev_rel() to devfs_reclaim(),
	and let dev_rel() take a struct cdev argument instead of vnode.

	Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
	devfs_reclaim()) when they are no longer used.   (This
	should maybe happen in devfs_close() instead.)
2005-02-22 15:51:07 +00:00
phk
cd21b2e10c Remove vfinddev(), it is generally bogus when faced with jails and
chroot and has no legitimate use(r)s in the tree.
2005-02-22 14:11:47 +00:00
phk
66dfd63961 Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
phk
1fe081e954 Make sure to drop the VI_LOCK in vgonel();
Spotted by: Taku YAMAMOTO <taku@tackymt.homeip.net>
2005-02-18 11:13:56 +00:00
phk
af1fa2025c Introduce vx_wait{l}() and use it instead of home-rolled versions. 2005-02-17 10:49:51 +00:00
phk
b6768ad7ab Convert KASSERTS to VNASSERTS 2005-02-17 10:28:58 +00:00
phk
5dd8d30575 Make various vnode related functions static 2005-02-10 12:28:58 +00:00
phk
5d1652b89d Don't pass NULL to vprint() 2005-02-10 08:55:08 +00:00
jeff
06f7a532e9 - Add a new assert in the getnewvnode(). Assert that the usecount is still
0 to detect getnewvnode() races.
 - Add the vnode address to a few panics near by to help in debugging.

Sponsored by:	Isilon Systems, Inc.
2005-02-08 23:27:10 +00:00
phk
628952636c Access vmobject via the bufobj instead of the vnode 2005-02-07 10:04:06 +00:00
phk
d2bbb620e9 Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what
was necessary.
2005-02-07 07:48:03 +00:00
phk
4f73d0b6fc Remove unused argument to vrecycle() 2005-01-28 13:08:21 +00:00
phk
f8b1ba904f Integrate vclean() into vgonel().
Various associated polishing.
2005-01-28 13:00:03 +00:00
phk
eaf84397bb Remove register keyword 2005-01-28 12:39:10 +00:00
phk
796d435574 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
phk
1d63b12e22 Eliminate the constant flags argument to vclean() 2005-01-24 22:22:02 +00:00
phk
dc1cfea3cd Change vprint() to vn_printf() which takes varargs.
Add #define for vprint() to call vn_printf().
2005-01-24 13:58:08 +00:00
phk
d5c135375c Kill the VV_OBJBUF and test the v_object for NULL instead. 2005-01-24 13:13:57 +00:00
jeff
8cb5395678 - Add the tunable and sysctl for the mpsafevfs. It currently defaults
to off.
 - Protect access to mnt_kern_flag with the mointpoint mutex.
 - Remove some KASSERTs which are not legal checks without the appropriate
   locks held.
 - Use VCANRECYCLE() rather than rolling several slightly different
   checks together.
 - Return from vtryrecycle() with a recycled vnode rather than a locked
   vnode.  This simplifies some locking.
 - Remove several GIANT_REQUIRED lines.
 - Add a few KASSERTs to help with INACT debugging.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:41:01 +00:00
phk
d3b1b2cc99 Fix a bug I introduced in 1.561 which has caused considerable filesystem
unhappiness lately.

As far as I can tell, no files that have made it safely to disk
have been endangered, but stuff in transit has been in peril.

Pointy hat:	phk
2005-01-16 21:09:39 +00:00
phk
cc0cbc6b34 Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
phk
3760addae2 Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.
2005-01-13 12:25:19 +00:00
phk
5a497775d6 Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.
2005-01-11 10:43:08 +00:00
phk
437e41e061 More vnode -> bufobj migration. 2005-01-11 10:16:39 +00:00
phk
649a01e1a5 Give flushbuflist() a struct bufv as first argument and avoid home-rolling
TAILQ_FOREACH_SAFE().

Loose the error pointer argument and return any errors the normal way.

Return EAGAIN for the case where more work needs to be done.
2005-01-11 10:01:54 +00:00
phk
da2718f1af Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
phk
f2581a224f Since we do not support forceful unmount of DEVFS we can do away with
the partially implemented vnode-readoption code in vgonechrl().
2005-01-04 08:49:14 +00:00
phk
0faeb292ed We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.
2004-12-20 21:34:29 +00:00
phk
4a639d6164 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
phk
4b1a114436 Improve vprint() a little bit: break long lines, reduce indent and tell
if the VI_LOCK() is held.
2004-12-03 12:09:34 +00:00
phk
59f305606c Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
phk
ca008fe171 Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff. 2004-11-15 08:12:50 +00:00
phk
d632cb3d94 Move the bit of the syncer which deals with vnodes into a separate
function.
2004-11-14 15:24:38 +00:00
phk
b99ff6bbb6 Eliminate vop_revoke() function now that devfs_revoke() does the entire job. 2004-11-13 23:38:13 +00:00
phk
9fdf027988 Slim vnodes by another four bytes by eliminating the (now) unused field
v_cachedid.
2004-11-10 07:31:06 +00:00
phk
cea5000057 Remove vn_todev() 2004-11-10 07:17:28 +00:00
phk
7bef55364a Remove vnode->v_cachedfs.
It was only used for the highly dangerous "export all vnodes with a sysctl"
function.
2004-11-09 22:51:03 +00:00
phk
1e4caea88c Remove buf->b_dev field. 2004-11-04 07:59:57 +00:00
phk
34a530853d Always initialize bo_private along with bo_ops in getnewvnode().
Spotted by:	tegge
2004-11-03 21:09:23 +00:00
phk
c928cc4c54 Loose vfs_mountedon() 2004-10-29 11:15:08 +00:00
phk
12ca46b3fc Give the bufobj a private __bo_vnode for now to keep the syncer floating [1]
At some point later the syncer will unlearn about vnodes and the filesystems
method called by the syncer will know enough about what's in bo_private to
do the right thing.

[1] Ok, I know, but I couldn't resist the pun.
2004-10-29 09:33:32 +00:00
phk
56a7ee8e7f Move the syncer linkage from vnode to bufobj.
This is not quite a perfect separation: the syncer still think it knows
that everything is a vnode.
2004-10-27 08:05:02 +00:00
phk
c66aa10c8e Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
phk
0e87ab8bc6 Loose the v_dirty* and v_clean* alias macros.
Check the count field where we just want to know the full/empty state,
rather than using TAILQ_EMPTY() or TAILQ_FIRST().
2004-10-25 09:14:03 +00:00
phk
3a8a530155 Remove vnode->v_bsize. This was a dead-end. 2004-10-25 07:50:59 +00:00
phk
4ba53ec41b Collapse vnode->v_object and buf->b_object into bufobj->bo_object. 2004-10-25 06:02:57 +00:00
phk
1b25a59886 Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
rwatson
f046b53692 When MAC is enabled, warn if getnewvnode() is asked to produce a vnode
without a mountpoint.  In this scenario, there's no useful source for
a label on the vnode, since we can't query the mountpoint for the
labeling strategy or default label.
2004-10-22 11:04:58 +00:00
phk
2c3e47b668 Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it.  Here were those hacks that I have curs'd I know not how
oft.  Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?

Move the skeleton of specfs into devfs where it now belongs and
bury the rest.
2004-10-22 09:59:37 +00:00
phk
52a089c526 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
phk
3833976d12 Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
phk
fdf614c0ba Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx.
Initialize the bo_mtx when we allocate a vnode i getnewvnode() For
now we point to the vnodes interlock mutex, that retains the exact
same locking sematics.

Move v_numoutput from vnode to bufobj.  Add renaming macro to
postpone code sweep.
2004-10-21 14:42:31 +00:00
phk
b436dad078 Polish vtruncbuf() to improve readability and style a bit. 2004-10-21 14:13:54 +00:00
phk
350f812103 Simplify buf_vlist_remove().
Now that we have encapsulated the splaytree related information
into a structure we can eliminate the half of this function.
2004-10-21 13:48:50 +00:00
grog
152055d94b vtryrecycle: Don't rely on type VBAD alone to mean that we don't need
to clean the vnode.  If v_data is set, we still need to
	     clean it.  This code change should catch all incidents of
	     the previous commit (INVARIANTS only).
2004-10-06 02:09:59 +00:00
grog
882d69104e getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning.
Discussion: this panic (or waning) only occurs when the kernel is
  compiled with INVARIANTS.  Otherwise the problem (which means that
  the vp->v_data field isn't NULL, and represents a coding error and
  possibly a memory leak) is silently ignored by setting it to NULL
  later on.

  Panicking here isn't very helpful: by this time, we can only find
  the symptoms.  The panic occurs long after the reason for "not
  cleaning" has been forgotten; in the case in point, it was the
  result of severe file system corruption which left the v_type field
  set to VBAD.  That issue will be addressed by a separate commit.
2004-10-06 02:06:11 +00:00
phk
2e5b8b9883 Fix a LOR relating to freeing cdevs. 2004-10-01 06:33:39 +00:00
phk
5536d5757b Hold dev_lock and check for NULL devsw pointer when we determine
if a vnode is a disk.
2004-09-24 06:16:08 +00:00
phk
3947e54e89 Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.

Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.

Replace spechash_mtx use with dev_lock()/dev_unlock().
2004-09-23 07:17:41 +00:00
phk
02df7323ee Remove unused B_WRITEINPROG flag 2004-09-15 21:49:22 +00:00
phk
1912367ebb Create simple function init_va_filerev() for initializing a va_filerev
field.

Replace three instances of longhaired initialization va_filerev fields.

Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.
2004-09-07 09:17:05 +00:00
truckman
54d23a34f6 Don't attempt to trigger the syncer thread final sync code in the
shutdown_pre_sync state if the RB_NOSYNC flag is set.  This is the
likely cause of hangs after a system panic that are keeping crash
dumps from being done.

This is a MFC candidate for RELENG_5.

MFC after:	3 days
2004-08-20 19:21:47 +00:00
obrien
8f41b4e870 s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g 2004-08-16 08:33:37 +00:00
jmg
bc1805c6e8 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
rwatson
371cf09cf7 In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However,
we may sleep when doing so; check that we didn't race with another thread
allocating storage for the vnode after allocation is made to a local
pointer, and only update the vnode pointer if it's still NULL.  Otherwise,
accept that another thread got there first, and release the local storage.

Discussed with:	jmg
2004-08-11 01:27:53 +00:00
njl
7e21ce666c Skip the syncing disks loop if there are no dirty buffers. Remove a
variable used to flag the initial printf.

Submitted by:	truckman (earlier version)
2004-08-10 01:32:05 +00:00
obrien
47f728c0bc Put a cap on the auto-tuning of kern.maxvnodes.
Cap value chosen by:	scottl
2004-08-02 21:52:43 +00:00
njl
774b91783e Minor message cleanup. 2004-07-30 01:30:05 +00:00
phk
8c9258b82e Convert the vfsconf list to a TAILQ.
Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.
2004-07-27 22:32:01 +00:00
cperciva
d9fecc83c8 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
phk
5297516e02 Eliminate unused second argument to reassignbuf() and simplify it
accordingly.
2004-07-25 21:24:23 +00:00
alfred
c2337759f3 put several of the options for DEBUG_VFS_LOCKS under control of sysctls. 2004-07-21 07:13:14 +00:00
alfred
bcc5104ce1 Cleanup shutdown output. 2004-07-15 08:01:00 +00:00
alfred
0389c188de Tidy up system shutdown. 2004-07-15 04:29:48 +00:00
alfred
8a1713aada Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
alfred
45c9fe9da7 Dump the actual bad values when this assertion is tripped. 2004-07-12 04:13:38 +00:00
marcel
c20ced5cd2 Update for the KDB framework:
o  Call kdb_enter() instead of Debugger().
2004-07-10 21:47:53 +00:00
alfred
b65386ecc3 fixup sysctl by fsid node 2004-07-08 06:11:36 +00:00
alfred
e0a5f530c2 Introduce vfs_suser(), used to test if a user should have special privs
for a mount.
2004-07-06 09:37:43 +00:00
alfred
97a6f04270 NFS mobility PHASE I, II & III (phase VI, and V pending):
Rebind the client socket when we experience a timeout.  This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
2004-07-06 09:12:03 +00:00
truckman
690b842bc5 Unconditionally set last_work_seen while in the SYNCER_RUNNING state
so that last_work_seen has a reasonable value at the transition
to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened
to be zero at the time.

Initialize last_work_seen to zero as a safety measure in case the
syncer never ran in the SYNCER_RUNNING state.

Tested by:	phk
2004-07-05 21:32:01 +00:00
truckman
471ab74bb2 Rework syncer termination code:
Speed up the syncer when shutting down by sleeping for a shorter
    period of time instead of cranking up rushjob and using the
    normal one second sleep.

    Skip empty worklist slots when shutting down to avoid lengthy
    intervals of inactivity.

    Give I/O more time to complete between steps by not speeding the
    syncer quite as much.

    Terminate the syncer after one full pass through the worklist
    plus one second with the worklist containing nothing but syncer
    vnodes.

    Print an indication of shutdown progress to the console.

Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist
to be monitored.
2004-07-05 01:07:33 +00:00
phk
49a3aa211e Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE. 2004-07-04 22:33:22 +00:00
alfred
95f8f1e089 Pass the operation in with the fsidctl.
Remove some fsidctls that we will not be using.
Correct prototypes for fs sysctls.
2004-07-04 20:21:58 +00:00
phk
59c88fd71a Make the last commit handle non-phk root devices better. 2004-07-04 19:42:25 +00:00
phk
b52c81e5db Blocksize for I/O should be a property of the vnode and not found by groping
around in the vnodes surroundings when we allocate a block.

Assign a blocksize when we create a vnode, and yell a warning (and ignore it)
if we got the wrong size.

Please email all such warnings to me.
2004-07-04 12:49:04 +00:00
alfred
bbaa6c3ec0 Introduce a new kevent filter. EVFILT_FS that will be used to signal
generic filesystem events to userspace.  Currently only mount and unmount
of filesystems are signalled.  Soon to be added, up/down status of NFS.

Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.

Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.
2004-07-04 10:52:54 +00:00
alfred
4a61cff009 Revision 1.496 would not boot on my system due to
ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) ->
 insmntqueue(vp, mp = NULL) -> KASSERT -> panic

Make getnewvnode() only call insmntqueue() if the mountpoint parameter
is not NULL.
2004-07-04 10:19:15 +00:00
phk
070a613a48 When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint.  If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

		MNT_ILOCK(mp);
	loop:
		for (vp = TAILQ_FIRST(&mp...);
		    (vp = nvp) != NULL;
		    nvp = TAILQ_NEXT(vp,...)) {
			if (vp->v_mount != mp)
				goto loop;
			MNT_IUNLOCK(mp);
			...
			MNT_ILOCK(mp);
		}
		MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

	MNT_ILOCK(vp->v_mount);
	...
	TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
	...
	MNT_IUNLOCK(vp->v_mount);
	...
	vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go.  This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
2004-07-04 08:52:35 +00:00
truckman
9ed03e6eb3 When shutting down the syncer kernel thread, first tell it to run
faster and iterate to over its work list a few times in an attempt
to empty the work list before the syncer terminates.  This leaves
fewer dirty blocks to be written at the "syncing disks" stage and
keeps the the "giving up on N buffers" problem from being triggered
by the presence of a large soft updates work list at system shutdown
time.  The downside is that the syncer takes noticeably longer to
terminate.

Tested by:	"Arjan van Leeuwen" <avleeuwen AT piwebs DOT com>
Approved by:	mckusick
2004-07-01 23:59:19 +00:00
phk
40dd98a3bd Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
phk
dfd1f7fd50 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
phk
ba2d5c4937 Remove a left over from userland buffer-cache access to disks. 2004-06-14 14:25:03 +00:00
rwatson
afc098b3e1 Assert Giant in vrele(). 2004-05-31 19:06:01 +00:00
mux
79217d1505 Put deprecated sysctl code inside BURN_BRIDGES. 2004-04-11 21:09:22 +00:00
imp
74cf37bd00 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
peter
5e91995e52 Kill some XXXKSE's. vnlru/syncer are single threaded. 2004-03-29 22:45:33 +00:00
phk
2a5e157787 Properly vector all bwrite() and BUF_WRITE() calls through the same path
and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
2004-03-11 18:02:36 +00:00
phk
eeb7579130 Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.
2004-03-11 16:33:11 +00:00
kan
e795b7939d Always call vn_finished_write after vn_start_write was called. All
occurences of 'goto done' after vn_start_write invocation were cleaning
up incompletely.
2004-03-06 04:09:54 +00:00
jhb
d25301c858 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
truckman
1de257deb3 Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
phk
711ff67b90 Check for NODEV return from udev2dev() 2004-02-21 23:52:03 +00:00
phk
5551e292d8 Device megapatch 6/6:
This is what we came here for:  Hang dev_t's from their cdevsw,
refcount cdevsw and dev_t and generally keep track of things a lot
better than we used to:

Hold a cdevsw reference around all entrances into the device driver,
this will be necessary to safely determine when we can unload driver
code.

Hold a dev_t reference while the device is open.

KASSERT that we do not enter the driver on a non-referenced dev_t.

Remove old D_NAG code, anonymous dev_t's are not a problem now.

When destroy_dev() is called on a referenced dev_t, move it to
dead_cdevsw's list.  When the refcount drops, free it.

Check that cdevsw->d_version is correct.  If not, set all methods
to the dead_*() methods to prevent entrance into driver.  Print
warning on console to this effect.  The device driver may still
explode if it is also incompatible with newbus, but in that case
we probably didn't get this far in the first place.
2004-02-21 21:57:26 +00:00
phk
39fb4aef3d Device megapatch 5/6:
Remove the unused second argument from udev2dev().

Convert all remaining users of makedev() to use udev2dev().  The
semantic difference is that udev2dev() will only locate a pre-existing
dev_t, it will not line makedev() create a new one.

Apart from the tiny well controlled windown in D_PSEUDO drivers,
there should no longer be any "anonymous" dev_t's in the system
now, only dev_t's created with make_dev() and make_dev_alias()
2004-02-21 21:32:15 +00:00
kan
443a35f2bf More style fixes.
Obtained from:	bde
2004-01-05 23:40:46 +00:00
kan
2ffbc2464e style(9):
Add empty line before first code line in functions with no local
variables.
Properly terminate comment sentences.
Indent lines which are longer that 80 characters.
Move v_addpollinfo closer to the rest of poll-related functions.
Move DEBUG_VFS_LOCKS ifdefed block to the end of file.

Obtained from:	bde (partly)
2004-01-05 19:04:29 +00:00
kan
2872921967 Cosmetics: strip '\n' from a string passed to Debugger(). 2004-01-04 03:42:20 +00:00
bde
7d91626477 v_vxproc was a bogus name for a thread (pointer). 2003-12-28 09:12:56 +00:00
jeff
5443bd4c65 - In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT.
Submitted by:	Stephan Uphoff <ups@stups.com>
2003-12-16 17:08:27 +00:00
jeff
aa712bc6e4 - When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by
reassigning their v_ops field to specfs, detaching from the mountpoint, etc.
   However, this is not sufficient.  If we vclean() the vnode the pages owned
   by the vnode are lost, potentially while buffers reference them.  Implement
   parts of vclean() seperately in vgonechrl() so that the pages and bufs
   associated with a device vnode are not destroyed while in use.
2003-12-16 17:05:05 +00:00
jeff
e35dcab926 - Don't forget to unlock the vnode interlock in the LK_NOWAIT case.
Submitted by:	Stephan Uphoff <ups@stups.com>
Approved by:	re (rwatson)
2003-11-30 22:09:58 +00:00
tanimura
7eade05dfa - Implement selwakeuppri() which allows raising the priority of a
thread being waken up.  The thread waken up can run at a priority as
  high as after tsleep().

- Replace selwakeup()s with selwakeuppri()s and pass appropriate
  priorities.

- Add cv_broadcastpri() which raises the priority of the broadcast
  threads.  Used by selwakeuppri() if collision occurs.

Not objected in:	-arch, -current
2003-11-09 09:17:26 +00:00
kan
36d60f3bb7 Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff
2003-11-05 04:30:08 +00:00
wollman
dabc1a332d Add appropriate const poisoning to the assert_*locked() family so that I can
call ASSERT_VOP_LOCKED(vp, __func__) without a diagnostic.

Inspired by:	the evil and rude OpenAFS cache manager code
2003-10-23 18:17:36 +00:00
alc
512489f301 Initialize the buf's b_object in pbgetvp(). Clear it in pbrelvp(). (This
facilitates synchronization of the vm page's valid field using the
vm object's lock.)

Suggested by:	tegge
2003-10-20 18:24:38 +00:00
phk
888092f317 Simplify count_dev() 2003-10-17 11:56:48 +00:00
phk
edaf9214c4 Simplify vn_isdisk() a bit. 2003-10-12 14:04:39 +00:00
jeff
5b9cc4b22e - Fix a typo, I meant & and not |. This was causing lockups from the syncer
looping forever due to list corruption.

Solved by:	tegge
2003-10-11 21:50:45 +00:00
jeff
2c3fea92c8 - Fix an XXX. Check the error of vn_lock() in vflush(). Don't specify
LK_RETRY either, we don't want this vnode if it turns into another.
 - Remove the code that checks the mount point after acquiring the lock
   we are guaranteed to either fail or get the vnode that we wanted.
2003-10-05 07:12:38 +00:00
jeff
6b8324e841 - Rename vcanrecycle() to vtryrecycle() to reflect its new role.
- In vtryrecycle() try to vgonel the vnode if all of the previous checks
   passed.  We won't vgonel if someone has either acquired a hold or usecount
   or started the vgone process elsewhere.  This is because we may have been
   removed from the free list while we were inspecting the vnode for
   recycling.
 - The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling
   the same vnode.  To further reduce the likelyhood of this event, requeue
   the vnode on the tail of the list prior to calling vtryrecycle().  We can
   not actually remove the vnode from the list until we know that it's
   going to be recycled because other interlock holders may see the VI_FREE
   flag and try to remove it from the free list.
 - Kill a bogus XXX comment.  If XLOCK is set we shouldn't wait for it
   regardless of MNT_WAIT because the vnode does not actually belong to
   this filesystem.
2003-10-05 05:35:41 +00:00
jeff
e15704d590 - Don't cache_purge() in getnewvnode. It's done in vclean(). With this
purge, the purge in vclean, and the filesystems purge, we had 3 purges
   per vnode.
 - Move the insmntque(vp, 0) to vclean() so that we may remove it from the
   two vgone() functions and reduce the number of lock operations required.
2003-10-05 02:48:04 +00:00
jeff
8d0f78003f - Solve a LOR with the sync_mtx by using the VI_ONWORKLST flag to determine
whether or not the sync failed.  This could potentially get set between
   the time that we VOP_UNLOCK and VI_LOCK() but the race would harmelssly
   lead to the sync being delayed by an extra 30 seconds.  If we do not move
   the vnode it could cause an endless loop if it continues to fail to sync.
 - Use vhold and vdrop to stop the vnode from changing identities while we
   have it unlocked.  Other internal vfs lists are likely to follow this
   scheme.
2003-10-05 00:35:41 +00:00
jeff
8d72c43763 - Move the xlock 'locking' code into vx_lock() and vx_unlock().
- Create a new function, vgonechrl(), which performs vgone for an in-use
   character device.  Move the code from vflush() that did this into
   vgonechrl().
 - Hold the xlock across the entirety of vgonel() and vgonechrl() so that
   at no point will an invalid vnode exist on any list without XLOCK set.
 - Move the xlock code out of vclean() now that it is in the vgone*()
   functions.
2003-10-05 00:02:41 +00:00
jeff
55547647ec - In sched_sync() test our preconditions prior to dropping the sync_mtx.
This is so that we may grab the interlock while still holding the
   sync_mtx.  We have to VI_TRYLOCK() because in all other cases the lock
   order runs the other way.
 - If we don't meet any of the preconditions, reinsert the vp into the
   list for the next second.
 - We don't need to panic if we fail to sync here because each FSYNC
   function handles this case.  Removing this redundant code also
   simplifies locking.
2003-10-04 18:03:53 +00:00
jeff
bb40013912 - In a Giantless world, the vn_lock() in vcanrecycle() could legitimately
fail.  Remove the panic from that case and document why it might fail.
 - Document the reason for calling cache_purge() on a newly created vnode.
 - In insmntque() order the operations so that we can call mtx_unlock()
   one fewer times.  This makes the code somewhat clearer as well.
 - Add XXX comments in sched_sync() and vflush().
 - In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is
   set.
 - In vclean() we don't need to acquire a lock around a single TAILQ_FIRST
   call.  It's ok if we race here, the vinvalbuf will just do nothing.
 - Increase the scope of the lock in vgonel() to reduce the number of lock
   operations that are performed.
2003-10-04 15:10:40 +00:00
jeff
517dcea6c8 - In reassignbuf() don't unlock vp and lock newvp if they are the same.
Doing so creates a race where the buf is on neither list.
 - Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we
   should.
 - Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this
   really should not happen and is fast to check.
2003-09-20 00:21:48 +00:00
jeff
45f3b1b270 - Remove spls(). The locking that has replaced them is in place and they
no longer serve as guidelines for future work.
2003-09-19 23:52:06 +00:00
kan
cf77f9f005 Eliminate one case of VI_UNLOCK followed by an immediate
VI_LOCK.
2003-09-19 19:13:54 +00:00
jhb
37641f86f1 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
phk
eb30c92e49 Revert stuff which accidentally ended up in the previous commit. 2003-07-22 10:36:36 +00:00