freebsd-skq

Author	SHA1	Message	Date
jeff	31cfb7f242	- Disable code which allows getnewvnode() to fail. Many ffs_vget() callers do not correctly deal with failures. This presently risks deadlock problems if dependency processing is held up by failures to allocate a vnode, however, this is better than the situation with the failures. Sponsored by: Isilon Systems, Inc.	2005-04-22 00:57:05 +00:00
phk	4bd811c8dd	Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready earlier.	2005-04-18 21:11:47 +00:00
jeff	5642885b84	- Change vop_lookup_post assertions to reflect recent vfs_lookup changes. Sponsored by: Isilon Systems, Inc.	2005-04-13 10:57:53 +00:00
jeff	b391d2675b	- Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk uses it. Sponsored by: Isilon Systems, Inc.	2005-04-11 15:17:06 +00:00
jeff	17be4cbfa0	- Change the VOP_LOCK UPGRADE in vput() to do a LK_NOWAIT to avoid a potential lock order reversal. Also, don't unlock the vnode if this fails, lockmgr has already unlocked it for us. - Restructure vget() now that vn_lock() does all of VI_DOOMED checking for us and also handles the case where there is no real lock type. - If VI_OWEINACT is set, we need to upgrade the lock request to EXCLUSIVE so that we can call inactive. It's not legal to vget a vnode that hasn't had INACTIVE called yet. Sponsored by: Isilon Systems, Inc.	2005-04-11 09:28:32 +00:00
jeff	60d07eec30	- Assert that the bufobj matches in flushbuflists. I still haven't gotten to root cause on exactly how this happens. - If the assert is disabled, we presently try to handle this case, but the BUF_UNLOCK was missing. Thus, if this condition ever hit we would leak a buf lock. Many thanks to Peter Holm for all his help in finding this bug. He really put more effort into it than I did.	2005-04-06 06:49:46 +00:00
jeff	d42252c158	- Move NDFREE() from vfs_subr to vfs_lookup where namei() is.	2005-04-05 08:58:49 +00:00
jeff	d8b17b2eac	- Add a missing unlock of the vnode_free_list_mtx. Spotted by: Antoine Brodin	2005-04-04 12:07:16 +00:00
jeff	b6f8b968c2	- Instead of waiting forever to get a vnode in getnewvnode() wait for one to become available for one second and then return ENFILE. We can run out of vnodes, and there must be a hard limit because without one we can quickly run out of KVA on x86. Presently the system can deadlock if there are maxvnodes directories in the namecache. The original 4.x BSD behavior was to return ENFILE if we reached the max, but 4.x BSD did not have the vnlru proc so it was less profitable to wait.	2005-04-04 11:43:44 +00:00
jeff	e4d4b610ba	- Disable vfs shared locks by default. They must be specifically enabled on filesystems which safely support them. It appears that many network filesystems specifically are not shared lock safe. Sponsored by: Isilon Systems, Inc.	2005-03-31 05:22:45 +00:00
jeff	97c40ebd49	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
das	d3e0f098be	Eliminate v_id and v_ddid. The name cache now holds references to vnodes whose names it caches, so we no longer need a `generation number' to tell us if a referenced vnode is invalid. Replace the use of the parent's v_id in the hash function with the address of the parent vnode. Tested by: Peter Holm Glanced at by: jeff, phk	2005-03-30 03:01:36 +00:00
jeff	b82462f008	- Dont clear OWEINACT in vbusy(), we still owe an inactive call if someone vhold()s us. - Avoid an extra mutex acquire and release in the common case of vgonel() by checking for OWEINACT at the start of the function. - Fix the case where we set OWEINACT in vput(). LK_EXCLUPGRADE drops our shared lock if it fails. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:02:48 +00:00
jeff	2059b48294	- Don't initial v_dd here, let cache_purge() do it for us. Sponsored by: Isilon Systems, Inc.	2005-03-29 09:59:34 +00:00
jeff	8c749eb801	- Move code that should probably be an assert above the main body of vrele so that we can decrease the indentation of the real work and make things slightly more clear. Sponsored by: Isilon Systems, Inc.	2005-03-28 11:18:47 +00:00
jeff	b25a472993	- Adjust asserts in vop_lookup_post() to match the new post PDIRUNLOCK vfs. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:25:25 +00:00
phk	eac95420b8	Remove another ';' after if(). Also spotted by: bz	2005-03-27 07:53:13 +00:00
phk	4b5dfbb1ae	Remove extra ; at end of if(). Found by: bz	2005-03-27 07:52:12 +00:00
jeff	6d72a7bd60	- Don't recycle vnodes anymore. Free them once they are dead. getnewvnode now always allocates a new vnode. - Define a new function, vnlru_free, which frees vnodes from the free list. It takes as a parameter the number of vnodes to free, which is wantfreevnodes - freevnodes when called from vnlru_proc or 1 when called from getnewvnode(). For now, getnewvnode() still tries to reclaim a free vnode before creating a new one when we are near the limit. - Define a function, vdestroy, which handles the actual release of memory and teardown of locks, etc. This could become a uma_dtor() routine. - Get rid of minvnodes. Now wantfreevnodes is 1/4th the max vnodes. This keeps more unreferenced vnodes around so that files which have only been stat'd are less likely to be kicked out of the system before we have a chance to read them, etc. These vnodes may still be freed via the normal vnlru_proc() routines which may some day become a real lru.	2005-03-25 05:34:39 +00:00
jeff	0210925e42	- Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For now, all calls to VFS_ROOT() should still acquire exclusive locks. Sponsored by: Isilon Systems, Inc.	2005-03-24 07:31:38 +00:00
jeff	bf2e6f43e8	- If vput() is called with a shared lock it must upgrade to an exclusive before it can call VOP_INACTIVE(). This must use the EXCLUPGRADE path because we may violate some lock order with another locked vnode if we drop and reacquire the lock. If EXCLUPGRADE fails, we mark the vnode with VI_OWEINACT. This case should be very rare. - Clear VI_OWEINACT in vinactive() and vbusy(). - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:08:58 +00:00
jeff	d289cc6b5d	- Now that there are no external users of vfree() make it static. - Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so no one else attempts to grow a dependency on them. - Now that objects with pages hold the vnode we don't have to do unlocked checks for the page count in the vm object in VSHOULDFREE. These three macros could simply check for holdcnt state transitions to determine whether the vnode is on the free list already, but the extra safety the flag affords us is probably worth the minimal cost. - The leafonly sysctl and code have been dead for several years now, remove the sysctl and the code that employed it from vtryrecycle(). - vtryrecycle() also no longer has to check the object's page count as the object holds the vnode until it reaches 0. Sponsored by: Isilon Systems, Inc.	2005-03-15 14:38:16 +00:00
jeff	2115694bbc	- Expose vholdl() so it may be used outside of vfs_subr.c	2005-03-15 13:43:10 +00:00
jeff	3fcb9112fb	- Increment the holdcnt once for each usecount reference. This allows us to use only the holdcnt to determine whether a vnode may be recycled, simplifying the V* macros as well as vtryrecycle(), etc. Sponsored by: Isilon Systems, Inc.	2005-03-14 09:25:19 +00:00
jeff	2a81e8df21	- We do not have to check the object's ref_count in VSHOULDFREE or vtryrecycle(). All obj refs also ref the vnode. - Consistently use v_incr_usecount() to increment the usecount. This will be more important later. Sponsored by: Isilon Systems, Inc.	2005-03-14 08:30:31 +00:00
jeff	bb63517e7e	- Slightly rearrange vrele() to move the common case in one indentation level. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:16:55 +00:00
jeff	a307ec6ef8	- Rework vget() so we drop the usecount in two failure cases that were missed by my last commit. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:11:19 +00:00
jeff	d29b61a365	- Remove vx_lock, vx_unlock, vx_wait, etc. - Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do writing ops without suspending. This could suspend the vlruproc which should not be a problem under normal circumstances. - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance where it was used. - Acquire a lock before calling vgone() as it now requires it. - Move the acquisition of the vnode interlock from vtryrecycle() to getnewvnode() so that if it fails we don't drop and reacquire the vnode_free_list_mtx. - Check for a usecount or holdcount at the end of vtryrecycle() in case someone grabbed a ref while we were recycling. Abort the recycle, and on the final ref drop this vnode will be placed on the head of the free list. - Move the redundant VOP_INACTIVE protection code into the local vinactive() routine to avoid code bloat. - Keep the vnode lock held across calls to vgone() in several places. - vgonel() no longer uses XLOCK, instead callers must hold an exclusive vnode lock. The VI_DOOMED flag is set to allow other threads to detect a vnode which is no longer valid. This flag is set until the last reference is gone, and there are no chances for a new ref. vgonel() holds this lock across the entire function, which greatly simplifies logic. _ Only vfree() in one place in vgone() not three. - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock in the LK_NOWAIT case. In other cases, check after we have slept and acquired an exlusive lock. This will simulate the old vx_wait() behavior. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:54:28 +00:00
jeff	d2fecffa39	- Enable SMP VFS by default on current. More users are needed to turn up any remaining bugs. Anyone inconvenienced by this can still disable it in the loader. Sponsored by: Isilon Systems, Inc.	2005-02-23 10:05:43 +00:00
jeff	cd66df18cc	- Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK has been set. Assert that this is the case so that we catch filesystems who are using naked VOP_LOCKs in illegal cases. Sponsored by: Isilon Systems, Inc.	2005-02-23 00:11:14 +00:00
jeff	0d71606b28	- Add a check for xlock in vop_lock_assert. Presently the xlock is considered to be as good as an exclusive lock, although there is still a possibility of someone acquiring a VOP LOCK while xlock is held. Sponsored by: Isilon Systems, Inc.	2005-02-22 23:59:11 +00:00
phk	31dd38da62	Zero the v_un container field to make sure everything is gone.	2005-02-22 18:56:18 +00:00
phk	f1d058e032	Reap more benefits from DEVFS: List devfs_dirents rather than vnodes off their shared struct cdev, this saves a pointer field in the vnode at the expense of a field in the devfs_dirent. There are often 100 times more vnodes so this is bargain. In addition it makes it harder for people to try to do stypid things like "finding the vnode from cdev". Since DEVFS handles all VCHR nodes now, we can do the vnode related cleanup in devfs_reclaim() instead of in dev_rel() and vgonel(). Similarly, we can do the struct cdev related cleanup in dev_rel() instead of devfs_reclaim(). rename idestroy_dev() to destroy_devl() for consistency. Add LIST_ENTRY de_alias to struct devfs_dirent. Remove v_specnext from struct vnode. Change si_hlist to si_alist in struct cdev. String new devfs vnodes' devfs_dirent on si_alist when we create them and take them off in devfs_reclaim(). Fix devfs_revoke() accordingly. Also don't clear fields devfs_reclaim() will clear when called from vgone(); Let devfs_reclaim() call dev_rel() instead of vgonel(). Move the usecount tracking from dev_rel() to devfs_reclaim(), and let dev_rel() take a struct cdev argument instead of vnode. Destroy SI_CHEAPCLONE devices in dev_rel() (instead of devfs_reclaim()) when they are no longer used. (This should maybe happen in devfs_close() instead.)	2005-02-22 15:51:07 +00:00
phk	cd21b2e10c	Remove vfinddev(), it is generally bogus when faced with jails and chroot and has no legitimate use(r)s in the tree.	2005-02-22 14:11:47 +00:00
phk	66dfd63961	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
phk	1fe081e954	Make sure to drop the VI_LOCK in vgonel(); Spotted by: Taku YAMAMOTO <taku@tackymt.homeip.net>	2005-02-18 11:13:56 +00:00
phk	af1fa2025c	Introduce vx_wait{l}() and use it instead of home-rolled versions.	2005-02-17 10:49:51 +00:00
phk	b6768ad7ab	Convert KASSERTS to VNASSERTS	2005-02-17 10:28:58 +00:00
phk	5dd8d30575	Make various vnode related functions static	2005-02-10 12:28:58 +00:00
phk	5d1652b89d	Don't pass NULL to vprint()	2005-02-10 08:55:08 +00:00
jeff	06f7a532e9	- Add a new assert in the getnewvnode(). Assert that the usecount is still 0 to detect getnewvnode() races. - Add the vnode address to a few panics near by to help in debugging. Sponsored by: Isilon Systems, Inc.	2005-02-08 23:27:10 +00:00
phk	628952636c	Access vmobject via the bufobj instead of the vnode	2005-02-07 10:04:06 +00:00
phk	d2bbb620e9	Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what was necessary.	2005-02-07 07:48:03 +00:00
phk	4f73d0b6fc	Remove unused argument to vrecycle()	2005-01-28 13:08:21 +00:00
phk	f8b1ba904f	Integrate vclean() into vgonel(). Various associated polishing.	2005-01-28 13:00:03 +00:00
phk	eaf84397bb	Remove register keyword	2005-01-28 12:39:10 +00:00
phk	796d435574	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
phk	1d63b12e22	Eliminate the constant flags argument to vclean()	2005-01-24 22:22:02 +00:00
phk	dc1cfea3cd	Change vprint() to vn_printf() which takes varargs. Add #define for vprint() to call vn_printf().	2005-01-24 13:58:08 +00:00
phk	d5c135375c	Kill the VV_OBJBUF and test the v_object for NULL instead.	2005-01-24 13:13:57 +00:00
jeff	8cb5395678	- Add the tunable and sysctl for the mpsafevfs. It currently defaults to off. - Protect access to mnt_kern_flag with the mointpoint mutex. - Remove some KASSERTs which are not legal checks without the appropriate locks held. - Use VCANRECYCLE() rather than rolling several slightly different checks together. - Return from vtryrecycle() with a recycled vnode rather than a locked vnode. This simplifies some locking. - Remove several GIANT_REQUIRED lines. - Add a few KASSERTs to help with INACT debugging. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:41:01 +00:00
phk	d3b1b2cc99	Fix a bug I introduced in 1.561 which has caused considerable filesystem unhappiness lately. As far as I can tell, no files that have made it safely to disk have been endangered, but stuff in transit has been in peril. Pointy hat: phk	2005-01-16 21:09:39 +00:00
phk	cc0cbc6b34	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
phk	3760addae2	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
phk	5a497775d6	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
phk	437e41e061	More vnode -> bufobj migration.	2005-01-11 10:16:39 +00:00
phk	649a01e1a5	Give flushbuflist() a struct bufv as first argument and avoid home-rolling TAILQ_FOREACH_SAFE(). Loose the error pointer argument and return any errors the normal way. Return EAGAIN for the case where more work needs to be done.	2005-01-11 10:01:54 +00:00
phk	da2718f1af	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
imp	20280f1431	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
phk	f2581a224f	Since we do not support forceful unmount of DEVFS we can do away with the partially implemented vnode-readoption code in vgonechrl().	2005-01-04 08:49:14 +00:00
phk	0faeb292ed	We can only ever get to vgonechrl() from a devfs vnode, so we do not need to reassign the vp->v_op to devfs_specops, we know that is the value already. Make devfs_specops private to devfs.	2004-12-20 21:34:29 +00:00
phk	4a639d6164	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
phk	4b1a114436	Improve vprint() a little bit: break long lines, reduce indent and tell if the VI_LOCK() is held.	2004-12-03 12:09:34 +00:00
phk	59f305606c	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
phk	ca008fe171	Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff.	2004-11-15 08:12:50 +00:00
phk	d632cb3d94	Move the bit of the syncer which deals with vnodes into a separate function.	2004-11-14 15:24:38 +00:00
phk	b99ff6bbb6	Eliminate vop_revoke() function now that devfs_revoke() does the entire job.	2004-11-13 23:38:13 +00:00
phk	9fdf027988	Slim vnodes by another four bytes by eliminating the (now) unused field v_cachedid.	2004-11-10 07:31:06 +00:00
phk	cea5000057	Remove vn_todev()	2004-11-10 07:17:28 +00:00
phk	7bef55364a	Remove vnode->v_cachedfs. It was only used for the highly dangerous "export all vnodes with a sysctl" function.	2004-11-09 22:51:03 +00:00
phk	1e4caea88c	Remove buf->b_dev field.	2004-11-04 07:59:57 +00:00
phk	34a530853d	Always initialize bo_private along with bo_ops in getnewvnode(). Spotted by: tegge	2004-11-03 21:09:23 +00:00
phk	c928cc4c54	Loose vfs_mountedon()	2004-10-29 11:15:08 +00:00
phk	12ca46b3fc	Give the bufobj a private __bo_vnode for now to keep the syncer floating [1] At some point later the syncer will unlearn about vnodes and the filesystems method called by the syncer will know enough about what's in bo_private to do the right thing. [1] Ok, I know, but I couldn't resist the pun.	2004-10-29 09:33:32 +00:00
phk	56a7ee8e7f	Move the syncer linkage from vnode to bufobj. This is not quite a perfect separation: the syncer still think it knows that everything is a vnode.	2004-10-27 08:05:02 +00:00
phk	c66aa10c8e	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
phk	0e87ab8bc6	Loose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().	2004-10-25 09:14:03 +00:00
phk	3a8a530155	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00
phk	4ba53ec41b	Collapse vnode->v_object and buf->b_object into bufobj->bo_object.	2004-10-25 06:02:57 +00:00
phk	1b25a59886	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
rwatson	f046b53692	When MAC is enabled, warn if getnewvnode() is asked to produce a vnode without a mountpoint. In this scenario, there's no useful source for a label on the vnode, since we can't query the mountpoint for the labeling strategy or default label.	2004-10-22 11:04:58 +00:00
phk	2c3e47b668	Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite jest, of most excellent fancy: he hath taught me lessons a thousand times; and now, how abhorred in my imagination it is! my gorge rises at it. Here were those hacks that I have curs'd I know not how oft. Where be your kludges now? your workarounds? your layering violations, that were wont to set the table on a roar? Move the skeleton of specfs into devfs where it now belongs and bury the rest.	2004-10-22 09:59:37 +00:00
phk	52a089c526	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
phk	3833976d12	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
phk	fdf614c0ba	Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx. Initialize the bo_mtx when we allocate a vnode i getnewvnode() For now we point to the vnodes interlock mutex, that retains the exact same locking sematics. Move v_numoutput from vnode to bufobj. Add renaming macro to postpone code sweep.	2004-10-21 14:42:31 +00:00
phk	b436dad078	Polish vtruncbuf() to improve readability and style a bit.	2004-10-21 14:13:54 +00:00
phk	350f812103	Simplify buf_vlist_remove(). Now that we have encapsulated the splaytree related information into a structure we can eliminate the half of this function.	2004-10-21 13:48:50 +00:00
grog	152055d94b	vtryrecycle: Don't rely on type VBAD alone to mean that we don't need to clean the vnode. If v_data is set, we still need to clean it. This code change should catch all incidents of the previous commit (INVARIANTS only).	2004-10-06 02:09:59 +00:00
grog	882d69104e	getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning. Discussion: this panic (or waning) only occurs when the kernel is compiled with INVARIANTS. Otherwise the problem (which means that the vp->v_data field isn't NULL, and represents a coding error and possibly a memory leak) is silently ignored by setting it to NULL later on. Panicking here isn't very helpful: by this time, we can only find the symptoms. The panic occurs long after the reason for "not cleaning" has been forgotten; in the case in point, it was the result of severe file system corruption which left the v_type field set to VBAD. That issue will be addressed by a separate commit.	2004-10-06 02:06:11 +00:00
phk	2e5b8b9883	Fix a LOR relating to freeing cdevs.	2004-10-01 06:33:39 +00:00
phk	5536d5757b	Hold dev_lock and check for NULL devsw pointer when we determine if a vnode is a disk.	2004-09-24 06:16:08 +00:00
phk	3947e54e89	Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount of the number of threads which are inside whatever is behind the cdevsw for this particular cdev. Make the device mutex visible through dev_lock() and dev_unlock(). We may want finer granularity later. Replace spechash_mtx use with dev_lock()/dev_unlock().	2004-09-23 07:17:41 +00:00
phk	02df7323ee	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
phk	1912367ebb	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.	2004-09-07 09:17:05 +00:00
truckman	54d23a34f6	Don't attempt to trigger the syncer thread final sync code in the shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the likely cause of hangs after a system panic that are keeping crash dumps from being done. This is a MFC candidate for RELENG_5. MFC after: 3 days	2004-08-20 19:21:47 +00:00
obrien	8f41b4e870	s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g	2004-08-16 08:33:37 +00:00
jmg	bc1805c6e8	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
rwatson	371cf09cf7	In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However, we may sleep when doing so; check that we didn't race with another thread allocating storage for the vnode after allocation is made to a local pointer, and only update the vnode pointer if it's still NULL. Otherwise, accept that another thread got there first, and release the local storage. Discussed with: jmg	2004-08-11 01:27:53 +00:00
njl	7e21ce666c	Skip the syncing disks loop if there are no dirty buffers. Remove a variable used to flag the initial printf. Submitted by: truckman (earlier version)	2004-08-10 01:32:05 +00:00
obrien	47f728c0bc	Put a cap on the auto-tuning of kern.maxvnodes. Cap value chosen by: scottl	2004-08-02 21:52:43 +00:00
njl	774b91783e	Minor message cleanup.	2004-07-30 01:30:05 +00:00
phk	8c9258b82e	Convert the vfsconf list to a TAILQ. Introduce vfs_byname() function to find things on it. Staticize vfs_nmount() function under the name vfs_donmount(). Various cleanups.	2004-07-27 22:32:01 +00:00
cperciva	d9fecc83c8	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
phk	5297516e02	Eliminate unused second argument to reassignbuf() and simplify it accordingly.	2004-07-25 21:24:23 +00:00
alfred	c2337759f3	put several of the options for DEBUG_VFS_LOCKS under control of sysctls.	2004-07-21 07:13:14 +00:00
alfred	bcc5104ce1	Cleanup shutdown output.	2004-07-15 08:01:00 +00:00
alfred	0389c188de	Tidy up system shutdown.	2004-07-15 04:29:48 +00:00
alfred	8a1713aada	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
alfred	45c9fe9da7	Dump the actual bad values when this assertion is tripped.	2004-07-12 04:13:38 +00:00
marcel	c20ced5cd2	Update for the KDB framework: o Call kdb_enter() instead of Debugger().	2004-07-10 21:47:53 +00:00
alfred	b65386ecc3	fixup sysctl by fsid node	2004-07-08 06:11:36 +00:00
alfred	e0a5f530c2	Introduce vfs_suser(), used to test if a user should have special privs for a mount.	2004-07-06 09:37:43 +00:00
alfred	97a6f04270	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.	2004-07-06 09:12:03 +00:00
truckman	690b842bc5	Unconditionally set last_work_seen while in the SYNCER_RUNNING state so that last_work_seen has a reasonable value at the transition to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened to be zero at the time. Initialize last_work_seen to zero as a safety measure in case the syncer never ran in the SYNCER_RUNNING state. Tested by: phk	2004-07-05 21:32:01 +00:00
truckman	471ab74bb2	Rework syncer termination code: Speed up the syncer when shutting down by sleeping for a shorter period of time instead of cranking up rushjob and using the normal one second sleep. Skip empty worklist slots when shutting down to avoid lengthy intervals of inactivity. Give I/O more time to complete between steps by not speeding the syncer quite as much. Terminate the syncer after one full pass through the worklist plus one second with the worklist containing nothing but syncer vnodes. Print an indication of shutdown progress to the console. Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist to be monitored.	2004-07-05 01:07:33 +00:00
phk	49a3aa211e	Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE.	2004-07-04 22:33:22 +00:00
alfred	95f8f1e089	Pass the operation in with the fsidctl. Remove some fsidctls that we will not be using. Correct prototypes for fs sysctls.	2004-07-04 20:21:58 +00:00
phk	59c88fd71a	Make the last commit handle non-phk root devices better.	2004-07-04 19:42:25 +00:00
phk	b52c81e5db	Blocksize for I/O should be a property of the vnode and not found by groping around in the vnodes surroundings when we allocate a block. Assign a blocksize when we create a vnode, and yell a warning (and ignore it) if we got the wrong size. Please email all such warnings to me.	2004-07-04 12:49:04 +00:00
alfred	bbaa6c3ec0	Introduce a new kevent filter. EVFILT_FS that will be used to signal generic filesystem events to userspace. Currently only mount and unmount of filesystems are signalled. Soon to be added, up/down status of NFS. Introduce a sysctl node used to route requests to/from filesystems based on filesystem ids. Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/ entrypoint by the sysctl code to change individual filesystems.	2004-07-04 10:52:54 +00:00
alfred	4a61cff009	Revision 1.496 would not boot on my system due to ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) -> insmntqueue(vp, mp = NULL) -> KASSERT -> panic Make getnewvnode() only call insmntqueue() if the mountpoint parameter is not NULL.	2004-07-04 10:19:15 +00:00
phk	070a613a48	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
truckman	9ed03e6eb3	When shutting down the syncer kernel thread, first tell it to run faster and iterate to over its work list a few times in an attempt to empty the work list before the syncer terminates. This leaves fewer dirty blocks to be written at the "syncing disks" stage and keeps the the "giving up on N buffers" problem from being triggered by the presence of a large soft updates work list at system shutdown time. The downside is that the syncer takes noticeably longer to terminate. Tested by: "Arjan van Leeuwen" <avleeuwen AT piwebs DOT com> Approved by: mckusick	2004-07-01 23:59:19 +00:00
phk	40dd98a3bd	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
phk	dfd1f7fd50	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
phk	ba2d5c4937	Remove a left over from userland buffer-cache access to disks.	2004-06-14 14:25:03 +00:00
rwatson	afc098b3e1	Assert Giant in vrele().	2004-05-31 19:06:01 +00:00
mux	79217d1505	Put deprecated sysctl code inside BURN_BRIDGES.	2004-04-11 21:09:22 +00:00
imp	74cf37bd00	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
peter	5e91995e52	Kill some XXXKSE's. vnlru/syncer are single threaded.	2004-03-29 22:45:33 +00:00
phk	2a5e157787	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
phk	eeb7579130	Remove unused second arg to vfinddev(). Don't call addaliasu() on VBLK nodes.	2004-03-11 16:33:11 +00:00
kan	e795b7939d	Always call vn_finished_write after vn_start_write was called. All occurences of 'goto done' after vn_start_write invocation were cleaning up incompletely.	2004-03-06 04:09:54 +00:00
jhb	d25301c858	Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep qeueus in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleep's no longer inherently bump the priority. Instead, this is soley a propery of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.	2004-02-27 18:52:44 +00:00
truckman	1de257deb3	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
phk	711ff67b90	Check for NODEV return from udev2dev()	2004-02-21 23:52:03 +00:00
phk	5551e292d8	Device megapatch 6/6: This is what we came here for: Hang dev_t's from their cdevsw, refcount cdevsw and dev_t and generally keep track of things a lot better than we used to: Hold a cdevsw reference around all entrances into the device driver, this will be necessary to safely determine when we can unload driver code. Hold a dev_t reference while the device is open. KASSERT that we do not enter the driver on a non-referenced dev_t. Remove old D_NAG code, anonymous dev_t's are not a problem now. When destroy_dev() is called on a referenced dev_t, move it to dead_cdevsw's list. When the refcount drops, free it. Check that cdevsw->d_version is correct. If not, set all methods to the dead_*() methods to prevent entrance into driver. Print warning on console to this effect. The device driver may still explode if it is also incompatible with newbus, but in that case we probably didn't get this far in the first place.	2004-02-21 21:57:26 +00:00
phk	39fb4aef3d	Device megapatch 5/6: Remove the unused second argument from udev2dev(). Convert all remaining users of makedev() to use udev2dev(). The semantic difference is that udev2dev() will only locate a pre-existing dev_t, it will not line makedev() create a new one. Apart from the tiny well controlled windown in D_PSEUDO drivers, there should no longer be any "anonymous" dev_t's in the system now, only dev_t's created with make_dev() and make_dev_alias()	2004-02-21 21:32:15 +00:00
kan	443a35f2bf	More style fixes. Obtained from: bde	2004-01-05 23:40:46 +00:00
kan	2ffbc2464e	style(9): Add empty line before first code line in functions with no local variables. Properly terminate comment sentences. Indent lines which are longer that 80 characters. Move v_addpollinfo closer to the rest of poll-related functions. Move DEBUG_VFS_LOCKS ifdefed block to the end of file. Obtained from: bde (partly)	2004-01-05 19:04:29 +00:00
kan	2872921967	Cosmetics: strip '\n' from a string passed to Debugger().	2004-01-04 03:42:20 +00:00
bde	7d91626477	v_vxproc was a bogus name for a thread (pointer).	2003-12-28 09:12:56 +00:00
jeff	5443bd4c65	- In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT. Submitted by: Stephan Uphoff <ups@stups.com>	2003-12-16 17:08:27 +00:00
jeff	aa712bc6e4	- When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by reassigning their v_ops field to specfs, detaching from the mountpoint, etc. However, this is not sufficient. If we vclean() the vnode the pages owned by the vnode are lost, potentially while buffers reference them. Implement parts of vclean() seperately in vgonechrl() so that the pages and bufs associated with a device vnode are not destroyed while in use.	2003-12-16 17:05:05 +00:00
jeff	e35dcab926	- Don't forget to unlock the vnode interlock in the LK_NOWAIT case. Submitted by: Stephan Uphoff <ups@stups.com> Approved by: re (rwatson)	2003-11-30 22:09:58 +00:00
tanimura	7eade05dfa	- Implement selwakeuppri() which allows raising the priority of a thread being waken up. The thread waken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current	2003-11-09 09:17:26 +00:00
kan	36d60f3bb7	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
wollman	dabc1a332d	Add appropriate const poisoning to the assert_*locked() family so that I can call ASSERT_VOP_LOCKED(vp, __func__) without a diagnostic. Inspired by: the evil and rude OpenAFS cache manager code	2003-10-23 18:17:36 +00:00
alc	512489f301	Initialize the buf's b_object in pbgetvp(). Clear it in pbrelvp(). (This facilitates synchronization of the vm page's valid field using the vm object's lock.) Suggested by: tegge	2003-10-20 18:24:38 +00:00
phk	888092f317	Simplify count_dev()	2003-10-17 11:56:48 +00:00
phk	edaf9214c4	Simplify vn_isdisk() a bit.	2003-10-12 14:04:39 +00:00
jeff	5b9cc4b22e	- Fix a typo, I meant & and not \|. This was causing lockups from the syncer looping forever due to list corruption. Solved by: tegge	2003-10-11 21:50:45 +00:00
jeff	2c3fea92c8	- Fix an XXX. Check the error of vn_lock() in vflush(). Don't specify LK_RETRY either, we don't want this vnode if it turns into another. - Remove the code that checks the mount point after acquiring the lock we are guaranteed to either fail or get the vnode that we wanted.	2003-10-05 07:12:38 +00:00
jeff	6b8324e841	- Rename vcanrecycle() to vtryrecycle() to reflect its new role. - In vtryrecycle() try to vgonel the vnode if all of the previous checks passed. We won't vgonel if someone has either acquired a hold or usecount or started the vgone process elsewhere. This is because we may have been removed from the free list while we were inspecting the vnode for recycling. - The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling the same vnode. To further reduce the likelyhood of this event, requeue the vnode on the tail of the list prior to calling vtryrecycle(). We can not actually remove the vnode from the list until we know that it's going to be recycled because other interlock holders may see the VI_FREE flag and try to remove it from the free list. - Kill a bogus XXX comment. If XLOCK is set we shouldn't wait for it regardless of MNT_WAIT because the vnode does not actually belong to this filesystem.	2003-10-05 05:35:41 +00:00
jeff	e15704d590	- Don't cache_purge() in getnewvnode. It's done in vclean(). With this purge, the purge in vclean, and the filesystems purge, we had 3 purges per vnode. - Move the insmntque(vp, 0) to vclean() so that we may remove it from the two vgone() functions and reduce the number of lock operations required.	2003-10-05 02:48:04 +00:00
jeff	8d0f78003f	- Solve a LOR with the sync_mtx by using the VI_ONWORKLST flag to determine whether or not the sync failed. This could potentially get set between the time that we VOP_UNLOCK and VI_LOCK() but the race would harmelssly lead to the sync being delayed by an extra 30 seconds. If we do not move the vnode it could cause an endless loop if it continues to fail to sync. - Use vhold and vdrop to stop the vnode from changing identities while we have it unlocked. Other internal vfs lists are likely to follow this scheme.	2003-10-05 00:35:41 +00:00
jeff	8d72c43763	- Move the xlock 'locking' code into vx_lock() and vx_unlock(). - Create a new function, vgonechrl(), which performs vgone for an in-use character device. Move the code from vflush() that did this into vgonechrl(). - Hold the xlock across the entirety of vgonel() and vgonechrl() so that at no point will an invalid vnode exist on any list without XLOCK set. - Move the xlock code out of vclean() now that it is in the vgone*() functions.	2003-10-05 00:02:41 +00:00
jeff	55547647ec	- In sched_sync() test our preconditions prior to dropping the sync_mtx. This is so that we may grab the interlock while still holding the sync_mtx. We have to VI_TRYLOCK() because in all other cases the lock order runs the other way. - If we don't meet any of the preconditions, reinsert the vp into the list for the next second. - We don't need to panic if we fail to sync here because each FSYNC function handles this case. Removing this redundant code also simplifies locking.	2003-10-04 18:03:53 +00:00
jeff	bb40013912	- In a Giantless world, the vn_lock() in vcanrecycle() could legitimately fail. Remove the panic from that case and document why it might fail. - Document the reason for calling cache_purge() on a newly created vnode. - In insmntque() order the operations so that we can call mtx_unlock() one fewer times. This makes the code somewhat clearer as well. - Add XXX comments in sched_sync() and vflush(). - In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is set. - In vclean() we don't need to acquire a lock around a single TAILQ_FIRST call. It's ok if we race here, the vinvalbuf will just do nothing. - Increase the scope of the lock in vgonel() to reduce the number of lock operations that are performed.	2003-10-04 15:10:40 +00:00
jeff	517dcea6c8	- In reassignbuf() don't unlock vp and lock newvp if they are the same. Doing so creates a race where the buf is on neither list. - Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we should. - Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this really should not happen and is fast to check.	2003-09-20 00:21:48 +00:00
jeff	45f3b1b270	- Remove spls(). The locking that has replaced them is in place and they no longer serve as guidelines for future work.	2003-09-19 23:52:06 +00:00
kan	cf77f9f005	Eliminate one case of VI_UNLOCK followed by an immediate VI_LOCK.	2003-09-19 19:13:54 +00:00
jhb	37641f86f1	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
phk	eb30c92e49	Revert stuff which accidentally ended up in the previous commit.	2003-07-22 10:36:36 +00:00
phk	c4a9334fa6	Don't attempt to inline large functions mb_alloc() and mb_free(), it more than doubles the text size of this file. GCC has wisely ignored us on this previously	2003-07-22 10:24:41 +00:00
obrien	3b8fff9e4c	Use __FBSDID().	2003-06-11 00:56:59 +00:00
phk	ddccb1d287	Remove unused variable and now unbalanced call to splbio(); Found by: FlexeLint	2003-05-31 20:09:01 +00:00
alc	53638c7027	Make the maximum number of vnodes a function of both the physical memory size and the kernel's heap size, specifically, vm_kmem_size. This function allows a maximum of 40% of the vm_kmem_size to be used for vnodes and vm objects. This is a conservative bound based upon recent problem reports. (In other words, a slight increase in this percentage may be safe.) Finally, machines with less than ~3GB of RAM should be unaffected by this change, i.e., the maximum number of vnodes should remain the same. If necessary, machines with 3GB or more of RAM can increase the maximum number of vnodes by increasing vm_kmem_size. Desired by: scottl Tested by: jake Approved by: re (rwatson,scottl)	2003-05-23 19:54:02 +00:00
truckman	80040f21a3	Detect that a vnode has been reclaimed while vflush() was waiting to lock the vnode and restart the loop. Vflush() is vulnerable since it does not hold a reference to the vnode and it holds no other locks while waiting for the vnode lock. The vnode will no longer be on the list when the loop is restarted. Approved by: re (rwatson)	2003-05-16 19:46:51 +00:00
alc	0422418ef4	Optimize the use of splay in gbincore(). During a "make buildworld" the desired buffer is found at one of the roots more than 60% of the time. Thus, checking both roots before performing either splay eliminates unnecessary splays on the first tree splayed. Approved by: re (jhb)	2003-05-13 04:36:02 +00:00
rwatson	c21d149f29	Remove bogus locking from DDB's "show lockedvnods" command: using synchronization primitives from inside DDB is generally a bad idea, and in this case it frequently results in panics due to DDB commands being executed from the sio fast interrupt context on a serial console. Replace the locking with a note that a lack of locking means that DDB may get see inconsistent views of the mount and vnode lists, which could also result in a panic. More frequently, though, this avoids a panic than causes it. Discussed with ages ago: bde Approved by: re (scottl)	2003-05-12 14:37:47 +00:00
alc	410b675ed9	- Revert kern/vfs_subr.c revision 1.444. The vm_object's size isn't trustworthy for vnode-backed objects. - Restore the old behavior of vm_object_page_remove() when the end of the given range is zero. Add a comment to vm_object_page_remove() regarding this behavior. Reported by: iedowse	2003-05-03 08:09:24 +00:00
alc	e9c4374a87	Lock accesses to the vm_object's ref_count and resident_page_count.	2003-05-01 03:10:38 +00:00
alc	d5ac0bc453	Various changes to vm_object_page_remove(): - Eliminate an odd, special-case feature: if start == end == 0 then all pages are removed. Only one caller used this feature and that caller can trivially pass the object's size. - Assert that the vm_object is locked on entry; don't bother testing for a NULL vm_object. - Style: Fix lines that are longer than 80 characters.	2003-04-26 23:41:30 +00:00
alc	373b18b5c3	- Convert vm_object_pip_wait() from using tsleep() to msleep(). - Make vm_object_pip_sleep() static. - Lock the vm_object when performing vm_object_pip_wait().	2003-04-26 18:33:18 +00:00
alc	87da2c3cf3	- Acquire the vm_object's lock when performing vm_object_page_clean(). - Add a parameter to vm_pageout_flush() that tells vm_pageout_flush() whether its caller has locked the vm_object. (This is a temporary measure to bootstrap vm_object locking.)	2003-04-24 04:31:25 +00:00
alc	83fe46be18	Update locking around vm_object_page_remove() to use the new macros.	2003-04-18 16:39:03 +00:00
alc	227f7746c4	Use vm_object_pip_wait() rather than reimplementing it.	2003-04-13 05:10:44 +00:00
tegge	5e14826743	Adjust the number of vnodes scanned by vlrureclaim() according to the size of the vnode list.	2003-03-26 22:15:58 +00:00
yar	f9968b4d9f	We shouldn't assert that a vode is locked in vop_lock_post() if VOP_LOCK() has failed. Reviewed by: jeff	2003-03-22 13:21:54 +00:00
jeff	ec5374265b	- Remove a dead check for bp->b_vp == vp in vtruncbuf(). This has not been possible for some time. - Lock the buf before accessing fields. This should very rarely be locked. - Assert that B_DELWRI is set after we acquire the buf. This should always be the case now.	2003-03-13 07:22:53 +00:00
jeff	ae3c8799da	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.	2003-03-13 07:19:23 +00:00
alc	c50367da67	Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress. Discussed on: arch@	2003-03-06 03:41:02 +00:00
njl	5a225ad933	Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead defering to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.	2003-03-03 19:15:40 +00:00
jeff	8e95e91722	- Hold the vnode interlock across calls to bgetvp instead of acquiring it internally. This is required to stop multiple bufs from being associated with a single lblkno.	2003-03-02 06:05:23 +00:00
jeff	98d7696db0	- gc USE_BUFHASH. The smp locking of the buf cache renders this useless.	2003-03-01 05:55:03 +00:00
mckusick	6e9f6f2d6d	Prevent large files from monopolizing the system buffers. Keep track of the number of dirty buffers held by a vnode. When a bdwrite is done on a buffer, check the existing number of dirty buffers associated with its vnode. If the number rises above vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one of the other (hopefully older) dirty buffers associated with the vnode is written (using bawrite). In the event that this approach fails to curb the growth in it the vnode's number of dirty buffers (due to soft updates rollback dependencies), the more drastic approach of doing a VOP_FSYNC on the vnode is used. This code primarily affects very large and actively written files such as snapshots. This change should eliminate hanging when taking snapshots or doing background fsck on very large filesystems. Hopefully, one day it will be possible to cache filesystem metadata in the VM cache as is done with file data. As it stands, only the buffer cache can be used which limits total metadata storage to about 20Mb no matter how much memory is available on the system. This rather small memory gets badly thrashed causing a lot of extra I/O. For example, taking a snapshot of a 1Tb filesystem minimally requires about 35,000 write operations, but because of the cache thrashing (we only have about 350 buffers at our disposal) ends up doing about 237,540 I/O's thus taking twenty-five minutes instead of four if it could run entirely in the cache. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-25 06:44:42 +00:00
jeff	9e4c9a6ce9	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
phk	af9c7adfc3	Bracket the kern.vnode sysctl in #ifdef notyet because it results in massive locking issues on diskless systems. It is also not clear that this sysctl is non-dangerous in its requirements for locked down memory on large RAM systems.	2003-02-23 18:09:05 +00:00
imp	cf874b345d	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
alfred	bf8e8a6e8f	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
iedowse	1ec5f03e6d	Add a new vnode flag VI_DOINGINACT to indicate that a VOP_INACTIVE call is in progress on the vnode. When vput() or vrele() sees a 1->0 reference count transition, it now return without any further action if this flag is set. This flag is necessary to avoid recursion into VOP_INACTIVE if the filesystem inactive routine causes the reference count to increase and then drop back to zero. It is also used to guarantee that an unlocked vnode will not be recycled while blocked in VOP_INACTIVE(). There are at least two cases where the recursion can occur: one is that the softupdates code called by ufs_inactive() via ffs_truncate() can call vput() on the vnode. This has been reported by many people as "lockmgr: draining against myself" panics. The other case is that nfs_inactive() can call vget() and then vrele() on the vnode to clean up a sillyrename file. Reviewed by: mckusick (an older version of the patch)	2002-12-29 18:30:49 +00:00
phk	2eae537376	Use a timeout of one second while we wait for the vnode washer, this prevents a potential race and makes the system a little bit less jerky under extreme loads.	2002-12-29 11:18:25 +00:00
phk	90510abb6e	Vnodes pull in 800-900 bytes these days, all things counted, so we need to treat desiredvnodes much more like a limit than as a vague concept. On a 2GB RAM machine where desired vnodes is 130k, we run out of kmem_map space when we hit about 190k vnodes. If we wake up the vnode washer in getnewvnode(), sleep until it is done, so that it has a chance to offer us a washed vnode. If we don't sleep here we'll just race ahead and allocate yet a vnode which will never get freed. In the vnodewasher, instead of doing 10 vnodes per mountpoint per rotation, do 10% of the vnodes distributed evenly across the mountpoints.	2002-12-29 10:39:05 +00:00
phk	1496f0639d	KASSERT that vop_revoke() gets a VCHR.	2002-12-28 22:27:14 +00:00
alc	6be448f264	Perform vm_object_lock() and vm_object_unlock() around vm_object_page_remove().	2002-12-15 05:41:56 +00:00
alc	a7482ae294	To avoid lock order reversals in getnewvnode(), the call to uma_zfree() must be delayed until the vnode interlock is released. Reported by: kris@ Approved by: re (jhb)	2002-12-08 05:06:50 +00:00
robert	b4c9e24303	Do not set a variable (vp->p_pollinfo) to NULL if we know it already has that value. Approved by: re	2002-11-27 16:45:54 +00:00
rwatson	312cab0dee	Slightly change the semantics of vnode labels for MAC: rather than "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects sematics for shared vnode locks, which were not previously present in the system. This chances the cache coherrency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a work around for this shortly. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-26 14:38:24 +00:00
phk	7320ebcf72	In vrele() we can actually have a VCHR with v_rdev == NULL if we came from the bottom of addaliasu(). Don't panic.	2002-10-25 07:58:25 +00:00
mckusick	6b1611bd94	Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.	2002-10-25 00:20:37 +00:00
phk	a3766d9d16	Fix the spechash lock order reversal by keeping an updated sum of v_usecount in the dev_t which vcount() can return without locking any vnodes. Seen by: jhb	2002-10-24 19:38:56 +00:00
mckusick	9d00a4a781	When scanning the freelist looking for candidate vnodes to recycle, be sure to exit the loop with vp == NULL if no candidates are found. Formerly, this bug would cause the last vnode inspected to be used, even if it was not available. The result was a panic "vn_finished_write: neg cnt". Sponsored by: DARPA & NAI Labs.	2002-10-14 19:54:39 +00:00
mckusick	05ff8976a7	Unconditionally reset vp->v_vnlock back to the default in the vclean() function (e.g., vp->v_vnlock = &vp->v_lock) rather than requiring filesystems that use alternate locks to do so in their vop_reclaim functions. This change is a further cleanup of the vop_stdlock interface. Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk> Sponsored by: DARPA & NAI Labs.	2002-10-14 19:44:51 +00:00
mckusick	25230d4c6a	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.	2002-10-14 03:20:36 +00:00
mckusick	16ad96c43c	When considering a vnode for reuse in getnewvnode, we call vcanrecycle to check a free vnode's availability. If it is available, vcanrecycle returns an error code of zero and the vnode in question locked. The getnewvnode routine then used to call vn_start_write with the V_NOWAIT flag. If the filesystem was suspended while taking a snapshot, the vn_start_write would fail but getnewvnode would fail to unlock the vnode, instead leaving it locked on the freelist. The result would be that the vnode would be locked forever and would eventually hang the system with a race to the root when it was attempted to recycle it. This fix moves the vn_start_write check into vcanrecycle where it will properly unlock the vnode if it is unavailable for recycling due to filesystem suspension. Sponsored by: DARPA & NAI Labs.	2002-10-11 01:04:14 +00:00
sobomax	18d9db4bb5	Fix problem introduced in rev.1.406, which can cause already unlocked mutex being unlocked again causing system panic.	2002-10-05 12:56:10 +00:00
phk	b55fa4540e	Fix some harmless mis-indents. Spotted by: FlexeLint	2002-10-01 15:48:31 +00:00
rwatson	5d5060bddf	Move vnode MAC label initialization to after the release of the vnode interlock in getnewvnode() to avoid possible sleeps while holding the mutex. Note that the warning from Witness is a slight false positive since we know there will be no contention on the interlock since we haven't made the vnode available for use yet, but the theory is not a bad one. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-09-30 20:51:48 +00:00
phk	1dfc2c167f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
jeff	31b1ddae74	- Move ASSERT_VOP_LOCK functionality into functions in vfs_subr.c - Make the VI asserts more orthogonal to the rest of the asserts by using a new, common vfs_badlock() function and adding a 'str' arg. - Adjust generated ASSERTS to match the new prototype. - Adjust explicit ASSERTS to match the new prototype.	2002-09-26 04:48:44 +00:00
jeff	ee7cd9172d	- Lock down the syncer with sync_mtx. - Enable vfs_badlock_mutex by default. - Assert that the vp is locked in VOP_UNLOCK. - Use standard interlock macros in remaining code. - Correct a race in getnewvnode(). - Lock access to v_numoutput with interlock. - Lock access to buf lists and splay tree with interlock. - Add VOP and VI asserts. - Lock b_vnbufs with the vnode interlock. - Add vrefcnt() for callers who want to retreive the vnode ref without holding a lock. Add a comment that describes when this is safe. - Add vholdl() and vdropl() so that callers who already own the interlock can avoid race conditions and unnecessary unlocking. - Move the VOP_GETATTR() in vflush() into the WRITECLOSE conditional case. - Hold the interlock before droping the mntlist_mtx in vflush() to avoid a race. - Fix locking in vfs_msync().	2002-09-25 02:22:21 +00:00
njl	00c79f5c92	Remove any VOP_PRINT that redundantly prints the tag. Move lockmgr_printinfo() into vprint() for everyone's benefit. Suggested by: bde	2002-09-18 20:42:04 +00:00
njl	0590c43070	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)	2002-09-14 09:02:28 +00:00
julian	06f500f894	Indentation does not make a block.. need curly braces too. Submitted by: Eagle-eyes evans <bde@freebsd.org>	2002-09-11 18:15:26 +00:00
julian	5702a380a5	Completely redo thread states. Reviewed by: davidxu@freebsd.org	2002-09-11 08:13:56 +00:00
phk	3303b3f624	Fix an inherited style bug: compare with NOCRED instead of NULL. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:46:19 +00:00
phk	55be95d161	Introduce new extattr_check_cred() function which implements the canonical crential washing for extended attributes. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:38:57 +00:00
charnier	7dd9d47059	Replace various spelling with FALLTHROUGH which is lint()able	2002-08-25 13:23:09 +00:00
jeff	da601a39ac	- Fix a mistake in my last few commits. The PDROP flag stops msleep from re-acquiring the mutex. Pointy hat to: me Noticed by: tegge	2002-08-23 00:32:03 +00:00
jeff	6c5497f47a	- Make vn_lock() vget() and VOP_LOCK() all behave the same way WRT LK_INTERLOCK. The interlock will never be held on return from these functions even when there is an error. Errors typically only occur when the XLOCK is held which means this isn't the vnode we want anyway. Almost all users of these interfaces expected this behavior even though it was not provided before.	2002-08-22 07:44:45 +00:00
jeff	1e39ba8620	- Fix interlock handling in vn_lock(). Previously, vn_lock() could return with interlock held in error conditions when the caller did not specify LK_INTERLOCK. - Add several comments to vn_lock() describing the rational behind the code flow since it was not immediately obvious.	2002-08-22 06:51:06 +00:00
jeff	275611472a	- Document two cases, one in vget and the other in vn_lock, where the state of interlock on exit is not consistent. There are probably several bugs relating to this.	2002-08-21 08:34:48 +00:00
jeff	ca5f1feb36	- If vn_lock fails with the LK_INTERLOCK flag set, interlock will not be released. vcanrecycle() failed to unlock interlock under this condition. - Remove an extra VOP_UNLOCK from a failure case in vcanrecycle(). Pointed out by: rwatson	2002-08-21 06:40:34 +00:00
jeff	2fc7835d26	- Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED - Use the new VI asserts in place of the old mtx_assert checks. - Add the VI asserts to the automated lock checking in the VOP calls. The interlock should not be held across vops with a few exceptions. - Add the vop_(un)lock_{pre,post} functions to assert that interlock is held when LK_INTERLOCK is set.	2002-08-21 06:19:29 +00:00
jeff	d18378e088	- Extend the vnode_free_list_mtx to cover numvnodes and freevnodes. This was done only some of the time before, and now it is uniformly applied.	2002-08-13 05:29:48 +00:00
mux	f43070c325	- Introduce a new struct xvfsconf, the userland version of struct vfsconf. - Make getvfsbyname() take a struct xvfsconf *. - Convert several consumers of getvfsbyname() to use struct xvfsconf. - Correct the getvfsbyname.3 manpage. - Create a new vfs.conflist sysctl to dump all the struct xvfsconf in the kernel, and rewrite getvfsbyname() to use this instead of the weird existing API. - Convert some {set,get,end}vfsent() consumers to use the new vfs.conflist sysctl. - Convert a vfsload() call in nfsiod.c to kldload() and remove the useless vfsisloadable() and endvfsent() calls. - Add a warning printf() in vfs_sysctl() to tell people they are using an old userland. After these changes, it's possible to modify struct vfsconf without breaking the binary compatibility. Please note that these changes don't break this compatibility either. When bp will have updated mount_smbfs(8) with the patch I sent him, there will be no more consumers of the {set,get,end}vfsent(), vfsisloadable() and vfsload() API, and I will promptly delete it.	2002-08-10 20:19:04 +00:00
jeff	f91961bfed	- Move some logic from getnewvnode() to a new function vcanrecycle() - Unlock the free list mutex around vcanrecycle to prevent a lock order reversal.	2002-08-05 10:15:56 +00:00
jeff	02517b6731	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
rwatson	a5dcc1fd3d	Include file cleanup; mac.h and malloc.h at one point had ordering relationship requirements, and no longer do. Reminded by: bde	2002-08-01 17:47:56 +00:00
des	2ca172b725	Nit in previous commit: the correct sysctl type is "S,xvnode"	2002-07-31 12:25:28 +00:00
des	9c7ec03502	Initialize v_cachedid to -1 in getnewvnode(). Reintroduce the kern.vnode sysctl and make it export xvnodes rather than vnodes. Sponsored by: DARPA, NAI Labs	2002-07-31 12:24:35 +00:00
rwatson	6bb9b1da05	Note that the privilege indicating flag to vaccess() originally used by the process accounting system is now deprecated.	2002-07-31 02:05:12 +00:00
rwatson	261170743f	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the necessary MAC entry points to maintain labels on vnodes. In particular, initialize the label when the vnode is allocated or reused, and destroy the label when the vnode is going to be released, or reused. Wow, an object where there really is exactly one place where it's allocated, and one other where it's freed. Amazing. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 02:03:46 +00:00
jeff	5dce00d8f1	- Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here were hiding the real problem of the missing unlock in sync_inactive. - Add the missing unlock in sync_inactive. Submitted by: iedowse	2002-07-29 06:26:55 +00:00
truckman	b1555a2743	Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.	2002-07-28 19:59:31 +00:00
rwatson	7be639a7c0	Teach discretionary access control methods for files about VAPPEND and VALLPERM. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-22 03:57:07 +00:00
mckusick	b44cb5787c	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags \|= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage. Sponsored by: DARPA & NAI Labs.	2002-07-19 07:29:39 +00:00
mckusick	3abb526f86	Change utimes to set the file creation time (for filesystems that support creation times such as UFS2) to the value of the modification time if the value of the modification time is older than the current creation time. See utimes(2) for further details. Sponsored by: DARPA & NAI Labs.	2002-07-17 02:03:19 +00:00
dillon	da4e111a55	Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex then other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc	2002-07-10 17:02:32 +00:00
jeff	fe9018671a	- Use standard locking functions in syncer's opv - vput instead of vrele syncer vnodes in vfs_mount - Add vop_lookup_{pre,post} to verify locking in VOP_LOOKUP	2002-07-09 19:54:20 +00:00
jeff	cca3a0ef3d	- Don't hold the vn lock while calling VOP_CLOSE in vclean().	2002-07-07 06:38:22 +00:00
jeff	8bf1a039cb	- BUF_REFCNT() seems to be the preferred method for verifying a locked buf. Tell vop_strategy_pre() to use this instead. - Ignore B_CLUSTER bufs. Their components are locked but they don't really exist so they don't have to be. This isn't ideal but it is safe.	2002-07-07 05:29:45 +00:00
jeff	f1b0400267	Fix a mistake in my last commit. Don't grab an extra reference to the object in bp->b_object.	2002-07-06 21:27:20 +00:00
jeff	0dd7645264	Fixup uses of GETVOBJECT. - Cache a pointer to the vnode's object in the buf. - Hold a reference to that object in addition to the vnode's reference just to be consistent. - Cleanup code that got the object indirectly through the vp and VOP calls. This fixes at least one case where we were calling GETVOBJECT without a lock. It also avoids an expensive layered call at the cost of another pointer in struct buf.	2002-07-06 08:59:52 +00:00
jeff	908b0eb9a7	- Add vop_strategy_pre to validate VOP_STRATEGY locking. - Disable original vop_strategy lock specification. - Switch to the new vop_strategy_pre for lock validation. VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need to be translated. There may be other reasons, but as long as the underlying layer uses a VOP to perform the operations they will be caught later.	2002-07-06 05:21:12 +00:00
jeff	3bce786a77	Add "vop_rename_pre" to do pre rename lock verification. This is enabled only with DEBUG_VFS_LOCKS.	2002-07-06 04:39:48 +00:00
mux	4f6ffa4183	Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot() which was #if 0'd and is not likely to be used now.	2002-07-03 09:27:24 +00:00
mux	eb5a0f4a7e	Move every code related to mount(2) in a new file, vfs_mount.c. The file vfs_conf.c which was dealing with root mounting has been repo-copied into vfs_mount.c to preserve history. This makes nmount related development easier, and help reducing the size of vfs_syscalls.c, which is still an enormous file. Reviewed by: rwatson Repo-copy by: peter	2002-07-02 17:09:22 +00:00
iedowse	4416f82706	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick	2002-07-01 17:59:40 +00:00

... 3 4 5 6 7 ...

816 Commits