freebsd-nq

Author	SHA1	Message	Date
Jeff Roberson	6703c30bb5	- Remove vx_lock, vx_unlock, vx_wait, etc. - Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do writing ops without suspending. This could suspend the vlruproc which should not be a problem under normal circumstances. - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance where it was used. - Acquire a lock before calling vgone() as it now requires it. - Move the acquisition of the vnode interlock from vtryrecycle() to getnewvnode() so that if it fails we don't drop and reacquire the vnode_free_list_mtx. - Check for a usecount or holdcount at the end of vtryrecycle() in case someone grabbed a ref while we were recycling. Abort the recycle, and on the final ref drop this vnode will be placed on the head of the free list. - Move the redundant VOP_INACTIVE protection code into the local vinactive() routine to avoid code bloat. - Keep the vnode lock held across calls to vgone() in several places. - vgonel() no longer uses XLOCK, instead callers must hold an exclusive vnode lock. The VI_DOOMED flag is set to allow other threads to detect a vnode which is no longer valid. This flag is set until the last reference is gone, and there are no chances for a new ref. vgonel() holds this lock across the entire function, which greatly simplifies logic. _ Only vfree() in one place in vgone() not three. - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock in the LK_NOWAIT case. In other cases, check after we have slept and acquired an exlusive lock. This will simulate the old vx_wait() behavior. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:54:28 +00:00
Jeff Roberson	d9a9c2c22c	- Enable SMP VFS by default on current. More users are needed to turn up any remaining bugs. Anyone inconvenienced by this can still disable it in the loader. Sponsored by: Isilon Systems, Inc.	2005-02-23 10:05:43 +00:00
Jeff Roberson	d8a7c99a1c	- Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK has been set. Assert that this is the case so that we catch filesystems who are using naked VOP_LOCKs in illegal cases. Sponsored by: Isilon Systems, Inc.	2005-02-23 00:11:14 +00:00
Jeff Roberson	4c11620bb9	- Add a check for xlock in vop_lock_assert. Presently the xlock is considered to be as good as an exclusive lock, although there is still a possibility of someone acquiring a VOP LOCK while xlock is held. Sponsored by: Isilon Systems, Inc.	2005-02-22 23:59:11 +00:00
Poul-Henning Kamp	767056c0e8	Zero the v_un container field to make sure everything is gone.	2005-02-22 18:56:18 +00:00
Poul-Henning Kamp	aa2f6ddc3f	Reap more benefits from DEVFS: List devfs_dirents rather than vnodes off their shared struct cdev, this saves a pointer field in the vnode at the expense of a field in the devfs_dirent. There are often 100 times more vnodes so this is bargain. In addition it makes it harder for people to try to do stypid things like "finding the vnode from cdev". Since DEVFS handles all VCHR nodes now, we can do the vnode related cleanup in devfs_reclaim() instead of in dev_rel() and vgonel(). Similarly, we can do the struct cdev related cleanup in dev_rel() instead of devfs_reclaim(). rename idestroy_dev() to destroy_devl() for consistency. Add LIST_ENTRY de_alias to struct devfs_dirent. Remove v_specnext from struct vnode. Change si_hlist to si_alist in struct cdev. String new devfs vnodes' devfs_dirent on si_alist when we create them and take them off in devfs_reclaim(). Fix devfs_revoke() accordingly. Also don't clear fields devfs_reclaim() will clear when called from vgone(); Let devfs_reclaim() call dev_rel() instead of vgonel(). Move the usecount tracking from dev_rel() to devfs_reclaim(), and let dev_rel() take a struct cdev argument instead of vnode. Destroy SI_CHEAPCLONE devices in dev_rel() (instead of devfs_reclaim()) when they are no longer used. (This should maybe happen in devfs_close() instead.)	2005-02-22 15:51:07 +00:00
Poul-Henning Kamp	7fc940b266	Remove vfinddev(), it is generally bogus when faced with jails and chroot and has no legitimate use(r)s in the tree.	2005-02-22 14:11:47 +00:00
Poul-Henning Kamp	dfd4be14bd	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
Poul-Henning Kamp	900b7e2648	Make sure to drop the VI_LOCK in vgonel(); Spotted by: Taku YAMAMOTO <taku@tackymt.homeip.net>	2005-02-18 11:13:56 +00:00
Poul-Henning Kamp	4d8ac58b05	Introduce vx_wait{l}() and use it instead of home-rolled versions.	2005-02-17 10:49:51 +00:00
Poul-Henning Kamp	58aac12894	Convert KASSERTS to VNASSERTS	2005-02-17 10:28:58 +00:00
Poul-Henning Kamp	1ba212823f	Make various vnode related functions static	2005-02-10 12:28:58 +00:00
Poul-Henning Kamp	fe0198779c	Don't pass NULL to vprint()	2005-02-10 08:55:08 +00:00
Jeff Roberson	68f2274d97	- Add a new assert in the getnewvnode(). Assert that the usecount is still 0 to detect getnewvnode() races. - Add the vnode address to a few panics near by to help in debugging. Sponsored by: Isilon Systems, Inc.	2005-02-08 23:27:10 +00:00
Poul-Henning Kamp	b9489a449c	Access vmobject via the bufobj instead of the vnode	2005-02-07 10:04:06 +00:00
Poul-Henning Kamp	b348abd6cd	Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what was necessary.	2005-02-07 07:48:03 +00:00
Poul-Henning Kamp	d4eb29ba71	Remove unused argument to vrecycle()	2005-01-28 13:08:21 +00:00
Poul-Henning Kamp	1fdfaafb08	Integrate vclean() into vgonel(). Various associated polishing.	2005-01-28 13:00:03 +00:00
Poul-Henning Kamp	3fc8dd0653	Remove register keyword	2005-01-28 12:39:10 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	b5b6ec5faa	Eliminate the constant flags argument to vclean()	2005-01-24 22:22:02 +00:00
Poul-Henning Kamp	7c93282e42	Change vprint() to vn_printf() which takes varargs. Add #define for vprint() to call vn_printf().	2005-01-24 13:58:08 +00:00
Poul-Henning Kamp	35764be39e	Kill the VV_OBJBUF and test the v_object for NULL instead.	2005-01-24 13:13:57 +00:00
Jeff Roberson	d1fcf3bb31	- Add the tunable and sysctl for the mpsafevfs. It currently defaults to off. - Protect access to mnt_kern_flag with the mointpoint mutex. - Remove some KASSERTs which are not legal checks without the appropriate locks held. - Use VCANRECYCLE() rather than rolling several slightly different checks together. - Return from vtryrecycle() with a recycled vnode rather than a locked vnode. This simplifies some locking. - Remove several GIANT_REQUIRED lines. - Add a few KASSERTs to help with INACT debugging. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:41:01 +00:00
Poul-Henning Kamp	7bf38aeae7	Fix a bug I introduced in 1.561 which has caused considerable filesystem unhappiness lately. As far as I can tell, no files that have made it safely to disk have been endangered, but stuff in transit has been in peril. Pointy hat: phk	2005-01-16 21:09:39 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Poul-Henning Kamp	e39db32ab0	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
Poul-Henning Kamp	6ef8480a88	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
Poul-Henning Kamp	6afa350d53	More vnode -> bufobj migration.	2005-01-11 10:16:39 +00:00
Poul-Henning Kamp	8d785753bd	Give flushbuflist() a struct bufv as first argument and avoid home-rolling TAILQ_FOREACH_SAFE(). Loose the error pointer argument and return any errors the normal way. Return EAGAIN for the case where more work needs to be done.	2005-01-11 10:01:54 +00:00
Poul-Henning Kamp	8df6bac4c7	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Poul-Henning Kamp	0b3e4fe239	Since we do not support forceful unmount of DEVFS we can do away with the partially implemented vnode-readoption code in vgonechrl().	2005-01-04 08:49:14 +00:00
Poul-Henning Kamp	e87047b437	We can only ever get to vgonechrl() from a devfs vnode, so we do not need to reassign the vp->v_op to devfs_specops, we know that is the value already. Make devfs_specops private to devfs.	2004-12-20 21:34:29 +00:00
Poul-Henning Kamp	20a92a18f1	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
Poul-Henning Kamp	f76fedd20b	Improve vprint() a little bit: break long lines, reduce indent and tell if the VI_LOCK() is held.	2004-12-03 12:09:34 +00:00
Poul-Henning Kamp	aec0fb7b40	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
Poul-Henning Kamp	a752aa8f17	Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff.	2004-11-15 08:12:50 +00:00
Poul-Henning Kamp	11bcbee11b	Move the bit of the syncer which deals with vnodes into a separate function.	2004-11-14 15:24:38 +00:00
Poul-Henning Kamp	db442506db	Eliminate vop_revoke() function now that devfs_revoke() does the entire job.	2004-11-13 23:38:13 +00:00
Poul-Henning Kamp	c5b846fe8e	Slim vnodes by another four bytes by eliminating the (now) unused field v_cachedid.	2004-11-10 07:31:06 +00:00
Poul-Henning Kamp	c13a4e8820	Remove vn_todev()	2004-11-10 07:17:28 +00:00
Poul-Henning Kamp	b797084e48	Remove vnode->v_cachedfs. It was only used for the highly dangerous "export all vnodes with a sysctl" function.	2004-11-09 22:51:03 +00:00
Poul-Henning Kamp	c569065139	Remove buf->b_dev field.	2004-11-04 07:59:57 +00:00
Poul-Henning Kamp	e0b687d33b	Always initialize bo_private along with bo_ops in getnewvnode(). Spotted by: tegge	2004-11-03 21:09:23 +00:00
Poul-Henning Kamp	996b2c82ca	Loose vfs_mountedon()	2004-10-29 11:15:08 +00:00
Poul-Henning Kamp	e1f355fe4e	Give the bufobj a private __bo_vnode for now to keep the syncer floating [1] At some point later the syncer will unlearn about vnodes and the filesystems method called by the syncer will know enough about what's in bo_private to do the right thing. [1] Ok, I know, but I couldn't resist the pun.	2004-10-29 09:33:32 +00:00
Poul-Henning Kamp	20eba72f53	Move the syncer linkage from vnode to bufobj. This is not quite a perfect separation: the syncer still think it knows that everything is a vnode.	2004-10-27 08:05:02 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Poul-Henning Kamp	156cb26583	Loose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().	2004-10-25 09:14:03 +00:00
Poul-Henning Kamp	ee1d0eb330	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00
Poul-Henning Kamp	4dcd0ac4cf	Collapse vnode->v_object and buf->b_object into bufobj->bo_object.	2004-10-25 06:02:57 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Robert Watson	b0e86f6ac2	When MAC is enabled, warn if getnewvnode() is asked to produce a vnode without a mountpoint. In this scenario, there's no useful source for a label on the vnode, since we can't query the mountpoint for the labeling strategy or default label.	2004-10-22 11:04:58 +00:00
Poul-Henning Kamp	ff7c5a4880	Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite jest, of most excellent fancy: he hath taught me lessons a thousand times; and now, how abhorred in my imagination it is! my gorge rises at it. Here were those hacks that I have curs'd I know not how oft. Where be your kludges now? your workarounds? your layering violations, that were wont to set the table on a roar? Move the skeleton of specfs into devfs where it now belongs and bury the rest.	2004-10-22 09:59:37 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
Poul-Henning Kamp	a76d8f4ec9	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
Poul-Henning Kamp	1bca607b9f	Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx. Initialize the bo_mtx when we allocate a vnode i getnewvnode() For now we point to the vnodes interlock mutex, that retains the exact same locking sematics. Move v_numoutput from vnode to bufobj. Add renaming macro to postpone code sweep.	2004-10-21 14:42:31 +00:00
Poul-Henning Kamp	67647b2312	Polish vtruncbuf() to improve readability and style a bit.	2004-10-21 14:13:54 +00:00
Poul-Henning Kamp	e163395619	Simplify buf_vlist_remove(). Now that we have encapsulated the splaytree related information into a structure we can eliminate the half of this function.	2004-10-21 13:48:50 +00:00
Greg Lehey	57259f2864	vtryrecycle: Don't rely on type VBAD alone to mean that we don't need to clean the vnode. If v_data is set, we still need to clean it. This code change should catch all incidents of the previous commit (INVARIANTS only).	2004-10-06 02:09:59 +00:00
Greg Lehey	f2154b33d2	getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning. Discussion: this panic (or waning) only occurs when the kernel is compiled with INVARIANTS. Otherwise the problem (which means that the vp->v_data field isn't NULL, and represents a coding error and possibly a memory leak) is silently ignored by setting it to NULL later on. Panicking here isn't very helpful: by this time, we can only find the symptoms. The panic occurs long after the reason for "not cleaning" has been forgotten; in the case in point, it was the result of severe file system corruption which left the v_type field set to VBAD. That issue will be addressed by a separate commit.	2004-10-06 02:06:11 +00:00
Poul-Henning Kamp	ba2851254f	Fix a LOR relating to freeing cdevs.	2004-10-01 06:33:39 +00:00
Poul-Henning Kamp	70526ca6a5	Hold dev_lock and check for NULL devsw pointer when we determine if a vnode is a disk.	2004-09-24 06:16:08 +00:00
Poul-Henning Kamp	a0e78d2eb0	Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount of the number of threads which are inside whatever is behind the cdevsw for this particular cdev. Make the device mutex visible through dev_lock() and dev_unlock(). We may want finer granularity later. Replace spechash_mtx use with dev_lock()/dev_unlock().	2004-09-23 07:17:41 +00:00
Poul-Henning Kamp	08dbd671ff	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
Poul-Henning Kamp	1affa3adc8	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.	2004-09-07 09:17:05 +00:00
Don Lewis	8ded654028	Don't attempt to trigger the syncer thread final sync code in the shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the likely cause of hangs after a system panic that are keeping crash dumps from being done. This is a MFC candidate for RELENG_5. MFC after: 3 days	2004-08-20 19:21:47 +00:00
David E. O'Brien	78c37b0de8	s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g	2004-08-16 08:33:37 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Robert Watson	87e83e7d4c	In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However, we may sleep when doing so; check that we didn't race with another thread allocating storage for the vnode after allocation is made to a local pointer, and only update the vnode pointer if it's still NULL. Otherwise, accept that another thread got there first, and release the local storage. Discussed with: jmg	2004-08-11 01:27:53 +00:00
Nate Lawson	c8c216d558	Skip the syncing disks loop if there are no dirty buffers. Remove a variable used to flag the initial printf. Submitted by: truckman (earlier version)	2004-08-10 01:32:05 +00:00
David E. O'Brien	64298d52cc	Put a cap on the auto-tuning of kern.maxvnodes. Cap value chosen by: scottl	2004-08-02 21:52:43 +00:00
Nate Lawson	b1c8139147	Minor message cleanup.	2004-07-30 01:30:05 +00:00
Poul-Henning Kamp	3dfe213e61	Convert the vfsconf list to a TAILQ. Introduce vfs_byname() function to find things on it. Staticize vfs_nmount() function under the name vfs_donmount(). Various cleanups.	2004-07-27 22:32:01 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Poul-Henning Kamp	cf95b5c381	Eliminate unused second argument to reassignbuf() and simplify it accordingly.	2004-07-25 21:24:23 +00:00
Alfred Perlstein	05656b6e2b	put several of the options for DEBUG_VFS_LOCKS under control of sysctls.	2004-07-21 07:13:14 +00:00
Alfred Perlstein	bb5faea34f	Cleanup shutdown output.	2004-07-15 08:01:00 +00:00
Alfred Perlstein	da6303bacc	Tidy up system shutdown.	2004-07-15 04:29:48 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
Alfred Perlstein	7ae8ce5df1	Dump the actual bad values when this assertion is tripped.	2004-07-12 04:13:38 +00:00
Marcel Moolenaar	32240d082c	Update for the KDB framework: o Call kdb_enter() instead of Debugger().	2004-07-10 21:47:53 +00:00
Alfred Perlstein	057589c485	fixup sysctl by fsid node	2004-07-08 06:11:36 +00:00
Alfred Perlstein	ea0104b032	Introduce vfs_suser(), used to test if a user should have special privs for a mount.	2004-07-06 09:37:43 +00:00
Alfred Perlstein	c713aaaeca	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.	2004-07-06 09:12:03 +00:00
Don Lewis	27875d9c88	Unconditionally set last_work_seen while in the SYNCER_RUNNING state so that last_work_seen has a reasonable value at the transition to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened to be zero at the time. Initialize last_work_seen to zero as a safety measure in case the syncer never ran in the SYNCER_RUNNING state. Tested by: phk	2004-07-05 21:32:01 +00:00
Don Lewis	faf1b66d1d	Rework syncer termination code: Speed up the syncer when shutting down by sleeping for a shorter period of time instead of cranking up rushjob and using the normal one second sleep. Skip empty worklist slots when shutting down to avoid lengthy intervals of inactivity. Give I/O more time to complete between steps by not speeding the syncer quite as much. Terminate the syncer after one full pass through the worklist plus one second with the worklist containing nothing but syncer vnodes. Print an indication of shutdown progress to the console. Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist to be monitored.	2004-07-05 01:07:33 +00:00
Poul-Henning Kamp	c555963fd1	Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE.	2004-07-04 22:33:22 +00:00
Alfred Perlstein	2d1dca73ee	Pass the operation in with the fsidctl. Remove some fsidctls that we will not be using. Correct prototypes for fs sysctls.	2004-07-04 20:21:58 +00:00
Poul-Henning Kamp	7f6599fec6	Make the last commit handle non-phk root devices better.	2004-07-04 19:42:25 +00:00
Poul-Henning Kamp	1cbb1e02c4	Blocksize for I/O should be a property of the vnode and not found by groping around in the vnodes surroundings when we allocate a block. Assign a blocksize when we create a vnode, and yell a warning (and ignore it) if we got the wrong size. Please email all such warnings to me.	2004-07-04 12:49:04 +00:00
Alfred Perlstein	94ed9c8af5	Introduce a new kevent filter. EVFILT_FS that will be used to signal generic filesystem events to userspace. Currently only mount and unmount of filesystems are signalled. Soon to be added, up/down status of NFS. Introduce a sysctl node used to route requests to/from filesystems based on filesystem ids. Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/ entrypoint by the sysctl code to change individual filesystems.	2004-07-04 10:52:54 +00:00
Alfred Perlstein	903ac7c219	Revision 1.496 would not boot on my system due to ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) -> insmntqueue(vp, mp = NULL) -> KASSERT -> panic Make getnewvnode() only call insmntqueue() if the mountpoint parameter is not NULL.	2004-07-04 10:19:15 +00:00
Poul-Henning Kamp	e3c5a7a4dd	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
Don Lewis	e06500dde5	When shutting down the syncer kernel thread, first tell it to run faster and iterate to over its work list a few times in an attempt to empty the work list before the syncer terminates. This leaves fewer dirty blocks to be written at the "syncing disks" stage and keeps the the "giving up on N buffers" problem from being triggered by the presence of a large soft updates work list at system shutdown time. The downside is that the syncer takes noticeably longer to terminate. Tested by: "Arjan van Leeuwen" <avleeuwen AT piwebs DOT com> Approved by: mckusick	2004-07-01 23:59:19 +00:00
Poul-Henning Kamp	f3732fd15b	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Poul-Henning Kamp	170593a9b5	Remove a left over from userland buffer-cache access to disks.	2004-06-14 14:25:03 +00:00
Robert Watson	9e6127fe3b	Assert Giant in vrele().	2004-05-31 19:06:01 +00:00
Maxime Henrion	a0b5a67929	Put deprecated sysctl code inside BURN_BRIDGES.	2004-04-11 21:09:22 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
Peter Wemm	39d3505a30	Kill some XXXKSE's. vnlru/syncer are single threaded.	2004-03-29 22:45:33 +00:00
Poul-Henning Kamp	4d453ef101	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
Poul-Henning Kamp	651b11eaf2	Remove unused second arg to vfinddev(). Don't call addaliasu() on VBLK nodes.	2004-03-11 16:33:11 +00:00
Alexander Kabaev	ff85a3f0e1	Always call vn_finished_write after vn_start_write was called. All occurences of 'goto done' after vn_start_write invocation were cleaning up incompletely.	2004-03-06 04:09:54 +00:00
John Baldwin	44f3b09204	Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep qeueus in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleep's no longer inherently bump the priority. Instead, this is soley a propery of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.	2004-02-27 18:52:44 +00:00
Don Lewis	47934cef8f	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
Poul-Henning Kamp	ded67d0f77	Check for NODEV return from udev2dev()	2004-02-21 23:52:03 +00:00
Poul-Henning Kamp	cd690b60de	Device megapatch 6/6: This is what we came here for: Hang dev_t's from their cdevsw, refcount cdevsw and dev_t and generally keep track of things a lot better than we used to: Hold a cdevsw reference around all entrances into the device driver, this will be necessary to safely determine when we can unload driver code. Hold a dev_t reference while the device is open. KASSERT that we do not enter the driver on a non-referenced dev_t. Remove old D_NAG code, anonymous dev_t's are not a problem now. When destroy_dev() is called on a referenced dev_t, move it to dead_cdevsw's list. When the refcount drops, free it. Check that cdevsw->d_version is correct. If not, set all methods to the dead_*() methods to prevent entrance into driver. Print warning on console to this effect. The device driver may still explode if it is also incompatible with newbus, but in that case we probably didn't get this far in the first place.	2004-02-21 21:57:26 +00:00
Poul-Henning Kamp	816d62bbb9	Device megapatch 5/6: Remove the unused second argument from udev2dev(). Convert all remaining users of makedev() to use udev2dev(). The semantic difference is that udev2dev() will only locate a pre-existing dev_t, it will not line makedev() create a new one. Apart from the tiny well controlled windown in D_PSEUDO drivers, there should no longer be any "anonymous" dev_t's in the system now, only dev_t's created with make_dev() and make_dev_alias()	2004-02-21 21:32:15 +00:00
Alexander Kabaev	580ddfa64b	More style fixes. Obtained from: bde	2004-01-05 23:40:46 +00:00
Alexander Kabaev	b0fdf71656	style(9): Add empty line before first code line in functions with no local variables. Properly terminate comment sentences. Indent lines which are longer that 80 characters. Move v_addpollinfo closer to the rest of poll-related functions. Move DEBUG_VFS_LOCKS ifdefed block to the end of file. Obtained from: bde (partly)	2004-01-05 19:04:29 +00:00
Alexander Kabaev	3ff1b7c23f	Cosmetics: strip '\n' from a string passed to Debugger().	2004-01-04 03:42:20 +00:00
Bruce Evans	9efe7d9d83	v_vxproc was a bogus name for a thread (pointer).	2003-12-28 09:12:56 +00:00
Jeff Roberson	958557e9c7	- In vget() if LK_NOWAIT is specified we should return EBUSY and not ENOENT. Submitted by: Stephan Uphoff <ups@stups.com>	2003-12-16 17:08:27 +00:00
Jeff Roberson	d85213669b	- When doing a forced unmount, VFS attempts to keep VCHR vnodes valid by reassigning their v_ops field to specfs, detaching from the mountpoint, etc. However, this is not sufficient. If we vclean() the vnode the pages owned by the vnode are lost, potentially while buffers reference them. Implement parts of vclean() seperately in vgonechrl() so that the pages and bufs associated with a device vnode are not destroyed while in use.	2003-12-16 17:05:05 +00:00
Jeff Roberson	a6c6a93c89	- Don't forget to unlock the vnode interlock in the LK_NOWAIT case. Submitted by: Stephan Uphoff <ups@stups.com> Approved by: re (rwatson)	2003-11-30 22:09:58 +00:00
Seigo Tanimura	512824f8f7	- Implement selwakeuppri() which allows raising the priority of a thread being waken up. The thread waken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current	2003-11-09 09:17:26 +00:00
Alexander Kabaev	ca430f2e92	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
Garrett Wollman	06cb76bde3	Add appropriate const poisoning to the assert_*locked() family so that I can call ASSERT_VOP_LOCKED(vp, __func__) without a diagnostic. Inspired by: the evil and rude OpenAFS cache manager code	2003-10-23 18:17:36 +00:00
Alan Cox	f2b1200d08	Initialize the buf's b_object in pbgetvp(). Clear it in pbrelvp(). (This facilitates synchronization of the vm page's valid field using the vm object's lock.) Suggested by: tegge	2003-10-20 18:24:38 +00:00
Poul-Henning Kamp	3da2d6a453	Simplify count_dev()	2003-10-17 11:56:48 +00:00
Poul-Henning Kamp	5108cd3652	Simplify vn_isdisk() a bit.	2003-10-12 14:04:39 +00:00
Jeff Roberson	7dd1328c13	- Fix a typo, I meant & and not \|. This was causing lockups from the syncer looping forever due to list corruption. Solved by: tegge	2003-10-11 21:50:45 +00:00
Jeff Roberson	bdcfcdecea	- Fix an XXX. Check the error of vn_lock() in vflush(). Don't specify LK_RETRY either, we don't want this vnode if it turns into another. - Remove the code that checks the mount point after acquiring the lock we are guaranteed to either fail or get the vnode that we wanted.	2003-10-05 07:12:38 +00:00
Jeff Roberson	45503a37dd	- Rename vcanrecycle() to vtryrecycle() to reflect its new role. - In vtryrecycle() try to vgonel the vnode if all of the previous checks passed. We won't vgonel if someone has either acquired a hold or usecount or started the vgone process elsewhere. This is because we may have been removed from the free list while we were inspecting the vnode for recycling. - The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling the same vnode. To further reduce the likelyhood of this event, requeue the vnode on the tail of the list prior to calling vtryrecycle(). We can not actually remove the vnode from the list until we know that it's going to be recycled because other interlock holders may see the VI_FREE flag and try to remove it from the free list. - Kill a bogus XXX comment. If XLOCK is set we shouldn't wait for it regardless of MNT_WAIT because the vnode does not actually belong to this filesystem.	2003-10-05 05:35:41 +00:00
Jeff Roberson	85311d4b59	- Don't cache_purge() in getnewvnode. It's done in vclean(). With this purge, the purge in vclean, and the filesystems purge, we had 3 purges per vnode. - Move the insmntque(vp, 0) to vclean() so that we may remove it from the two vgone() functions and reduce the number of lock operations required.	2003-10-05 02:48:04 +00:00
Jeff Roberson	ce13b187e7	- Solve a LOR with the sync_mtx by using the VI_ONWORKLST flag to determine whether or not the sync failed. This could potentially get set between the time that we VOP_UNLOCK and VI_LOCK() but the race would harmelssly lead to the sync being delayed by an extra 30 seconds. If we do not move the vnode it could cause an endless loop if it continues to fail to sync. - Use vhold and vdrop to stop the vnode from changing identities while we have it unlocked. Other internal vfs lists are likely to follow this scheme.	2003-10-05 00:35:41 +00:00
Jeff Roberson	894fbf9769	- Move the xlock 'locking' code into vx_lock() and vx_unlock(). - Create a new function, vgonechrl(), which performs vgone for an in-use character device. Move the code from vflush() that did this into vgonechrl(). - Hold the xlock across the entirety of vgonel() and vgonechrl() so that at no point will an invalid vnode exist on any list without XLOCK set. - Move the xlock code out of vclean() now that it is in the vgone*() functions.	2003-10-05 00:02:41 +00:00
Jeff Roberson	6f4b0863e0	- In sched_sync() test our preconditions prior to dropping the sync_mtx. This is so that we may grab the interlock while still holding the sync_mtx. We have to VI_TRYLOCK() because in all other cases the lock order runs the other way. - If we don't meet any of the preconditions, reinsert the vp into the list for the next second. - We don't need to panic if we fail to sync here because each FSYNC function handles this case. Removing this redundant code also simplifies locking.	2003-10-04 18:03:53 +00:00
Jeff Roberson	e4c49d2b50	- In a Giantless world, the vn_lock() in vcanrecycle() could legitimately fail. Remove the panic from that case and document why it might fail. - Document the reason for calling cache_purge() on a newly created vnode. - In insmntque() order the operations so that we can call mtx_unlock() one fewer times. This makes the code somewhat clearer as well. - Add XXX comments in sched_sync() and vflush(). - In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is set. - In vclean() we don't need to acquire a lock around a single TAILQ_FIRST call. It's ok if we race here, the vinvalbuf will just do nothing. - Increase the scope of the lock in vgonel() to reduce the number of lock operations that are performed.	2003-10-04 15:10:40 +00:00
Jeff Roberson	51b575490c	- In reassignbuf() don't unlock vp and lock newvp if they are the same. Doing so creates a race where the buf is on neither list. - Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we should. - Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this really should not happen and is fast to check.	2003-09-20 00:21:48 +00:00
Jeff Roberson	6b6c163a37	- Remove spls(). The locking that has replaced them is in place and they no longer serve as guidelines for future work.	2003-09-19 23:52:06 +00:00
Alexander Kabaev	aebbeee812	Eliminate one case of VI_UNLOCK followed by an immediate VI_LOCK.	2003-09-19 19:13:54 +00:00
John Baldwin	8b149b5131	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
Poul-Henning Kamp	68f2d20b70	Revert stuff which accidentally ended up in the previous commit.	2003-07-22 10:36:36 +00:00
Poul-Henning Kamp	55d1d7034f	Don't attempt to inline large functions mb_alloc() and mb_free(), it more than doubles the text size of this file. GCC has wisely ignored us on this previously	2003-07-22 10:24:41 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Poul-Henning Kamp	a62f80f8e0	Remove unused variable and now unbalanced call to splbio(); Found by: FlexeLint	2003-05-31 20:09:01 +00:00
Alan Cox	2e05d89828	Make the maximum number of vnodes a function of both the physical memory size and the kernel's heap size, specifically, vm_kmem_size. This function allows a maximum of 40% of the vm_kmem_size to be used for vnodes and vm objects. This is a conservative bound based upon recent problem reports. (In other words, a slight increase in this percentage may be safe.) Finally, machines with less than ~3GB of RAM should be unaffected by this change, i.e., the maximum number of vnodes should remain the same. If necessary, machines with 3GB or more of RAM can increase the maximum number of vnodes by increasing vm_kmem_size. Desired by: scottl Tested by: jake Approved by: re (rwatson,scottl)	2003-05-23 19:54:02 +00:00
Don Lewis	1e9bc9f889	Detect that a vnode has been reclaimed while vflush() was waiting to lock the vnode and restart the loop. Vflush() is vulnerable since it does not hold a reference to the vnode and it holds no other locks while waiting for the vnode lock. The vnode will no longer be on the list when the loop is restarted. Approved by: re (rwatson)	2003-05-16 19:46:51 +00:00
Alan Cox	099e981aa1	Optimize the use of splay in gbincore(). During a "make buildworld" the desired buffer is found at one of the roots more than 60% of the time. Thus, checking both roots before performing either splay eliminates unnecessary splays on the first tree splayed. Approved by: re (jhb)	2003-05-13 04:36:02 +00:00
Robert Watson	1964fb9ba2	Remove bogus locking from DDB's "show lockedvnods" command: using synchronization primitives from inside DDB is generally a bad idea, and in this case it frequently results in panics due to DDB commands being executed from the sio fast interrupt context on a serial console. Replace the locking with a note that a lack of locking means that DDB may get see inconsistent views of the mount and vnode lists, which could also result in a panic. More frequently, though, this avoids a panic than causes it. Discussed with ages ago: bde Approved by: re (scottl)	2003-05-12 14:37:47 +00:00
Alan Cox	bff99f0d12	- Revert kern/vfs_subr.c revision 1.444. The vm_object's size isn't trustworthy for vnode-backed objects. - Restore the old behavior of vm_object_page_remove() when the end of the given range is zero. Add a comment to vm_object_page_remove() regarding this behavior. Reported by: iedowse	2003-05-03 08:09:24 +00:00
Alan Cox	ebba1b25f9	Lock accesses to the vm_object's ref_count and resident_page_count.	2003-05-01 03:10:38 +00:00
Alan Cox	ecde4b3218	Various changes to vm_object_page_remove(): - Eliminate an odd, special-case feature: if start == end == 0 then all pages are removed. Only one caller used this feature and that caller can trivially pass the object's size. - Assert that the vm_object is locked on entry; don't bother testing for a NULL vm_object. - Style: Fix lines that are longer than 80 characters.	2003-04-26 23:41:30 +00:00
Alan Cox	1ca5895341	- Convert vm_object_pip_wait() from using tsleep() to msleep(). - Make vm_object_pip_sleep() static. - Lock the vm_object when performing vm_object_pip_wait().	2003-04-26 18:33:18 +00:00
Alan Cox	b6e48e0372	- Acquire the vm_object's lock when performing vm_object_page_clean(). - Add a parameter to vm_pageout_flush() that tells vm_pageout_flush() whether its caller has locked the vm_object. (This is a temporary measure to bootstrap vm_object locking.)	2003-04-24 04:31:25 +00:00
Alan Cox	49281fbf68	Update locking around vm_object_page_remove() to use the new macros.	2003-04-18 16:39:03 +00:00
Alan Cox	e96c181d16	Use vm_object_pip_wait() rather than reimplementing it.	2003-04-13 05:10:44 +00:00
Tor Egge	6b08046175	Adjust the number of vnodes scanned by vlrureclaim() according to the size of the vnode list.	2003-03-26 22:15:58 +00:00
Yaroslav Tykhiy	17ce5b94d6	We shouldn't assert that a vode is locked in vop_lock_post() if VOP_LOCK() has failed. Reviewed by: jeff	2003-03-22 13:21:54 +00:00
Jeff Roberson	e99215a614	- Remove a dead check for bp->b_vp == vp in vtruncbuf(). This has not been possible for some time. - Lock the buf before accessing fields. This should very rarely be locked. - Assert that B_DELWRI is set after we acquire the buf. This should always be the case now.	2003-03-13 07:22:53 +00:00
Jeff Roberson	09f11da5a3	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.	2003-03-13 07:19:23 +00:00
Alan Cox	09c80124a3	Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress. Discussed on: arch@	2003-03-06 03:41:02 +00:00
Nate Lawson	99648386d3	Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead defering to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.	2003-03-03 19:15:40 +00:00
Jeff Roberson	491081fabf	- Hold the vnode interlock across calls to bgetvp instead of acquiring it internally. This is required to stop multiple bufs from being associated with a single lblkno.	2003-03-02 06:05:23 +00:00
Jeff Roberson	bff5362bf2	- gc USE_BUFHASH. The smp locking of the buf cache renders this useless.	2003-03-01 05:55:03 +00:00
Kirk McKusick	3a7053cb60	Prevent large files from monopolizing the system buffers. Keep track of the number of dirty buffers held by a vnode. When a bdwrite is done on a buffer, check the existing number of dirty buffers associated with its vnode. If the number rises above vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one of the other (hopefully older) dirty buffers associated with the vnode is written (using bawrite). In the event that this approach fails to curb the growth in it the vnode's number of dirty buffers (due to soft updates rollback dependencies), the more drastic approach of doing a VOP_FSYNC on the vnode is used. This code primarily affects very large and actively written files such as snapshots. This change should eliminate hanging when taking snapshots or doing background fsck on very large filesystems. Hopefully, one day it will be possible to cache filesystem metadata in the VM cache as is done with file data. As it stands, only the buffer cache can be used which limits total metadata storage to about 20Mb no matter how much memory is available on the system. This rather small memory gets badly thrashed causing a lot of extra I/O. For example, taking a snapshot of a 1Tb filesystem minimally requires about 35,000 write operations, but because of the cache thrashing (we only have about 350 buffers at our disposal) ends up doing about 237,540 I/O's thus taking twenty-five minutes instead of four if it could run entirely in the cache. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-25 06:44:42 +00:00
Jeff Roberson	17661e5ac4	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
Poul-Henning Kamp	acb18acfec	Bracket the kern.vnode sysctl in #ifdef notyet because it results in massive locking issues on diskless systems. It is also not clear that this sysctl is non-dangerous in its requirements for locked down memory on large RAM systems.	2003-02-23 18:09:05 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Ian Dowse	6a1b2a22ef	Add a new vnode flag VI_DOINGINACT to indicate that a VOP_INACTIVE call is in progress on the vnode. When vput() or vrele() sees a 1->0 reference count transition, it now return without any further action if this flag is set. This flag is necessary to avoid recursion into VOP_INACTIVE if the filesystem inactive routine causes the reference count to increase and then drop back to zero. It is also used to guarantee that an unlocked vnode will not be recycled while blocked in VOP_INACTIVE(). There are at least two cases where the recursion can occur: one is that the softupdates code called by ufs_inactive() via ffs_truncate() can call vput() on the vnode. This has been reported by many people as "lockmgr: draining against myself" panics. The other case is that nfs_inactive() can call vget() and then vrele() on the vnode to clean up a sillyrename file. Reviewed by: mckusick (an older version of the patch)	2002-12-29 18:30:49 +00:00
Poul-Henning Kamp	371400cf2e	Use a timeout of one second while we wait for the vnode washer, this prevents a potential race and makes the system a little bit less jerky under extreme loads.	2002-12-29 11:18:25 +00:00
Poul-Henning Kamp	851a87ea1a	Vnodes pull in 800-900 bytes these days, all things counted, so we need to treat desiredvnodes much more like a limit than as a vague concept. On a 2GB RAM machine where desired vnodes is 130k, we run out of kmem_map space when we hit about 190k vnodes. If we wake up the vnode washer in getnewvnode(), sleep until it is done, so that it has a chance to offer us a washed vnode. If we don't sleep here we'll just race ahead and allocate yet a vnode which will never get freed. In the vnodewasher, instead of doing 10 vnodes per mountpoint per rotation, do 10% of the vnodes distributed evenly across the mountpoints.	2002-12-29 10:39:05 +00:00
Poul-Henning Kamp	9f16282798	KASSERT that vop_revoke() gets a VCHR.	2002-12-28 22:27:14 +00:00
Alan Cox	475e8011ab	Perform vm_object_lock() and vm_object_unlock() around vm_object_page_remove().	2002-12-15 05:41:56 +00:00
Alan Cox	2e29a1f21f	To avoid lock order reversals in getnewvnode(), the call to uma_zfree() must be delayed until the vnode interlock is released. Reported by: kris@ Approved by: re (jhb)	2002-12-08 05:06:50 +00:00
Robert Drehmel	f85a961930	Do not set a variable (vp->p_pollinfo) to NULL if we know it already has that value. Approved by: re	2002-11-27 16:45:54 +00:00
Robert Watson	763bbd2f4f	Slightly change the semantics of vnode labels for MAC: rather than "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects sematics for shared vnode locks, which were not previously present in the system. This chances the cache coherrency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a work around for this shortly. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-26 14:38:24 +00:00
Poul-Henning Kamp	0d6dc414b4	In vrele() we can actually have a VCHR with v_rdev == NULL if we came from the bottom of addaliasu(). Don't panic.	2002-10-25 07:58:25 +00:00
Kirk McKusick	9ab73fd11a	Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.	2002-10-25 00:20:37 +00:00
Poul-Henning Kamp	a2fb4feded	Fix the spechash lock order reversal by keeping an updated sum of v_usecount in the dev_t which vcount() can return without locking any vnodes. Seen by: jhb	2002-10-24 19:38:56 +00:00
Kirk McKusick	a6b9f47b31	When scanning the freelist looking for candidate vnodes to recycle, be sure to exit the loop with vp == NULL if no candidates are found. Formerly, this bug would cause the last vnode inspected to be used, even if it was not available. The result was a panic "vn_finished_write: neg cnt". Sponsored by: DARPA & NAI Labs.	2002-10-14 19:54:39 +00:00
Kirk McKusick	e04a020067	Unconditionally reset vp->v_vnlock back to the default in the vclean() function (e.g., vp->v_vnlock = &vp->v_lock) rather than requiring filesystems that use alternate locks to do so in their vop_reclaim functions. This change is a further cleanup of the vop_stdlock interface. Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk> Sponsored by: DARPA & NAI Labs.	2002-10-14 19:44:51 +00:00
Kirk McKusick	a5b65058d5	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.	2002-10-14 03:20:36 +00:00
Kirk McKusick	192e439ed4	When considering a vnode for reuse in getnewvnode, we call vcanrecycle to check a free vnode's availability. If it is available, vcanrecycle returns an error code of zero and the vnode in question locked. The getnewvnode routine then used to call vn_start_write with the V_NOWAIT flag. If the filesystem was suspended while taking a snapshot, the vn_start_write would fail but getnewvnode would fail to unlock the vnode, instead leaving it locked on the freelist. The result would be that the vnode would be locked forever and would eventually hang the system with a race to the root when it was attempted to recycle it. This fix moves the vn_start_write check into vcanrecycle where it will properly unlock the vnode if it is unavailable for recycling due to filesystem suspension. Sponsored by: DARPA & NAI Labs.	2002-10-11 01:04:14 +00:00
Maxim Sobolev	790a8088d0	Fix problem introduced in rev.1.406, which can cause already unlocked mutex being unlocked again causing system panic.	2002-10-05 12:56:10 +00:00
Poul-Henning Kamp	8d3574c7a4	Fix some harmless mis-indents. Spotted by: FlexeLint	2002-10-01 15:48:31 +00:00
Robert Watson	0626774f08	Move vnode MAC label initialization to after the release of the vnode interlock in getnewvnode() to avoid possible sleeps while holding the mutex. Note that the warning from Witness is a slight false positive since we know there will be no contention on the interlock since we haven't made the vnode available for use yet, but the theory is not a bad one. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-09-30 20:51:48 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Jeff Roberson	6423c9433c	- Move ASSERT_VOP_LOCK functionality into functions in vfs_subr.c - Make the VI asserts more orthogonal to the rest of the asserts by using a new, common vfs_badlock() function and adding a 'str' arg. - Adjust generated ASSERTS to match the new prototype. - Adjust explicit ASSERTS to match the new prototype.	2002-09-26 04:48:44 +00:00
Jeff Roberson	6cb8bf2027	- Lock down the syncer with sync_mtx. - Enable vfs_badlock_mutex by default. - Assert that the vp is locked in VOP_UNLOCK. - Use standard interlock macros in remaining code. - Correct a race in getnewvnode(). - Lock access to v_numoutput with interlock. - Lock access to buf lists and splay tree with interlock. - Add VOP and VI asserts. - Lock b_vnbufs with the vnode interlock. - Add vrefcnt() for callers who want to retreive the vnode ref without holding a lock. Add a comment that describes when this is safe. - Add vholdl() and vdropl() so that callers who already own the interlock can avoid race conditions and unnecessary unlocking. - Move the VOP_GETATTR() in vflush() into the WRITECLOSE conditional case. - Hold the interlock before droping the mntlist_mtx in vflush() to avoid a race. - Fix locking in vfs_msync().	2002-09-25 02:22:21 +00:00
Nate Lawson	86ed6d45ac	Remove any VOP_PRINT that redundantly prints the tag. Move lockmgr_printinfo() into vprint() for everyone's benefit. Suggested by: bde	2002-09-18 20:42:04 +00:00
Nate Lawson	06be2aaa83	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)	2002-09-14 09:02:28 +00:00
Julian Elischer	85e40eaf26	Indentation does not make a block.. need curly braces too. Submitted by: Eagle-eyes evans <bde@freebsd.org>	2002-09-11 18:15:26 +00:00
Julian Elischer	71fad9fdee	Completely redo thread states. Reviewed by: davidxu@freebsd.org	2002-09-11 08:13:56 +00:00
Poul-Henning Kamp	f8b663614d	Fix an inherited style bug: compare with NOCRED instead of NULL. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:46:19 +00:00
Poul-Henning Kamp	c1a925a637	Introduce new extattr_check_cred() function which implements the canonical crential washing for extended attributes. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:38:57 +00:00
Philippe Charnier	93b0017f88	Replace various spelling with FALLTHROUGH which is lint()able	2002-08-25 13:23:09 +00:00
Jeff Roberson	ad32f726db	- Fix a mistake in my last few commits. The PDROP flag stops msleep from re-acquiring the mutex. Pointy hat to: me Noticed by: tegge	2002-08-23 00:32:03 +00:00
Jeff Roberson	9abf54f032	- Make vn_lock() vget() and VOP_LOCK() all behave the same way WRT LK_INTERLOCK. The interlock will never be held on return from these functions even when there is an error. Errors typically only occur when the XLOCK is held which means this isn't the vnode we want anyway. Almost all users of these interfaces expected this behavior even though it was not provided before.	2002-08-22 07:44:45 +00:00
Jeff Roberson	183158485a	- Fix interlock handling in vn_lock(). Previously, vn_lock() could return with interlock held in error conditions when the caller did not specify LK_INTERLOCK. - Add several comments to vn_lock() describing the rational behind the code flow since it was not immediately obvious.	2002-08-22 06:51:06 +00:00
Jeff Roberson	0b600db425	- Document two cases, one in vget and the other in vn_lock, where the state of interlock on exit is not consistent. There are probably several bugs relating to this.	2002-08-21 08:34:48 +00:00
Jeff Roberson	88cf6b94bd	- If vn_lock fails with the LK_INTERLOCK flag set, interlock will not be released. vcanrecycle() failed to unlock interlock under this condition. - Remove an extra VOP_UNLOCK from a failure case in vcanrecycle(). Pointed out by: rwatson	2002-08-21 06:40:34 +00:00
Jeff Roberson	71ea4ba57c	- Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED - Use the new VI asserts in place of the old mtx_assert checks. - Add the VI asserts to the automated lock checking in the VOP calls. The interlock should not be held across vops with a few exceptions. - Add the vop_(un)lock_{pre,post} functions to assert that interlock is held when LK_INTERLOCK is set.	2002-08-21 06:19:29 +00:00
Jeff Roberson	055c012332	- Extend the vnode_free_list_mtx to cover numvnodes and freevnodes. This was done only some of the time before, and now it is uniformly applied.	2002-08-13 05:29:48 +00:00
Maxime Henrion	5965373e69	- Introduce a new struct xvfsconf, the userland version of struct vfsconf. - Make getvfsbyname() take a struct xvfsconf *. - Convert several consumers of getvfsbyname() to use struct xvfsconf. - Correct the getvfsbyname.3 manpage. - Create a new vfs.conflist sysctl to dump all the struct xvfsconf in the kernel, and rewrite getvfsbyname() to use this instead of the weird existing API. - Convert some {set,get,end}vfsent() consumers to use the new vfs.conflist sysctl. - Convert a vfsload() call in nfsiod.c to kldload() and remove the useless vfsisloadable() and endvfsent() calls. - Add a warning printf() in vfs_sysctl() to tell people they are using an old userland. After these changes, it's possible to modify struct vfsconf without breaking the binary compatibility. Please note that these changes don't break this compatibility either. When bp will have updated mount_smbfs(8) with the patch I sent him, there will be no more consumers of the {set,get,end}vfsent(), vfsisloadable() and vfsload() API, and I will promptly delete it.	2002-08-10 20:19:04 +00:00

... 2 3 4 5 6 ...

739 Commits