freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	6179164448	In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory. Introduce a vfs_busymp() function that looks up and busies found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls. Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, that prevents Giant-locked unmount code to interfere with it. Noted by: tegge Reviewed by: dfr Tested by: pho MFC after: 1 month	2008-11-29 13:34:59 +00:00
Pawel Jakub Dawidek	1ba4a712dd	Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This brings a huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system panicked. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris	2008-11-17 20:49:29 +00:00
Attilio Rao	30f60d8c31	Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless. Really, the concept of holdcnt in the struct mount is represented by the mnt_ref (which prevents the type-stable structure from being "recycled") handled through vfs_ref() and vfs_rel(). In this light, switch the holdcnt acquisition into an emulated vfs_ref() (and subsequent release into vfs_rel()). Discussed with: kib Tested by: pho	2008-11-03 20:00:35 +00:00
Attilio Rao	83b3bdbc8a	Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr already) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho	2008-11-02 10:15:42 +00:00
Edward Tomasz Napierala	15bc6b2bd8	Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)	2008-10-28 13:44:11 +00:00
Konstantin Belousov	7cd5a03a8e	Style return statements in vn_pollrecord().	2008-10-28 12:22:33 +00:00
Konstantin Belousov	ae53539e21	Protect check for v_pollinfo == NULL and assignment of the newly allocated vpollinfo with vnode interlock. Fully initialize vpollinfo before putting pointer to it into vp->v_pollinfo. Discussed with: dwhite Tested by: pho MFC after: 1 week	2008-10-28 12:08:36 +00:00
Konstantin Belousov	3cfc308922	In vfs_busy(), lockmgr() cannot legitimately sleep, because code checked MNTK_UNMOUNT before, and mnt_mtx is used as interlock. vfs_busy() always tries to obtain a shared lock on mnt_lock, the other user is unmount who tries to drain it, setting MNTK_UNMOUNT before. Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks	2008-10-20 10:07:28 +00:00
Attilio Rao	0d7935fd01	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-10-10 21:23:50 +00:00
Attilio Rao	59d4932531	Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>	2008-08-31 14:26:08 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Konstantin Belousov	a888d54d39	Introduce the VV_FORCEINSMQ vnode flag. It instructs the insmntque() function to ignore the unmounting and forces insertion of the vnode into the mount vnode list. Change insmntque() to fail when forced unmount is in progress and VV_FORCEINSMQ is not specified. Add an assertion to the insmntque(), requiring the vnode to be exclusively locked for mp-safe filesystems. Use the VV_FORCEINSMQ for the creation of the syncvnode. Tested by: pho Reviewed by: tegge MFC after: 1 month	2008-08-28 09:08:15 +00:00
Christian S.J. Peron	e451733718	Remove worrying printf warning on bootup when processing vnodes which have NULL mount-points. This is the case for special vnodes, such as the one used in nameiinit() which is used for crossing mount points in lookup() to avoid lock ordering issues. MFC after: 2 weeks Discussed with: rwatson, kib	2008-08-24 20:16:44 +00:00
Ed Schouten	e7ea30e404	Remove the use of lbolt from the VFS syncer. It seems we only use `lbolt' inside the VFS syncer and the TTY layer now. Because I'm planning to replace the TTY layer next month, there's no reason to keep `lbolt' if it's only used in a single thread inside the kernel. Because the syncer code wanted to wake up the syncer thread before the timeout, it called sleepq_remove(). Because we now just use a condvar(9) with a timeout value of `hz', we can wake it up using cv_broadcast() without waking up any unrelated threads. Reviewed by: phk	2008-07-30 12:39:18 +00:00
Pawel Jakub Dawidek	5573021d78	Assert for exclusive vnode lock in vinactive(), vrecycle() and vgonel() functions. Reviewed by: kib	2008-07-27 11:48:15 +00:00
Pawel Jakub Dawidek	610507ae00	- Move vp test for being NULL under IGNORE_LOCK(). - Check if panicstr isn't set, if it is ignore the lock. This helps to avoid confusion, because lockmgr is a no-op when panicstr isn't NULL, so asserting anything at this point doesn't make sense and can just race with other panic. Discussed with: kib	2008-07-27 11:46:42 +00:00
Attilio Rao	09400d5abe	- Disallow XFS mounting in write mode. The write support never worked really and there is no need to maintain it. - Fix vn_get() in order to let it call vget(9) with a valid locking request. vget(9) returns the vnode locked in order to prevent recycling, but in this case internal XFS locks already prevent it from happening, so it is safe to drop the vnode lock before returning from vn_get(). - Add a VNASSERT() in vget(9) in order to catch malformed locking requests. Discussed with: kan, kib Tested by: Lothar Braun <lothar at lobraun dot de>	2008-07-21 23:01:09 +00:00
Pawel Jakub Dawidek	988f0e193a	Be more friendly for DDB pager. Educated by: jhb's BSDCan presentation	2008-05-18 21:08:12 +00:00
Attilio Rao	60e2edce55	sync_vnode() has some messy code about locking in order to deal with mount fs needing Giant to be held when processing bufobjs. Use a different subqueue for pending workitems on filesystems requiring Giant. This simplifies the code notably and also reduces the number of Giant acquisitions (and the whole processing cost). Suggested by: jeff Reviewed by: kib Tested by: pho	2008-05-04 13:54:55 +00:00
Pawel Jakub Dawidek	3800322fe2	Implement 'show mount' command in DDB. Without argument, it prints short info about all currently mounted file systems. When an address is given as an argument, prints detailed info about the given mount point. MFC after: 2 weeks	2008-04-26 13:04:48 +00:00
Konstantin Belousov	12e79a9bbc	Allow the vnode zone to return the unused memory. The vnode reference count is/shall be properly maintained for the long time, and VFS shall be safe against the vnode memory reclamation. Proposed by: jeff Tested by: pho	2008-04-24 09:58:33 +00:00
Konstantin Belousov	eab626f110	Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks	2008-04-16 11:33:32 +00:00
Jeff Roberson	1fd9b6a577	- Destroy the bo mtx when the vnode is destroyed.	2008-04-02 10:40:03 +00:00
Attilio Rao	71072af500	b_waiters cannot be adequately protected by the interlock because it is dropped after the call to lockmgr() so just revert this approach using something similar to the precedent one: BUF_LOCKWAITERS() just checks if there are waiters (not the actual number of them) and it is based on newly introduced lockmgr_waiters() which returns if the lockmgr has waiters or not. The name has been chosen differently from the old lockwaiters() in order to not confuse them. KPI results enriched by this commit so __FreeBSD_version bumping and manpage update will be happening soon. 'struct buf' also changes, so kernel ABI is disturbed. Bug found by: jeff Approved by: jeff, kib	2008-03-28 12:30:12 +00:00
Jeff Roberson	0ee6cecc9d	- Greatly simplify vget() by removing the guarantee that any new references to a vnode with VI_OWEINACT set will force the vinactive() call. The kernel makes no guarantees about which reference was the last to close a file or when the actual inactive processing will happen. The previous code was designed to preserve existing semantics in the face of shared locks, however, this was unnecessary. Discussed with: mckusick	2008-03-24 04:22:58 +00:00
Jeff Roberson	e6b2545b3b	- Only return 1 from sync_vnode() in cases where the vnode is still at the head of the sync list. This prevents sched_sync() from re-queueing a vnode which may have been freed already. Discussed with: kib	2008-03-23 01:44:28 +00:00
Jeff Roberson	f6a8cecfc6	- Pass BO_MTX(bo) to lockmgr in vtruncbuf, we don't own the vnode interlock here anymore. Reported by: kris	2008-03-23 01:42:19 +00:00
Jeff Roberson	698b1a6643	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)	2008-03-22 09:15:16 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Attilio Rao	7fbfba7bf8	- Handle buffer lock waiters count directly in the buffer cache instead than rely on the lockmgr support [1]: * bump the waiters only if the interlock is held * let brelvp() return the waiters count * rely on brelvp() instead than BUF_LOCKWAITERS() in order to check for the waiters number - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr. - Modify flags accepted by lockinit(): * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2] * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it can only be used on a per-instance basis - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used This patch breaks KPI so __FreeBSD_version will be bumped and manpages updated by further commits. Additionally, 'struct buf' changes results in a disturbed ABI also. [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon. [1] Submitted by: kib Tested by: pho, Andrea Barberio <insomniac at slackware dot it>	2008-03-01 19:47:50 +00:00
Attilio Rao	81c794f998	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-25 18:45:57 +00:00
Attilio Rao	2433c4883e	Convert all explicit instances of VOP_ISLOCKED(arg, NULL) into VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should only acquire curthread as argument; this will lead in axing the additional argument from both functions, making the code cleaner. Reviewed by: jeff, kib	2008-02-08 21:45:47 +00:00
Attilio Rao	0e9eb108f0	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and not currently used. Hopefully this will soon be replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Additionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavily broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo	2008-01-24 12:34:30 +00:00
Attilio Rao	d638e093d6	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. KPI results, obviously, broken so further commits will update manpages and freebsd version. Tested by: kris (on UFS and NFS)	2008-01-19 17:36:23 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Robert Watson	c5f1beb02a	In "show lockedvnods" DDB command, use db_printf() rather than printf() so that the results end up in the DDB output stream rather than the console output stream. This should likely also be done for the vprint() function it calls. MFC after: 3 months	2007-12-28 00:47:31 +00:00
Attilio Rao	98e4f2e2bf	As LK_EXCLUPGRADE is used in conjunction with LK_NOWAIT, LK_UPGRADE becomes equivalent to it, so make the switch. That call is the only one remaining LK_EXCLUPGRADE consumer and removing it will prepare the ground for LK_EXCLUPGRADE axing and further lockmgr improvements. Discussed with: jeff, ups	2007-12-27 20:52:05 +00:00
Robert Watson	3de213cc00	Add a new 'why' argument to kdb_enter(), and a set of constants to use for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run. Assign approximate why values to all current consumers of the kdb_enter() interface.	2007-12-25 17:52:02 +00:00
Konstantin Belousov	973bdaa06f	Use curthread instead of the FIRST_THREAD_IN_PROC for vnlru and syncer, when applicable. Acquire Giant slightly later for vnlru. In the syncer, acquire the Giant only when a vnode belongs to the non-MPsafe fs. In both speedup_syncer() and syncer_shutdown(), remove the syncer thread from the lbolt sleep queue after the syncer state is modified, not before. Herded by: attilio Tested by: Peter Holm Reviewed by: ups MFC after: 1 week	2007-12-05 09:34:04 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Konstantin Belousov	245b204491	When restoring the mount after umount failed, the MNTK_UNMOUNT flag prevents insmntque() from placing reallocated syncer vnode on mount list, that causes panic in vfs_allocate_syncvnode(). Introduce MNTK_NOINSMNTQ flag, that marks the period when insmntque() is not allowed to succeed, instead of MNTK_UNMOUNT. The MNTK_NOINSMNTQ is set and cleared simultaneously with MNTK_UNMOUNT, except on umount error path, where it is cleaned just before the syncer vnode is going to be allocated. Reported by: Peter Jeremy <peterjeremy optushome com au> Suggested by: tegge Approved by: re (rwatson)	2007-09-12 16:31:32 +00:00
Pawel Jakub Dawidek	354eb80141	Improve vn_printf() by: - adding missing vnode flags, - printing unknown flags as numbers, - using strlcat() instead of strcat(). Approved by: re (bmah)	2007-08-13 21:23:30 +00:00
Robert Watson	32f9753cfb	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probably, a general approach is not the best solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Robert Watson	e1e8f51b85	Universally adopt most conventional spelling of acquire.	2007-05-27 20:50:23 +00:00
Konstantin Belousov	d413d21071	Since renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.	2007-05-18 13:02:13 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Pawel Jakub Dawidek	24b0502ee0	Fix jails and jail-friendly file systems handling: - We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail. - Move security checks to vfs_suser() and deny unmounting and updating for jailed root from different jails, etc. OK'ed by: rwatson	2007-04-13 23:54:22 +00:00
Pawel Jakub Dawidek	6bc3ab2574	When we are running low on vnodes, there is currently no way to ask other subsystems to release some vnodes. Implement backpressure based on vfs_lowvnodes event (similar to vm_lowmem for memory).	2007-04-13 08:38:48 +00:00
Pawel Jakub Dawidek	08be819487	Minor style cleanups (mostly removal of trailing whitespaces).	2007-04-10 15:29:37 +00:00
Pawel Jakub Dawidek	21ff8c6715	Correct typos.	2007-04-10 15:22:40 +00:00
Pawel Jakub Dawidek	def72fbba1	Now that the vdropl() function is public, assert that the vnode interlock is held.	2007-04-01 10:45:32 +00:00
Dag-Erling Smørgrav	e6534b36d8	Make vdropl() public; zfs needs it. There is also plenty of existing file system code (mostly *_reclaim()) which look like this: VOP_LOCK(vp); /* examine vp */ VOP_UNLOCK(vp); vdrop(vp); This can now be rewritten to: VOP_LOCK(vp); /* examine vp */ vdropl(vp); /* will unlock vp */ MFC after: 1 week	2007-03-31 23:57:17 +00:00
Marcel Moolenaar	f3ea971bf0	PowerPC is the only architecture with mpsafe_vfs=0. This is now broken. Rudimentary tests show that PowerPC can run with mpsafe_vfs=1. Make it so...	2007-03-27 05:29:41 +00:00
Tor Egge	61b9d89ff0	Make insmntque() externally visible and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib	2007-03-13 01:50:27 +00:00
Kip Macy	2f6a774be4	change vop_lock handling to allowing tracking of callers' file and line for acquisition of lockmgr locks Approved by: scottl (standing in for mentor rwatson)	2006-11-13 05:51:22 +00:00
John Baldwin	6b8de13ab4	Simplify operations with sync_mtx in sched_sync(): - Don't drop the lock just to reacquire it again to check rushjob, this only wastes time. - Use msleep() to drop the mutex while sleeping instead of explicitly unlocking around tsleep. Reviewed by: pjd	2006-11-07 19:45:05 +00:00
John Baldwin	8064e5d71f	Fix comment typo and function declaration.	2006-11-07 19:07:33 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Pawel Jakub Dawidek	a2ca03b3ad	Typo, 'from' vnode is locked here, not 'to' vnode.	2006-11-04 23:57:02 +00:00
Pawel Jakub Dawidek	1a60c7fc8e	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl	2006-10-31 21:48:54 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Konstantin Belousov	45ea8737bf	Correct the comment: numvnodes is decreased on vdestroying the vnode. OKed by: tegge Approved by: pjd (mentor) MFC after: 1 week	2006-10-02 07:25:58 +00:00
Tor Egge	a1e363f256	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.	2006-09-26 04:15:59 +00:00
Tor Egge	5da56ddb21	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
Pawel Jakub Dawidek	c37789fe7e	Add 'show vnode <addr>' DDB command.	2006-09-04 22:15:44 +00:00
Pawel Jakub Dawidek	04d9e255df	getnewvnode() can be called with NULL mp. Found by: Coverity Prevent (tm) Coverity ID: 1521 Confirmed by: phk	2006-08-10 08:56:03 +00:00
Pawel Jakub Dawidek	13c85d339d	Add a bandaid to avoid a deadlock in a situation, when we are trying to suspend a file system, but need to obtain a vnode. We may not be able to do it, because all vnodes could be already in use and other processes cannot release them, because they are waiting in "suspfs" state. In such situation, we allow to allocate a vnode anyway. This is a temporary fix - there is no backpressure to free vnodes allocated in those circumstances. MFC after: 1 week Reviewed by: tegge	2006-08-09 12:47:30 +00:00
Robert Watson	ccdebe46bd	Improve commenting of vaccess(), making sure to be clear that the ifdef capabilities code is there for reference and never actually used. Slight style tweak.	2006-08-06 10:43:35 +00:00
Alan Cox	27ea29536c	Enable debug.mpsafevfs by default on arm. Since every architecture except powerpc has debug.mpsafevfs enabled by default, it is shorter to enumerate the architectures on which debug.mpsafevfs is off. Tested by: cognet@	2006-07-15 06:44:27 +00:00
Konstantin Belousov	c8d3bc1fa3	Back out my rev. 1.674. The better fix (rev. 1.637) is already in tree. Approved by: kan (mentor)	2006-07-05 16:33:25 +00:00
Sergey Babkin	d81175c738	Backed out the change by request from rwatson. PR: kern/14584	2006-06-26 22:03:22 +00:00
Sergey Babkin	7a799f1ef0	The common UID/GID space implementation. It has been discussed on -arch in 1999, and there are changes to the sysctl names compared to PR, according to that discussion. The description is in sys/conf/NOTES. Lines in the GENERIC files are added in commented-out form. I'll attach the test script I've used to PR. PR: kern/14584 Submitted by: babkin	2006-06-25 18:37:44 +00:00
Konstantin Belousov	55aef2632f	Fix the LOR that occurs when the MAC compiled into the kernel and vnode is destroyed. Reviewed by: rwatson LOR: 189 MFC after: 2 weeks Approved by: kan (mentor)	2006-06-08 07:55:10 +00:00
Stephan Uphoff	dcf67e65d2	Do not set B_NOCACHE on buffers when releasing them in flushbuflist(). If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flush dirty page in the data invalidation function of some network file systems. This fixes data losses during vnode recycling (and other code paths using invalbuf(,V_SAVE,,*)) for data written using an mmaped file. Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@ Reviewed by: tegge@ MFC after: 7 days	2006-05-25 01:00:35 +00:00
John Baldwin	73dbd3da73	Remove various bits of conditional Alpha code and fixup a few comments.	2006-05-12 05:04:46 +00:00
Pawel Jakub Dawidek	643df192de	vn_start_write()/vn_finished_write() is not needed here, because vn_start_write() is always called earlier in the code path and calling the function recursively may lead to a deadlock. Confirmed by: tegge MFC after: 2 weeks	2006-04-29 21:57:38 +00:00
Jeff Roberson	6ca9fcc586	- Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child buffers to go on the buf daemon's DIRTYGIANT queue. - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler runs in the context of the buf daemon and may require Giant.	2006-04-28 01:05:31 +00:00
Jeff Roberson	b53bf1269c	- VFS_LOCK_GIANT when recycling a vnode via getnewvnode. We may be recycling for an unrelated filesystem. I really don't like potentially acquiring giant in the context of a giantless filesystem but there are reasonable objections to removing the recycling from this path. Sponsored by: Isilon Systems, Inc.	2006-04-04 06:46:10 +00:00
Jeff Roberson	0af2472199	- Add an assert to vgone. It is illegal to call vgone without a reference to the vnode. Without a reference the vnode will never be vdestroy'd and the memory will never be reclaimed. Sponsored by: Isilon Systems, Inc.	2006-03-31 23:39:26 +00:00
Jeff Roberson	94bc95db3c	- Hold a reference from the time vfs_busy starts until vfs_unbusy is called. - vfs_getvfs has to return a reference to prevent the returned mountpoint from changing identities. - Release references acquired via vfs_getvfs. Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.	2006-03-31 03:53:25 +00:00
Jeff Roberson	084d64ac21	- Add the B_NEEDSGIANT flag which is only set if the vnode that owns a buf requires Giant. It is set in bgetvp and cleared in brelvp. - Create QUEUE_DIRTY_GIANT for dirty buffers that require giant. - In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and only if we think there are buffers in that queue. Sponsored by: Isilon Systems, Inc.	2006-03-31 02:56:30 +00:00
Jeff Roberson	e44270a781	- Correct an assert in vop_rename_pre. fdvp may be locked if it is either the target directory or file. This case should fail in the filesystem anyway and perhaps kern_rename() should catch it. Sponsored by: Isilon Systems, Inc.	2006-03-19 20:14:46 +00:00
Tor Egge	791dd2fade	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.	2006-03-08 23:43:39 +00:00
Tor Egge	3b582b4e72	Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.	2006-03-02 22:13:28 +00:00
Tor Egge	b983aac762	Don't try to show marker nodes.	2006-03-02 21:31:15 +00:00
Jeff Roberson	eb2ea10590	- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris	2006-03-02 05:50:23 +00:00
Jeff Roberson	a1db11fc40	- Release the mount ref once the vnode has been recycled rather than once the last reference is dropped. I forgot that vnodes can stick around for a very long time until processes discover that they are dead. This means that a vnode reference is not sufficient to keep the mount referenced and even more code will be required to ref mount points. Discovered by: kris	2006-02-23 05:15:37 +00:00
Jeff Roberson	8a7cd2fdfb	- Grab a mnt ref in vfs_busy() before dropping the interlock. This will prevent the mount point from going away while we're waiting on the lock. The ref does not need to persist once we have the lock because the lock prevents the mount point from being unmounted. MFC After: 1 week	2006-02-22 06:20:12 +00:00
Jeff Roberson	04f6d3effa	- Add a ref count to the mount structure. Sleep for up to 3 seconds in vfs_mount_destroy waiting for this ref to hit 0. We don't print an error if we are rebooting as the root mount always retains some references by init proc. - Acquire a mnt ref for every vnode allocated to a mount point. Drop this ref only once vdestroy() has been called and the mount has been freed. - No longer NULL the v_mount pointer in delmntque() so that we may release the ref after vgone() has been called. This allows us to guarantee that the mount point structure will be valid until the last vnode has lost its last ref. - Fix a few places that rely on checking v_mount to detect recycling. Sponsored by: Isilon Systems, Inc. MFC After: 1 week	2006-02-06 10:19:50 +00:00
Jeff Roberson	b099db5881	- Solve a race where we could lose a call to VOP_INACTIVE. If vget() waiting on a lock held the last usecount ref on a vnode and the lock failed we would not call INACTIVE. Solve this by only holding a holdcnt to prevent the vnode from disappearing while we wait on vn_lock. Other callers may now VOP_INACTIVE while we are waiting on the lock, however this race is acceptable, while losing INACTIVE is not. Discussed with: kan, pjd Tested by: kkenn Sponsored by: Isilon Systems, Inc. MFC After: 1 week	2006-02-01 00:30:05 +00:00
Kris Kennaway	d5e5528afe	Back out r1.653; it turns out that the race (or at least the printf) is actually not hard to trigger, and it can cause a lot of console spam. Approved by: kan	2006-01-28 03:06:35 +00:00
Robert Watson	6be2c41a22	Convert remaining functions in vfs_subr.c from K&R prototypes to ANSI C prototypes, as the majority of new functions added have been in this style. Changing prototype style now results in gcc noticing that the implementation of vn_pollrecord() has a 'short' argument instead of 'int' as prototyped in vnode.h, so correct that definition. In practice this didn't matter as only poll flags in the lower 16 bits are used. MFC after: 1 week	2006-01-21 19:42:10 +00:00
Tor Egge	82be0a5a24	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
Pawel Jakub Dawidek	e7736557d6	Print a warning when we miss vinactive() call, because of race in vget(). The race is very real, but conditions needed for triggering it are rather hard to meet now. When gjournal will be committed (where it is quite easy to trigger) we need to fix it. For now, verify if it is really hard to trigger. Discussed with: kan	2005-12-29 22:52:09 +00:00
Doug White	16e35dcc39	This is a workaround for a complicated issue involving VFS cookies and devfs. The PR and patch have the details. The ultimate fix requires architectural changes and clarifications to the VFS API, but this will prevent the system from panicking when someone does "ls /dev" while running in a shell under the linuxulator. This issue affects HEAD and RELENG_6 only. PR: 88249 Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com> MFC after: 3 days	2005-11-09 22:03:50 +00:00
Robert Watson	5bb84bc84b	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
Kris Kennaway	14cdc36456	mpsafevm has been stable and defaulted to 1 on sparc64 for over 6 months, so we are ready for mpsafevfs=1 by default on sparc64 too. I have been running this on all my sparc64 machines for over 6 months, and have not encountered MD problems. MFC after: 1 week	2005-10-14 23:56:13 +00:00
Diomidis Spinellis	9f5c1d1955	Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks	2005-10-12 06:56:00 +00:00
Don Lewis	6c8b634f1d	Un-staticize runningbufwakeup() and staticize updateproc. Add a new private thread flag to indicate that the thread should not sleep if runningbufspace is too large. Set this flag on the bufdaemon and syncer threads so that they skip the waitrunningbufspace() call in bufwrite() rather than checking the proc pointer vs. the known proc pointers for these two threads. A way of preventing these threads from being starved for I/O but still placing limits on their outstanding I/O would be desirable. Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from blocking on the runningbufspace check while holding snaplk. This prevents snaplk from being held for an arbitrarily long period of time if runningbufspace is high and greatly reduces the contention for snaplk. The disadvantage is that ffs_copyonwrite() can start a large amount of I/O if there are a large number of snapshots, which could cause a deadlock in other parts of the code. Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace before attempting to grab snaplk so that I/O requests waiting on snaplk are not counted in runningbufspace as being in-progress. Increment runningbufspace again before actually launching the original I/O request. Prior to the above two changes, the system could deadlock if enough I/O requests were blocked by snaplk to prevent runningbufspace from falling below lorunningspace and one of the bawrite() calls in ffs_copyonwrite() blocked in waitrunningbufspace() while holding snaplk. See <http://www.holm.cc/stress/log/cons143.html>	2005-09-30 01:30:01 +00:00
Tor Egge	61ac14dab6	Break out of loop if next buffer pointer has become invalid while flushing current buffer. Reviewed by: kan	2005-09-16 18:28:12 +00:00
Robert Watson	fd1a469ba5	In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported kqueue filter type is requested on a vnode. MFC after: 3 days	2005-09-12 19:22:37 +00:00
Jung-uk Kim	9ed448b20c	Use monotonic `time_uptime' instead of `time_second'. Approved by: anholt (mentor) Discussed on: arch	2005-09-12 15:31:28 +00:00
Poul-Henning Kamp	2883ba6668	Introduce vfs_read_dirent() which can help VOP_READDIR() implementations by handling all the cookie stuff.	2005-09-12 08:46:07 +00:00
Suleiman Souhlal	a6c109d658	Fix a typo in vop_rename_pre() where we ended up using vholdl() instead of vhold(), even though the vnode interlock is unlocked. MFC after: 3 days	2005-08-28 23:00:11 +00:00
Don Lewis	ad9f180121	Back out the removal of LK_NOWAIT from the VOP_LOCK() call in vlrureclaim() in vfs_subr.c 1.636 because waiting for the vnode lock aggravates an existing race condition. It is also undesirable according to the commit log for 1.631. Fix the tiny race condition that remains by rechecking the vnode state after grabbing the vnode lock and grabbing the vnode interlock. Fix the problem of other threads being starved (which 1.636 attempted to fix by removing LK_NOWAIT) by calling uio_yield() periodically in vlrureclaim(). This should be more deterministic than hoping that VOP_LOCK() without LK_NOWAIT will block, which may not happen in this loop. Reviewed by: kan MFC after: 5 days	2005-08-23 03:44:06 +00:00
Robert Watson	6cd8dee3c5	Silence "busy" warnings when unmounting devfs at system shutdown. This is a workaround for non-symmetric teardown of the file systems at shutdown with respect to the mount order at boot. The proper long term fix is to properly detach devfs from the root mount before unmounting each, and should be implemented, but since the problem is non-harmful, this temporary band-aid will prevent false positive bug reports and unnecessary error output for 6.0-RELEASE. MFC after: 3 days Tested by: pav, pjd	2005-08-20 17:12:47 +00:00
Marcel Moolenaar	fd65baf8e2	Make mpsafe_vfs=1 the default on ia64.	2005-08-13 20:07:50 +00:00
Alexander Kabaev	45a0d1ed7a	Do not drop the vnode interlock if vdropl is called on already doomed vnode. vdropl callers expect it to return with interlock still being held. MFC after: 2 days	2005-08-10 11:46:03 +00:00
Suleiman Souhlal	34cc826ae8	Holding a vnode doesn't prevent v_mount from disappearing (when the vnode is inactivated), possibly leading to a NULL dereference when checking if the mount wants knotes to be activated in the VOP hooks. So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(), if necessary, and check it when activating knotes. Since the flags are not erased when a vnode is being held, we can safely read them. Reviewed by: kris@ MFC after: 3 days	2005-08-06 01:42:04 +00:00
Jeff Roberson	40a495853a	- Unlock before we call mac_destroy_vnode to prevent a lock order reversal. Found by: trhodes	2005-08-03 05:36:50 +00:00
Jeff Roberson	39b2406838	- Allow vnlru to drop giant if the filesystem does not require it. The vnlru proc is extremely inefficient, potentially iterating over tens of thousands of vnodes without blocking. Dropping Giant allows other threads to preempt us although we should revisit the algorithm to fix the runtime problems especially since this may hold up all vnode allocations. - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim. This provides a natural blocking point to help alleviate the situation described above although it may not technically be desirable. - yield after we make a pass on all mount points to prevent us from blocking other threads which require Giant. MFC after: 2 weeks	2005-07-20 01:43:27 +00:00
Pawel Jakub Dawidek	c23c87bd93	Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp) below KASSERT()s, which means there was no real problem here, we just needed better locking for assertions. OK'ed by: jeff Approved by: re (scottl)	2005-07-05 15:57:55 +00:00
Suleiman Souhlal	571dcd15e2	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone	2005-07-01 16:28:32 +00:00
Jeff Roberson	b770ff6eb2	- Try to catch the wrong bufobj panics a little earlier. I believe they are actually caused by a buf with both VNCLEAN and VNDIRTY set. In the traces it is clear that the buf is removed from the dirty queue while it is actually on the clean queue which leaves the tail pointer set. Assert that both flags are not set in buf_vlist_add and buf_vlist_remove. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-18 18:17:03 +00:00
Jeff Roberson	114a1006a8	- Change holdcnt use around vnode recycling. We now always keep a holdcnt ref while we're calling vgone(). This prevents transient refs from re-adding us to the free list. Previously, a vfree() triggered via vinvalbuf() getting rid of all of a vnode's pages could place a partially destructed vnode on the free list where vtryrecycle() could find it. The first call to vtryrecycle would hang up on the vnode lock, but when it failed it would place a now dead vnode onto the free list, and another call to vtryrecycle() would free an already free vnode. There were many complications of having a zero ref count while freeing which can now go away. - Change vdropl() to release the interlock before returning. All callers now respect this, so vdropl() directly frees VI_DOOMED vnodes once the last ref is dropped. This means that we'll never have VI_DOOMED vnodes on the free list. - Separate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and v_decr_useonly(). The incr/decr split is so that incr usecount can return with the interlock still held while decr drops the interlock so it can call vdropl() which will potentially free the vnode. The calling function can't drop the lock of an already free'd node. v_decr_useonly() drops a usecount without dropping the hold count. This is done so the usecount reaches zero in vput() before we recycle, however the holdcount is still 1 which prevents any new references from placing the vnode back on the free list. - Fix vlrureclaim() to vhold the vnode since it doesn't do a vget(). We wouldn't want vlrureclaim() to bump the usecount since this has different semantics. Also change vlrureclaim() to do a NOWAIT on the vn_lock. When this function runs we're usually in a desperate situation and we wouldn't want to wait for any specific vnode to be released. - Fix a bunch of misc comments to reflect the new behavior. - Add vhold() and vdrop() to vflush() for the same reasons that we do in vlrureclaim(). Previously we held no reference and a vnode could have been freed while we were waiting on the lock. - Get rid of vlruvp() and vfreehead(). Neither are used. vlruvp() should really be rethought before it's reintroduced. - vgonel() always returns with the vnode locked now and never puts the vnode back on a free list. The vnode will be freed as soon as the last reference is released. Sponsored by: Isilon Systems, Inc. Debugging help from: Kris Kennaway, Peter Holm Approved by: re (blanket vfs)	2005-06-16 04:41:42 +00:00
Jeff Roberson	12c2dcde40	- In reassignbuf() add many asserts to validate the head and tail pointers of the clean and dirty lists. This is in an attempt to catch the wrong bufobj problem sooner. - In vgonel() don't acquire an extra reference in the active case, the vnode lock and VI_DOOMED protect us from recursively cleaning. - Also in vgonel() clean up some stale comments. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-14 20:31:53 +00:00
Jeff Roberson	b930d85380	- Don't make vgonel() globally visible, we want to change its prototype anyway and it's not used outside of vfs_subr.c. - Change vgonel() to accept a parameter which determines whether or not we'll put the vnode on the free list when we're done. - Use the new vgonel() parameter rather than VI_DOOMED to signal our intentions in vtryrecycle(). - In vgonel() return if VI_DOOMED is already set, this vnode has already been reclaimed. Sponsored by: Isilon Systems, Inc.	2005-06-13 06:26:55 +00:00
Jeff Roberson	d2ad9baac0	- Add KTR_VFS events to vdestroy, vtruncbuf, vinvalbuf, vfreehead. Sponsored by: Isilon Systems, Inc.	2005-06-13 00:46:37 +00:00
Jeff Roberson	d6dbf760a6	- Assert that we're not in the name cache anymore in vdestroy(). Sponsored by: Isilon Systems, Inc.	2005-06-11 08:48:09 +00:00
Jeff Roberson	9aa0eba464	- Add KTR_VFS tracing to track the life of vnodes. Eventually KTR_VFS events could be added to cover other interesting details. - Add some VNASSERTs to discover places where we access vnodes after they have been uma_zfree'd before we try to free them again. - Add a few more VNASSERTs to vdestroy() to be certain that the vnode is really unused. Sponsored by: Isilon Systems, Inc.	2005-06-11 01:16:46 +00:00
Suleiman Souhlal	679985d03a	Allow EVFILT_VNODE events to work on every filesystem type, not just UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)	2005-06-09 20:20:31 +00:00
Jeff Roberson	fae89dce3e	- Clear OWEINACT prior to calling VOP_INACTIVE to remove the possibility of a vget causing another call to INACTIVE before we're finished.	2005-06-07 22:05:32 +00:00
Colin Percival	fd94099ec2	If we are going to 1. Copy a NULL-terminated string into a fixed-length buffer, and 2. copyout that buffer to userland, we really ought to 0. Zero the entire buffer first. Security: FreeBSD-SA-05:08.kmem	2005-05-06 02:50:00 +00:00
Jeff Roberson	059f090fa1	- A vnode may have made its way onto the free list while it was being vgone'd. We must remove it from the freelist before returning in vtryrecycle() or we may get a duplicate free. Reported by: kkenn	2005-05-03 10:56:00 +00:00
Christian S.J. Peron	02fe1744f1	Since it is not possible for curthread to be NULL in this context, drop the check+initialization for a straight initialization. Also assert that curthread will never be NULL just to be sure. Discussed with: rwatson, peter MFC after: 1 week	2005-05-02 02:07:55 +00:00
Jeff Roberson	b2e2166483	- All buffers should either be clean or dirty. If neither of these flags are set when we attempt to remove a buffer from a queue we should panic. Hopefully this will catch the source of the wrong bufobj panics. Sponsored by: Isilon Systems, Inc.	2005-05-01 12:00:36 +00:00
Jeff Roberson	b2183bfe05	- In vnlru_free() remove the vnode from the free list before we call vtryrecycle(). We could sometimes get into situations where two threads could try to recycle the same vnode before this. - vtryrecycle() is now responsible for returning the vnode to the free list if it fails and someone else hasn't done it. - Make a new function vfreehead() which moves a vnode to the head of the free list and use it in vgone() to clean up that code a bit. Sponsored by: Isilon Systems, Inc. Reported by: pho, kkenn	2005-04-30 11:22:40 +00:00
Jeff Roberson	0dd02d67eb	- Don't vgonel() via vgone() or vrecycle() if the vnode is already doomed. This fixes forced unmounts via nullfs. Reported by: kkenn Sponsored by: Isilon Systems, Inc.	2005-04-27 10:03:21 +00:00
Jeff Roberson	6c317bc4cf	- Stop setting vxthread, we've asserted that it was useless for several weeks now.	2005-04-27 09:17:33 +00:00
Jeff Roberson	7d60dc524b	- Disable code which allows getnewvnode() to fail. Many ffs_vget() callers do not correctly deal with failures. This presently risks deadlock problems if dependency processing is held up by failures to allocate a vnode, however, this is better than the situation with the failures. Sponsored by: Isilon Systems, Inc.	2005-04-22 00:57:05 +00:00
Poul-Henning Kamp	bdb3564638	Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready earlier.	2005-04-18 21:11:47 +00:00
Jeff Roberson	374df05fd3	- Change vop_lookup_post assertions to reflect recent vfs_lookup changes. Sponsored by: Isilon Systems, Inc.	2005-04-13 10:57:53 +00:00
Jeff Roberson	539de9eda0	- Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk uses it. Sponsored by: Isilon Systems, Inc.	2005-04-11 15:17:06 +00:00
Jeff Roberson	070898b1b3	- Change the VOP_LOCK UPGRADE in vput() to do a LK_NOWAIT to avoid a potential lock order reversal. Also, don't unlock the vnode if this fails, lockmgr has already unlocked it for us. - Restructure vget() now that vn_lock() does all of VI_DOOMED checking for us and also handles the case where there is no real lock type. - If VI_OWEINACT is set, we need to upgrade the lock request to EXCLUSIVE so that we can call inactive. It's not legal to vget a vnode that hasn't had INACTIVE called yet. Sponsored by: Isilon Systems, Inc.	2005-04-11 09:28:32 +00:00
Jeff Roberson	d78e0ee9fd	- Assert that the bufobj matches in flushbuflists. I still haven't gotten to root cause on exactly how this happens. - If the assert is disabled, we presently try to handle this case, but the BUF_UNLOCK was missing. Thus, if this condition ever hit we would leak a buf lock. Many thanks to Peter Holm for all his help in finding this bug. He really put more effort into it than I did.	2005-04-06 06:49:46 +00:00
Jeff Roberson	2bbd6c9818	- Move NDFREE() from vfs_subr to vfs_lookup where namei() is.	2005-04-05 08:58:49 +00:00
Jeff Roberson	d1cc6041e6	- Add a missing unlock of the vnode_free_list_mtx. Spotted by: Antoine Brodin	2005-04-04 12:07:16 +00:00
Jeff Roberson	92b8231d4f	- Instead of waiting forever to get a vnode in getnewvnode() wait for one to become available for one second and then return ENFILE. We can run out of vnodes, and there must be a hard limit because without one we can quickly run out of KVA on x86. Presently the system can deadlock if there are maxvnodes directories in the namecache. The original 4.x BSD behavior was to return ENFILE if we reached the max, but 4.x BSD did not have the vnlru proc so it was less profitable to wait.	2005-04-04 11:43:44 +00:00
Jeff Roberson	e451d879a1	- Disable vfs shared locks by default. They must be specifically enabled on filesystems which safely support them. It appears that many network filesystems specifically are not shared lock safe. Sponsored by: Isilon Systems, Inc.	2005-03-31 05:22:45 +00:00
Jeff Roberson	f247a5240d	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
David Schultz	7ce7f713ee	Eliminate v_id and v_ddid. The name cache now holds references to vnodes whose names it caches, so we no longer need a `generation number' to tell us if a referenced vnode is invalid. Replace the use of the parent's v_id in the hash function with the address of the parent vnode. Tested by: Peter Holm Glanced at by: jeff, phk	2005-03-30 03:01:36 +00:00
Jeff Roberson	0fbc3b7df0	- Don't clear OWEINACT in vbusy(), we still owe an inactive call if someone vhold()s us. - Avoid an extra mutex acquire and release in the common case of vgonel() by checking for OWEINACT at the start of the function. - Fix the case where we set OWEINACT in vput(). LK_EXCLUPGRADE drops our shared lock if it fails. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:02:48 +00:00
Jeff Roberson	cb34b95ba4	- Don't initialize v_dd here, let cache_purge() do it for us. Sponsored by: Isilon Systems, Inc.	2005-03-29 09:59:34 +00:00
Jeff Roberson	9dcc5da318	- Move code that should probably be an assert above the main body of vrele so that we can decrease the indentation of the real work and make things slightly more clear. Sponsored by: Isilon Systems, Inc.	2005-03-28 11:18:47 +00:00
Jeff Roberson	d36f0a4ff8	- Adjust asserts in vop_lookup_post() to match the new post PDIRUNLOCK vfs. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:25:25 +00:00
Poul-Henning Kamp	3b73a3c079	Remove another ';' after if(). Also spotted by: bz	2005-03-27 07:53:13 +00:00
Poul-Henning Kamp	2d8dfb2836	Remove extra ; at end of if(). Found by: bz	2005-03-27 07:52:12 +00:00
Jeff Roberson	228ea9d212	- Don't recycle vnodes anymore. Free them once they are dead. getnewvnode now always allocates a new vnode. - Define a new function, vnlru_free, which frees vnodes from the free list. It takes as a parameter the number of vnodes to free, which is wantfreevnodes - freevnodes when called from vnlru_proc or 1 when called from getnewvnode(). For now, getnewvnode() still tries to reclaim a free vnode before creating a new one when we are near the limit. - Define a function, vdestroy, which handles the actual release of memory and teardown of locks, etc. This could become a uma_dtor() routine. - Get rid of minvnodes. Now wantfreevnodes is 1/4th the max vnodes. This keeps more unreferenced vnodes around so that files which have only been stat'd are less likely to be kicked out of the system before we have a chance to read them, etc. These vnodes may still be freed via the normal vnlru_proc() routines which may some day become a real lru.	2005-03-25 05:34:39 +00:00
Jeff Roberson	d830f82824	- Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For now, all calls to VFS_ROOT() should still acquire exclusive locks. Sponsored by: Isilon Systems, Inc.	2005-03-24 07:31:38 +00:00
Jeff Roberson	c167961e27	- If vput() is called with a shared lock it must upgrade to an exclusive before it can call VOP_INACTIVE(). This must use the EXCLUPGRADE path because we may violate some lock order with another locked vnode if we drop and reacquire the lock. If EXCLUPGRADE fails, we mark the vnode with VI_OWEINACT. This case should be very rare. - Clear VI_OWEINACT in vinactive() and vbusy(). - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:08:58 +00:00
Jeff Roberson	b172f6c5f9	- Now that there are no external users of vfree() make it static. - Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so no one else attempts to grow a dependency on them. - Now that objects with pages hold the vnode we don't have to do unlocked checks for the page count in the vm object in VSHOULDFREE. These three macros could simply check for holdcnt state transitions to determine whether the vnode is on the free list already, but the extra safety the flag affords us is probably worth the minimal cost. - The leafonly sysctl and code have been dead for several years now, remove the sysctl and the code that employed it from vtryrecycle(). - vtryrecycle() also no longer has to check the object's page count as the object holds the vnode until it reaches 0. Sponsored by: Isilon Systems, Inc.	2005-03-15 14:38:16 +00:00
Jeff Roberson	c178628d6e	- Expose vholdl() so it may be used outside of vfs_subr.c	2005-03-15 13:43:10 +00:00
Jeff Roberson	8045557f2b	- Increment the holdcnt once for each usecount reference. This allows us to use only the holdcnt to determine whether a vnode may be recycled, simplifying the V* macros as well as vtryrecycle(), etc. Sponsored by: Isilon Systems, Inc.	2005-03-14 09:25:19 +00:00
Jeff Roberson	159b454819	- We do not have to check the object's ref_count in VSHOULDFREE or vtryrecycle(). All obj refs also ref the vnode. - Consistently use v_incr_usecount() to increment the usecount. This will be more important later. Sponsored by: Isilon Systems, Inc.	2005-03-14 08:30:31 +00:00
Jeff Roberson	8f13a540ed	- Slightly rearrange vrele() to move the common case in one indentation level. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:16:55 +00:00
Jeff Roberson	6fc16a838c	- Rework vget() so we drop the usecount in two failure cases that were missed by my last commit. Sponsored by: Isilon Systems, Inc.	2005-03-14 07:11:19 +00:00
Jeff Roberson	6703c30bb5	- Remove vx_lock, vx_unlock, vx_wait, etc. - Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do writing ops without suspending. This could suspend the vlruproc which should not be a problem under normal circumstances. - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance where it was used. - Acquire a lock before calling vgone() as it now requires it. - Move the acquisition of the vnode interlock from vtryrecycle() to getnewvnode() so that if it fails we don't drop and reacquire the vnode_free_list_mtx. - Check for a usecount or holdcount at the end of vtryrecycle() in case someone grabbed a ref while we were recycling. Abort the recycle, and on the final ref drop this vnode will be placed on the head of the free list. - Move the redundant VOP_INACTIVE protection code into the local vinactive() routine to avoid code bloat. - Keep the vnode lock held across calls to vgone() in several places. - vgonel() no longer uses XLOCK, instead callers must hold an exclusive vnode lock. The VI_DOOMED flag is set to allow other threads to detect a vnode which is no longer valid. This flag is set until the last reference is gone, and there are no chances for a new ref. vgonel() holds this lock across the entire function, which greatly simplifies logic. - Only vfree() in one place in vgone() not three. - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock in the LK_NOWAIT case. In other cases, check after we have slept and acquired an exclusive lock. This will simulate the old vx_wait() behavior. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:54:28 +00:00
Jeff Roberson	d9a9c2c22c	- Enable SMP VFS by default on current. More users are needed to turn up any remaining bugs. Anyone inconvenienced by this can still disable it in the loader. Sponsored by: Isilon Systems, Inc.	2005-02-23 10:05:43 +00:00
Jeff Roberson	d8a7c99a1c	- Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK has been set. Assert that this is the case so that we catch filesystems who are using naked VOP_LOCKs in illegal cases. Sponsored by: Isilon Systems, Inc.	2005-02-23 00:11:14 +00:00
Jeff Roberson	4c11620bb9	- Add a check for xlock in vop_lock_assert. Presently the xlock is considered to be as good as an exclusive lock, although there is still a possibility of someone acquiring a VOP LOCK while xlock is held. Sponsored by: Isilon Systems, Inc.	2005-02-22 23:59:11 +00:00
Poul-Henning Kamp	767056c0e8	Zero the v_un container field to make sure everything is gone.	2005-02-22 18:56:18 +00:00
Poul-Henning Kamp	aa2f6ddc3f	Reap more benefits from DEVFS: List devfs_dirents rather than vnodes off their shared struct cdev, this saves a pointer field in the vnode at the expense of a field in the devfs_dirent. There are often 100 times more vnodes so this is a bargain. In addition it makes it harder for people to try to do stupid things like "finding the vnode from cdev". Since DEVFS handles all VCHR nodes now, we can do the vnode related cleanup in devfs_reclaim() instead of in dev_rel() and vgonel(). Similarly, we can do the struct cdev related cleanup in dev_rel() instead of devfs_reclaim(). Rename idestroy_dev() to destroy_devl() for consistency. Add LIST_ENTRY de_alias to struct devfs_dirent. Remove v_specnext from struct vnode. Change si_hlist to si_alist in struct cdev. String new devfs vnodes' devfs_dirent on si_alist when we create them and take them off in devfs_reclaim(). Fix devfs_revoke() accordingly. Also don't clear fields devfs_reclaim() will clear when called from vgone(); Let devfs_reclaim() call dev_rel() instead of vgonel(). Move the usecount tracking from dev_rel() to devfs_reclaim(), and let dev_rel() take a struct cdev argument instead of vnode. Destroy SI_CHEAPCLONE devices in dev_rel() (instead of devfs_reclaim()) when they are no longer used. (This should maybe happen in devfs_close() instead.)	2005-02-22 15:51:07 +00:00
Poul-Henning Kamp	7fc940b266	Remove vfinddev(), it is generally bogus when faced with jails and chroot and has no legitimate use(r)s in the tree.	2005-02-22 14:11:47 +00:00
Poul-Henning Kamp	dfd4be14bd	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
Poul-Henning Kamp	900b7e2648	Make sure to drop the VI_LOCK in vgonel(); Spotted by: Taku YAMAMOTO <taku@tackymt.homeip.net>	2005-02-18 11:13:56 +00:00
Poul-Henning Kamp	4d8ac58b05	Introduce vx_wait{l}() and use it instead of home-rolled versions.	2005-02-17 10:49:51 +00:00
Poul-Henning Kamp	58aac12894	Convert KASSERTS to VNASSERTS	2005-02-17 10:28:58 +00:00
Poul-Henning Kamp	1ba212823f	Make various vnode related functions static	2005-02-10 12:28:58 +00:00
Poul-Henning Kamp	fe0198779c	Don't pass NULL to vprint()	2005-02-10 08:55:08 +00:00
Jeff Roberson	68f2274d97	- Add a new assert in the getnewvnode(). Assert that the usecount is still 0 to detect getnewvnode() races. - Add the vnode address to a few panics near by to help in debugging. Sponsored by: Isilon Systems, Inc.	2005-02-08 23:27:10 +00:00
Poul-Henning Kamp	b9489a449c	Access vmobject via the bufobj instead of the vnode	2005-02-07 10:04:06 +00:00
Poul-Henning Kamp	b348abd6cd	Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what was necessary.	2005-02-07 07:48:03 +00:00
Poul-Henning Kamp	d4eb29ba71	Remove unused argument to vrecycle()	2005-01-28 13:08:21 +00:00
Poul-Henning Kamp	1fdfaafb08	Integrate vclean() into vgonel(). Various associated polishing.	2005-01-28 13:00:03 +00:00
Poul-Henning Kamp	3fc8dd0653	Remove register keyword	2005-01-28 12:39:10 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	b5b6ec5faa	Eliminate the constant flags argument to vclean()	2005-01-24 22:22:02 +00:00
Poul-Henning Kamp	7c93282e42	Change vprint() to vn_printf() which takes varargs. Add #define for vprint() to call vn_printf().	2005-01-24 13:58:08 +00:00
Poul-Henning Kamp	35764be39e	Kill the VV_OBJBUF and test the v_object for NULL instead.	2005-01-24 13:13:57 +00:00
Jeff Roberson	d1fcf3bb31	- Add the tunable and sysctl for the mpsafevfs. It currently defaults to off. - Protect access to mnt_kern_flag with the mountpoint mutex. - Remove some KASSERTs which are not legal checks without the appropriate locks held. - Use VCANRECYCLE() rather than rolling several slightly different checks together. - Return from vtryrecycle() with a recycled vnode rather than a locked vnode. This simplifies some locking. - Remove several GIANT_REQUIRED lines. - Add a few KASSERTs to help with INACT debugging. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:41:01 +00:00
Poul-Henning Kamp	7bf38aeae7	Fix a bug I introduced in 1.561 which has caused considerable filesystem unhappiness lately. As far as I can tell, no files that have made it safely to disk have been endangered, but stuff in transit has been in peril. Pointy hat: phk	2005-01-16 21:09:39 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Poul-Henning Kamp	e39db32ab0	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
Poul-Henning Kamp	6ef8480a88	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
Poul-Henning Kamp	6afa350d53	More vnode -> bufobj migration.	2005-01-11 10:16:39 +00:00
Poul-Henning Kamp	8d785753bd	Give flushbuflist() a struct bufv as first argument and avoid home-rolling TAILQ_FOREACH_SAFE(). Lose the error pointer argument and return any errors the normal way. Return EAGAIN for the case where more work needs to be done.	2005-01-11 10:01:54 +00:00
Poul-Henning Kamp	8df6bac4c7	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to be the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Poul-Henning Kamp	0b3e4fe239	Since we do not support forceful unmount of DEVFS we can do away with the partially implemented vnode-readoption code in vgonechrl().	2005-01-04 08:49:14 +00:00
Poul-Henning Kamp	e87047b437	We can only ever get to vgonechrl() from a devfs vnode, so we do not need to reassign the vp->v_op to devfs_specops, we know that is the value already. Make devfs_specops private to devfs.	2004-12-20 21:34:29 +00:00
Poul-Henning Kamp	20a92a18f1	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworthy anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
Poul-Henning Kamp	f76fedd20b	Improve vprint() a little bit: break long lines, reduce indent and tell if the VI_LOCK() is held.	2004-12-03 12:09:34 +00:00
Poul-Henning Kamp	aec0fb7b40	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to believe this would ever be feasible in the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and prototype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
Poul-Henning Kamp	a752aa8f17	Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff.	2004-11-15 08:12:50 +00:00
Poul-Henning Kamp	11bcbee11b	Move the bit of the syncer which deals with vnodes into a separate function.	2004-11-14 15:24:38 +00:00
Poul-Henning Kamp	db442506db	Eliminate vop_revoke() function now that devfs_revoke() does the entire job.	2004-11-13 23:38:13 +00:00
Poul-Henning Kamp	c5b846fe8e	Slim vnodes by another four bytes by eliminating the (now) unused field v_cachedid.	2004-11-10 07:31:06 +00:00
Poul-Henning Kamp	c13a4e8820	Remove vn_todev()	2004-11-10 07:17:28 +00:00
Poul-Henning Kamp	b797084e48	Remove vnode->v_cachedfs. It was only used for the highly dangerous "export all vnodes with a sysctl" function.	2004-11-09 22:51:03 +00:00
Poul-Henning Kamp	c569065139	Remove buf->b_dev field.	2004-11-04 07:59:57 +00:00
Poul-Henning Kamp	e0b687d33b	Always initialize bo_private along with bo_ops in getnewvnode(). Spotted by: tegge	2004-11-03 21:09:23 +00:00
Poul-Henning Kamp	996b2c82ca	Lose vfs_mountedon()	2004-10-29 11:15:08 +00:00
Poul-Henning Kamp	e1f355fe4e	Give the bufobj a private __bo_vnode for now to keep the syncer floating [1] At some point later the syncer will unlearn about vnodes and the filesystems method called by the syncer will know enough about what's in bo_private to do the right thing. [1] Ok, I know, but I couldn't resist the pun.	2004-10-29 09:33:32 +00:00
Poul-Henning Kamp	20eba72f53	Move the syncer linkage from vnode to bufobj. This is not quite a perfect separation: the syncer still thinks it knows that everything is a vnode.	2004-10-27 08:05:02 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Poul-Henning Kamp	156cb26583	Lose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().	2004-10-25 09:14:03 +00:00
Poul-Henning Kamp	ee1d0eb330	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00
Poul-Henning Kamp	4dcd0ac4cf	Collapse vnode->v_object and buf->b_object into bufobj->bo_object.	2004-10-25 06:02:57 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which does the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which call through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Robert Watson	b0e86f6ac2	When MAC is enabled, warn if getnewvnode() is asked to produce a vnode without a mountpoint. In this scenario, there's no useful source for a label on the vnode, since we can't query the mountpoint for the labeling strategy or default label.	2004-10-22 11:04:58 +00:00
Poul-Henning Kamp	ff7c5a4880	Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite jest, of most excellent fancy: he hath taught me lessons a thousand times; and now, how abhorred in my imagination it is! my gorge rises at it. Here were those hacks that I have curs'd I know not how oft. Where be your kludges now? your workarounds? your layering violations, that were wont to set the table on a roar? Move the skeleton of specfs into devfs where it now belongs and bury the rest.	2004-10-22 09:59:37 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
Poul-Henning Kamp	a76d8f4ec9	Move the VI_BWAIT flag into a bo_flag element of bufobj and call it BO_WWAIT. Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions in all relevant places except in ffs_softdep.c where the use of interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
Poul-Henning Kamp	1bca607b9f	Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx. Initialize the bo_mtx when we allocate a vnode in getnewvnode(). For now we point to the vnode's interlock mutex, which retains the exact same locking semantics. Move v_numoutput from vnode to bufobj. Add renaming macro to postpone code sweep.	2004-10-21 14:42:31 +00:00
Poul-Henning Kamp	67647b2312	Polish vtruncbuf() to improve readability and style a bit.	2004-10-21 14:13:54 +00:00
Poul-Henning Kamp	e163395619	Simplify buf_vlist_remove(). Now that we have encapsulated the splaytree related information into a structure we can eliminate the half of this function.	2004-10-21 13:48:50 +00:00
Greg Lehey	57259f2864	vtryrecycle: Don't rely on type VBAD alone to mean that we don't need to clean the vnode. If v_data is set, we still need to clean it. This code change should catch all incidents of the previous commit (INVARIANTS only).	2004-10-06 02:09:59 +00:00
Greg Lehey	f2154b33d2	getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning. Discussion: this panic (or warning) only occurs when the kernel is compiled with INVARIANTS. Otherwise the problem (which means that the vp->v_data field isn't NULL, and represents a coding error and possibly a memory leak) is silently ignored by setting it to NULL later on. Panicking here isn't very helpful: by this time, we can only find the symptoms. The panic occurs long after the reason for "not cleaning" has been forgotten; in the case in point, it was the result of severe file system corruption which left the v_type field set to VBAD. That issue will be addressed by a separate commit.	2004-10-06 02:06:11 +00:00
Poul-Henning Kamp	ba2851254f	Fix a LOR relating to freeing cdevs.	2004-10-01 06:33:39 +00:00
Poul-Henning Kamp	70526ca6a5	Hold dev_lock and check for NULL devsw pointer when we determine if a vnode is a disk.	2004-09-24 06:16:08 +00:00
Poul-Henning Kamp	a0e78d2eb0	Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount of the number of threads which are inside whatever is behind the cdevsw for this particular cdev. Make the device mutex visible through dev_lock() and dev_unlock(). We may want finer granularity later. Replace spechash_mtx use with dev_lock()/dev_unlock().	2004-09-23 07:17:41 +00:00
Poul-Henning Kamp	08dbd671ff	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
Poul-Henning Kamp	1affa3adc8	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.	2004-09-07 09:17:05 +00:00
Don Lewis	8ded654028	Don't attempt to trigger the syncer thread final sync code in the shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the likely cause of hangs after a system panic that are keeping crash dumps from being done. This is a MFC candidate for RELENG_5. MFC after: 3 days	2004-08-20 19:21:47 +00:00
David E. O'Brien	78c37b0de8	s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g	2004-08-16 08:33:37 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Robert Watson	87e83e7d4c	In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However, we may sleep when doing so; check that we didn't race with another thread allocating storage for the vnode after allocation is made to a local pointer, and only update the vnode pointer if it's still NULL. Otherwise, accept that another thread got there first, and release the local storage. Discussed with: jmg	2004-08-11 01:27:53 +00:00
Nate Lawson	c8c216d558	Skip the syncing disks loop if there are no dirty buffers. Remove a variable used to flag the initial printf. Submitted by: truckman (earlier version)	2004-08-10 01:32:05 +00:00
David E. O'Brien	64298d52cc	Put a cap on the auto-tuning of kern.maxvnodes. Cap value chosen by: scottl	2004-08-02 21:52:43 +00:00
Nate Lawson	b1c8139147	Minor message cleanup.	2004-07-30 01:30:05 +00:00
Poul-Henning Kamp	3dfe213e61	Convert the vfsconf list to a TAILQ. Introduce vfs_byname() function to find things on it. Staticize vfs_nmount() function under the name vfs_donmount(). Various cleanups.	2004-07-27 22:32:01 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Poul-Henning Kamp	cf95b5c381	Eliminate unused second argument to reassignbuf() and simplify it accordingly.	2004-07-25 21:24:23 +00:00
Alfred Perlstein	05656b6e2b	put several of the options for DEBUG_VFS_LOCKS under control of sysctls.	2004-07-21 07:13:14 +00:00
Alfred Perlstein	bb5faea34f	Cleanup shutdown output.	2004-07-15 08:01:00 +00:00
Alfred Perlstein	da6303bacc	Tidy up system shutdown.	2004-07-15 04:29:48 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
Alfred Perlstein	7ae8ce5df1	Dump the actual bad values when this assertion is tripped.	2004-07-12 04:13:38 +00:00
Marcel Moolenaar	32240d082c	Update for the KDB framework: o Call kdb_enter() instead of Debugger().	2004-07-10 21:47:53 +00:00
Alfred Perlstein	057589c485	fixup sysctl by fsid node	2004-07-08 06:11:36 +00:00
Alfred Perlstein	ea0104b032	Introduce vfs_suser(), used to test if a user should have special privs for a mount.	2004-07-06 09:37:43 +00:00
Alfred Perlstein	c713aaaeca	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.	2004-07-06 09:12:03 +00:00
Don Lewis	27875d9c88	Unconditionally set last_work_seen while in the SYNCER_RUNNING state so that last_work_seen has a reasonable value at the transition to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened to be zero at the time. Initialize last_work_seen to zero as a safety measure in case the syncer never ran in the SYNCER_RUNNING state. Tested by: phk	2004-07-05 21:32:01 +00:00
Don Lewis	faf1b66d1d	Rework syncer termination code: Speed up the syncer when shutting down by sleeping for a shorter period of time instead of cranking up rushjob and using the normal one second sleep. Skip empty worklist slots when shutting down to avoid lengthy intervals of inactivity. Give I/O more time to complete between steps by not speeding the syncer quite as much. Terminate the syncer after one full pass through the worklist plus one second with the worklist containing nothing but syncer vnodes. Print an indication of shutdown progress to the console. Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist to be monitored.	2004-07-05 01:07:33 +00:00
Poul-Henning Kamp	c555963fd1	Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE.	2004-07-04 22:33:22 +00:00
Alfred Perlstein	2d1dca73ee	Pass the operation in with the fsidctl. Remove some fsidctls that we will not be using. Correct prototypes for fs sysctls.	2004-07-04 20:21:58 +00:00
Poul-Henning Kamp	7f6599fec6	Make the last commit handle non-phk root devices better.	2004-07-04 19:42:25 +00:00

948 Commits