Commit Graph

948 Commits

Author SHA1 Message Date
Konstantin Belousov
6179164448 In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer
to the fs, but before a vnode on the fs is locked, unmount may free fs
structures, causing access to destroyed data and freed memory.

Introduce a vfs_busymp() function that looks up and busies found
fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the
implementation of the handle syscalls.

Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in
sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular,
sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl
handler, that prevents Giant-locked unmount code to interfere with it.

Noted by:	tegge
Reviewed by:	dfr
Tested by:	pho
MFC after:	1 month
2008-11-29 13:34:59 +00:00
Pawel Jakub Dawidek
1ba4a712dd Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This bring huge amount of changes, I'll enumerate only user-visible changes:

- Delegated Administration

	Allows regular users to perform ZFS operations, like file system
	creation, snapshot creation, etc.

- L2ARC

	Level 2 cache for ZFS - allows to use additional disks for cache.
	Huge performance improvements mostly for random read of mostly
	static content.

- slog

	Allow to use additional disks for ZFS Intent Log to speed up
	operations like fsync(2).

- vfs.zfs.super_owner

	Allows regular users to perform privileged operations on files stored
	on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

	Not all the flags are supported. This still needs work.

- ZFSBoot

	Support to boot off of ZFS pool. Not finished, AFAIK.

	Submitted by:	dfr

- Snapshot properties

- New failure modes

	Before if write requested failed, system paniced. Now one
	can select from one of three failure modes:
	- panic - panic on write error
	- wait - wait for disk to reappear
	- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

	Just quota and reservation properties, but don't count space consumed
	by children file systems, clones and snapshots.

- Sparse volumes

	ZVOLs that don't reserve space in the pool.

- External attributes

	Compatible with extattr(2).

- NFSv4-ACLs

	Not sure about the status, might not be complete yet.

	Submitted by:	trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from:	OpenSolaris
2008-11-17 20:49:29 +00:00
Attilio Rao
30f60d8c31 Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless.
Really, the concept of holdcnt in the struct mount is rappresented by
the mnt_ref (which prevents the type-stable structure from being
"recycled) handled through vfs_ref() and vfs_rel().
On this optic, switch the holdcnt acquisition into an emulated vfs_ref()
(and subsequent release into vfs_rel()).

Discussed with:	kib
Tested by:	pho
2008-11-03 20:00:35 +00:00
Attilio Rao
83b3bdbc8a Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
  mnt_lock and using instead a refcount in order to keep track of lock
  requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
  useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
  Change so its KPI by removing the interlock argument and defining 2 new
  flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
  old version (which was unlinked from the lockmgr alredy) and
  MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
  once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
  mnt_ref if running for more than 3 seconds, make it totally useless.
  Remove it as it was thought to work into older versions.
  If a problem of "refcount held never going away" should appear, we will
  need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
  leaving the MNTK_MWAIT flag on even if it the waiters were actually
  woken up. Just a place in vfs_mount_destroy() is left because it is
  going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with:	kib
Tested by:	pho
2008-11-02 10:15:42 +00:00
Edward Tomasz Napierala
15bc6b2bd8 Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
Konstantin Belousov
7cd5a03a8e Style return statements in vn_pollrecord(). 2008-10-28 12:22:33 +00:00
Konstantin Belousov
ae53539e21 Protect check for v_pollinfo == NULL and assignment of the newly allocated
vpollinfo with vnode interlock. Fully initialize vpollinfo before putting
pointer to it into vp->v_pollinfo.

Discussed with:	dwhite
Tested by:	pho
MFC after:	1 week
2008-10-28 12:08:36 +00:00
Konstantin Belousov
3cfc308922 In vfs_busy(), lockmgr() cannot legitimately sleep, because code checked
MNTK_UNMOUNT before, and mnt_mtx is used as interlock. vfs_busy() always
tries to obtain a shared lock on mnt_lock, the other user is unmount who
tries to drain it, setting MNTK_UNMOUNT before.

Reviewed by:	tegge, attilio
Tested by:	pho
MFC after:	2 weeks
2008-10-20 10:07:28 +00:00
Attilio Rao
0d7935fd01 Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Konstantin Belousov
a888d54d39 Introduce the VV_FORCEINSMQ vnode flag. It instructs the insmnque() function
to ignore the unmounting and forces insertion of the vnode into the mount
vnode list.

Change insmntque() to fail when forced unmount is in progress and
VV_FORCEINSMQ is not specified.

Add an assertion to the insmntque(), requiring the vnode to be
exclusively locked for mp-safe filesystems.

Use the VV_FORCEINSMQ for the creation of the syncvnode.

Tested by:	pho
Reviewed by:	tegge
MFC after:	1 month
2008-08-28 09:08:15 +00:00
Christian S.J. Peron
e451733718 Remove worrying printf warning on bootup when processing vnodes which
have NULL mount-points.  This is the case for special vnodes, such as the
one used in nameiinit() which is used for crossing mount points in lookup()
to avoid  lock ordering issues.

MFC after:	2 weeks
Discussed with:	rwatson, kib
2008-08-24 20:16:44 +00:00
Ed Schouten
e7ea30e404 Remove the use of lbolt from the VFS syncer.
It seems we only use `lbolt' inside the VFS syncer and the TTY layer
now.  Because I'm planning to replace the TTY layer next month, there's
no reason to keep `lbolt' if it's only used in a single thread inside
the kernel.

Because the syncer code wanted to wake up the syncer thread before the
timeout, it called sleepq_remove(). Because we now just use a condvar(9)
with a timeout value of `hz', we can wake it up using cv_broadcast()
without waking up any unrelated threads.

Reviewed by:	phk
2008-07-30 12:39:18 +00:00
Pawel Jakub Dawidek
5573021d78 Assert for exclusive vnode lock in vinactive(), vrecycle() and vgonel()
functions.

Reviewed by:	kib
2008-07-27 11:48:15 +00:00
Pawel Jakub Dawidek
610507ae00 - Move vp test for beeing NULL under IGNORE_LOCK().
- Check if panicstr isn't set, if it is ignore the lock. This helps to avoid
  confusion, because lockmgr is a no-op when panicstr isn't NULL, so
  asserting anything at this point doesn't make sense and can just race with
  other panic.

Discussed with:	kib
2008-07-27 11:46:42 +00:00
Attilio Rao
09400d5abe - Disallow XFS mounting in write mode. The write support never worked really
and there is no need to maintain it.
- Fix vn_get() in order to let it call vget(9) with a valid locking
  request.  vget(9) returns the vnode locked in order to prevent recycling,
  but in this case internal XFS locks alredy prevent it from happening, so
  it is safe to drop the vnode lock before to return by vn_get().
- Add a VNASSERT() in vget(9) in order to catch malformed locking requests.

Discussed with:	kan, kib
Tested by:	Lothar Braun <lothar at lobraun dot de>
2008-07-21 23:01:09 +00:00
Pawel Jakub Dawidek
988f0e193a Be more friendly for DDB pager.
Educated by:	jhb's BSDCan presentation
2008-05-18 21:08:12 +00:00
Attilio Rao
60e2edce55 sync_vnode() has some messy code about locking in order to deal with
mount fs needing Giant to be held when processing bufobjs.
Use a different subqueue for pending workitems on filesystems requiring
Giant. This simplifies the code notably and also reduces the number of
Giant acquisitions (and the whole processing cost).

Suggested by:	jeff
Reviewed by:	kib
Tested by:	pho
2008-05-04 13:54:55 +00:00
Pawel Jakub Dawidek
3800322fe2 Implement 'show mount' command in DDB. Without argument, it prints short
info about all currently mounted file systems. When an address is given
as an argument, prints detailed info about the given mount point.

MFC after:	2 weeks
2008-04-26 13:04:48 +00:00
Konstantin Belousov
12e79a9bbc Allow the vnode zone to return the unused memory. The vnode reference
count is/shall be properly maintained for the long time, and VFS
shall be safe against the vnode memory reclamation.

Proposed by:	jeff
Tested by:	pho
2008-04-24 09:58:33 +00:00
Konstantin Belousov
eab626f110 Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by:	kris
Tested by:	kris, pho
Discussed with:	jeff, dfr
MFC after:	2 weeks
2008-04-16 11:33:32 +00:00
Jeff Roberson
1fd9b6a577 - Destroy the bo mtx when the vnode is destroyed. 2008-04-02 10:40:03 +00:00
Attilio Rao
71072af500 b_waiters cannot be adequately protected by the interlock because it is
dropped after the call to lockmgr() so just revert this approach using
something similar to the precedent one:
BUF_LOCKWAITERS() just checks if there are waiters (not the actual number
of them) and it is based on newly introduced lockmgr_waiters() which
returns if the lockmgr has waiters or not. The name has been choosen
differently by old lockwaiters() in order to not confuse them.

KPI results enriched by this commit so __FreeBSD_version bumping and
manpage update will be happening soon.
'struct buf' also changes, so kernel ABI is disturbed.

Bug found by:	jeff
Approved by:	jeff, kib
2008-03-28 12:30:12 +00:00
Jeff Roberson
0ee6cecc9d - Greatly simplify vget() by removing the guarantee that any new
references to a vnode with VI_OWEINACT set will force the vinactive()
   call.  The kernel makes no guarantees about which reference was the
   last to close a file or when the actual inactive processing will
   happen.  The previous code was designed to preserve existing semantics
   in the face of shared locks, however, this was unnecessary.

Discussed with:	mckusick
2008-03-24 04:22:58 +00:00
Jeff Roberson
e6b2545b3b - Only return 1 from sync_vnode() in cases where the vnode is still
at the head of the sync list.  This prevents sched_sync() from
   re-queueing a vnode which may have been freed already.

Discussed with:	kib
2008-03-23 01:44:28 +00:00
Jeff Roberson
f6a8cecfc6 - Pass BO_MTX(bo) to lockmgr in vtruncbuf, we don't own the vnode
interlock here anymore.

Reported by:	kris
2008-03-23 01:42:19 +00:00
Jeff Roberson
698b1a6643 - Complete part of the unfinished bufobj work by consistently using
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
 - Create a new lock in the bufobj to lock bufobj fields independently.
   This leaves the vnode interlock as an 'identity' lock while the bufobj
   is an io lock.  The bufobj lock is ordered before the vnode interlock
   and also before the mnt ilock.
 - Exploit this new lock order to simplify softdep_check_suspend().
 - A few sync related functions are marked with a new XXX to note that
   we may not properly interlock against a non-zero bv_cnt when
   attempting to sync all vnodes on a mountlist.  I do not believe this
   race is important.  If I'm wrong this will make these locations easier
   to find.

Reviewed by:	kib (earlier diff)
Tested by:	kris, pho (earlier diff)
2008-03-22 09:15:16 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Attilio Rao
7fbfba7bf8 - Handle buffer lock waiters count directly in the buffer cache instead
than rely on the lockmgr support [1]:
  * bump the waiters only if the interlock is held
  * let brelvp() return the waiters count
  * rely on brelvp() instead than BUF_LOCKWAITERS() in order to check
    for the waiters number
- Remove a namespace pollution introduced recently with lockmgr.h
  including lock.h by including lock.h directly in the consumers and
  making it mandatory for using lockmgr.
- Modify flags accepted by lockinit():
  * introduce LK_NOPROFILE which disables lock profiling for the
    specified lockmgr
  * introduce LK_QUIET which disables ktr tracing for the specified
    lockmgr [2]
  * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it
    can only be used on a per-instance basis
- Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer
  used

This patch breaks KPI so __FreBSD_version will be bumped and manpages
updated by further commits. Additively, 'struct buf' changes results in
a disturbed ABI also.

[2] Really, currently there is no ktr tracing in the lockmgr, but it
will be added soon.

[1] Submitted by:	kib
Tested by:	pho, Andrea Barberio <insomniac at slackware dot it>
2008-03-01 19:47:50 +00:00
Attilio Rao
81c794f998 Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by:	Andrea Barberio <insomniac at slackware dot it>
2008-02-25 18:45:57 +00:00
Attilio Rao
2433c4883e Conver all explicit instances to VOP_ISLOCKED(arg, NULL) into
VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should
only acquire curthread as argument; this will lead in axing the additional
argument from both functions, making the code cleaner.

Reviewed by: jeff, kib
2008-02-08 21:45:47 +00:00
Attilio Rao
0e9eb108f0 Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
  always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
  Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
  present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
  curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo
2008-01-24 12:34:30 +00:00
Attilio Rao
d638e093d6 - Introduce the function lockmgr_recursed() which returns true if the
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
  locks based on the top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
  VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj

BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe out BUF_REFCNT() and leaving the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.

KPI results, obviously, broken so further commits will update manpages
and freebsd version.

Tested by: kris (on UFS and NFS)
2008-01-19 17:36:23 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Robert Watson
c5f1beb02a In "show lockedvnods" DDB command, use db_printf() rather than printf()
so that the results end up in the DDB output stream rather than the
console output stream.

This should likely also be done for the vprint() function it calls.

MFC after:	3 months
2007-12-28 00:47:31 +00:00
Attilio Rao
98e4f2e2bf As LK_EXCLUPGRADE is used in conjuction with LK_NOWAIT, LK_UPGRADE becames
equivalent with this and so operate the switch.

That call is the only one remaining LK_EXCLUPGRADE consumer and removing
it will prepare the ground for LK_EXCLUPGRADE axing and further
lockmgr improvements.

Discussed with: jeff, ups
2007-12-27 20:52:05 +00:00
Robert Watson
3de213cc00 Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
Konstantin Belousov
973bdaa06f Use curthread instead of the FIRST_THREAD_IN_PROC for vnlru and syncer,
when applicable.

Aquire Giant slightly later for vnlru.

In the syncer, aquire the Giant only when a vnode belongs to the
non-MPsafe fs.

In both speedup_syncer() and syncer_shutdown(), remove the syncer thread from
the lbolt sleep queue after the syncer state is modified, not before.

Herded by:	attilio
Tested by:	Peter Holm
Reviewed by:	ups
MFC after:	1 week
2007-12-05 09:34:04 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Julian Elischer
3745c395ec Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
Konstantin Belousov
245b204491 When restoring the mount after umount failed, the MNTK_UNMOUNT flag
prevents insmntque() from placing reallocated syncer vnode on mount
list, that causes panic in vfs_allocate_syncvnode().

Introduce MNTK_NOINSMNTQ flag, that marks the period when instmntque is
not allowed to success, instead of MNTK_UNMOUNT. The MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on umount error
path, where it is cleaned just before the syncer vnode is going to be
allocated.

Reported by:	Peter Jeremy <peterjeremy optushome com au>
Suggested by:	tegge
Approved by:	re (rwatson)
2007-09-12 16:31:32 +00:00
Pawel Jakub Dawidek
354eb80141 Improve vn_printf() by:
- adding missing vnode flags,
- printing unknown flags as numbers,
- using strlcat() instead of strcat().

Approved by:	re (bmah)
2007-08-13 21:23:30 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Robert Watson
e1e8f51b85 Universally adopt most conventional spelling of acquire. 2007-05-27 20:50:23 +00:00
Konstantin Belousov
d413d21071 Since renaming of vop_lock to _vop_lock, pre- and post-condition
function calls are no more generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.
2007-05-18 13:02:13 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Pawel Jakub Dawidek
24b0502ee0 Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
  for jailed root from different jails, etc.

OK'ed by:	rwatson
2007-04-13 23:54:22 +00:00
Pawel Jakub Dawidek
6bc3ab2574 When we are running low on vnodes, there is currently no way to ask other
subsystems to release some vnodes. Implement backpressure based on
vfs_lowvnodes event (similar to vm_lowmem for memory).
2007-04-13 08:38:48 +00:00
Pawel Jakub Dawidek
08be819487 Minor style cleanups (mostly removal of trailing whitespaces). 2007-04-10 15:29:37 +00:00
Pawel Jakub Dawidek
21ff8c6715 Correct typos. 2007-04-10 15:22:40 +00:00
Pawel Jakub Dawidek
def72fbba1 Now that the vdropl() function is public, assert that the vnode interlock
is held.
2007-04-01 10:45:32 +00:00
Dag-Erling Smørgrav
e6534b36d8 Make vdropl() public; zfs needs it. There is also plenty of existing
file system code (mostly *_reclaim()) which look like this:

    VOP_LOCK(vp);
    /* examine vp */
    VOP_UNLOCK(vp);
    vdrop(vp);

This can now be rewritten to:

    VOP_LOCK(vp);
    /* examine vp */
    vdropl(vp); /* will unlock vp */

MFC after:	1 week
2007-03-31 23:57:17 +00:00
Marcel Moolenaar
f3ea971bf0 PowerPC is the only architecture with mpsafe_vfs=0. This is now
broken. Rudimentary tests show that PowerPC can run with
mpsafe_vfs=1. Make it so...
2007-03-27 05:29:41 +00:00
Tor Egge
61b9d89ff0 Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
Kip Macy
2f6a774be4 change vop_lock handling to allowing tracking of callers' file and line for
acquisition of lockmgr locks

Approved by: scottl (standing in for mentor rwatson)
2006-11-13 05:51:22 +00:00
John Baldwin
6b8de13ab4 Simplify operations with sync_mtx in sched_sync():
- Don't drop the lock just to reacquire it again to check rushjob, this
  only wastes time.
- Use msleep() to drop the mutex while sleeping instead of explicitly
  unlocking around tsleep.

Reviewed by:	pjd
2006-11-07 19:45:05 +00:00
John Baldwin
8064e5d71f Fix comment typo and function declaration. 2006-11-07 19:07:33 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Pawel Jakub Dawidek
a2ca03b3ad Typo, 'from' vnode is locked here, not 'to' vnode. 2006-11-04 23:57:02 +00:00
Pawel Jakub Dawidek
1a60c7fc8e Add gjournal specific code to the UFS file system:
- Add FS_GJOURNAL flag which enables gjournal support on a file system.
- Add cg_unrefs field to the cylinder group structure which holds
  number of unreferenced (orphaned) inodes in the given cylinder group.
- Add fs_unrefs field to the super block structure which holds
  total number of unreferenced (orphaned) inodes.
- When file or a directory is orphaned (last reference is removed, but
  object is still open), increase fs_unrefs and cg_unrefs fields,
  which is a hint for fsck in which cylinder groups looks for such
  (orphaned) objects.
- When file is last closed, decrease {fs,cg}_unrefs fields.
- Add VV_DELETED vnode flag which points at orphaned objects.

Sponsored by:	home.pl
2006-10-31 21:48:54 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Konstantin Belousov
45ea8737bf Correct the comment: numvnodes is decreased on vdestroying the vnode.
OKed by:	tegge
Approved by:	pjd (mentor)
MFC after:	1 week
2006-10-02 07:25:58 +00:00
Tor Egge
a1e363f256 Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC.  Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
2006-09-26 04:15:59 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Pawel Jakub Dawidek
c37789fe7e Add 'show vnode <addr>' DDB command. 2006-09-04 22:15:44 +00:00
Pawel Jakub Dawidek
04d9e255df getnewvnode() can be called with NULL mp.
Found by:	Coverity Prevent (tm)
Coverity ID:	1521
Confirmed by:	phk
2006-08-10 08:56:03 +00:00
Pawel Jakub Dawidek
13c85d339d Add a bandaid to avoid a deadlock in a situation, when we are trying to suspend
a file system, but need to obtain a vnode. We may not be able to do it, because
all vnodes could be already in use and other processes cannot release them,
because they are waiting in "suspfs" state.

In such situation, we allow to allocate a vnode anyway.

This is a temporary fix - there is no backpressure to free vnodes allocated in
those circumstances.

MFC after:	1 week
Reviewed by:	tegge
2006-08-09 12:47:30 +00:00
Robert Watson
ccdebe46bd Improve commenting of vaccess(), making sure to be clear that the ifdef
capabilities code is there for reference and never actually used.  Slight
style tweak.
2006-08-06 10:43:35 +00:00
Alan Cox
27ea29536c Enable debug.mpsafevfs by default on arm. Since every architecture except
powerpc has debug.mpsafevfs enabled by default, it is shorter to enumerate
the architectures on which debug.mpsafevfs is off.

Tested by: cognet@
2006-07-15 06:44:27 +00:00
Konstantin Belousov
c8d3bc1fa3 Back out my rev. 1.674. The better fix (rev. 1.637) is already in tree.
Approved by:	kan (mentor)
2006-07-05 16:33:25 +00:00
Sergey Babkin
d81175c738 Backed out the change by request from rwatson.
PR:		kern/14584
2006-06-26 22:03:22 +00:00
Sergey Babkin
7a799f1ef0 The common UID/GID space implementation. It has been discussed on -arch
in 1999, and there are changes to the sysctl names compared to PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to PR.

PR:		kern/14584
Submitted by:	babkin
2006-06-25 18:37:44 +00:00
Konstantin Belousov
55aef2632f Fix the LOR that occurs when the MAC compiled into the kernel
and vnode is destroyed.

Reviewed by:	rwatson
LOR:		189
MFC after:	2 weeks
Approved by:	kan (mentor)
2006-06-08 07:55:10 +00:00
Stephan Uphoff
dcf67e65d2 Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flush dirty page in the data invalidation function
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by:	tegge@
MFC after:	7 days
2006-05-25 01:00:35 +00:00
John Baldwin
73dbd3da73 Remove various bits of conditional Alpha code and fixup a few comments. 2006-05-12 05:04:46 +00:00
Pawel Jakub Dawidek
643df192de vn_start_write()/vn_finished_write() is not needed here, because
vn_start_write() is always called earlier in the code path and calling
the function recursively may lead to a deadlock.

Confirmed by:	tegge
MFC after:	2 weeks
2006-04-29 21:57:38 +00:00
Jeff Roberson
6ca9fcc586 - Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child
buffers to go on the buf daemon's DIRTYGIANT queue.
 - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler
   runs in the context of the buf daemon and may require Giant.
2006-04-28 01:05:31 +00:00
Jeff Roberson
b53bf1269c - VFS_LOCK_GIANT when recycling a vnode via getnewvnode. We may be
recycling for an unrelated filesystem.  I really don't like potentially
   acquiring giant in the context of a giantless filesystem but there
   are reasonable objections to removing the recycling from this path.

Sponsored by:	Isilon Systems, Inc.
2006-04-04 06:46:10 +00:00
Jeff Roberson
0af2472199 - Add an assert to vgone. It is illegal to call vgone without a reference
to the vnode.  Without a reference the vnode will never be vdestroy'd
   and the memory will never be reclaimed.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 23:39:26 +00:00
Jeff Roberson
94bc95db3c - Hold a reference from the time vfs_busy starts until vfs_unbusy is
called.
 - vfs_getvfs has to return a reference to prevent the returned mountpoint
   from changing identities.
 - Release references acquired via vfs_getvfs.

Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:53:25 +00:00
Jeff Roberson
084d64ac21 - Add the B_NEEDSGIANT flag which is only set if the vnode that owns a buf
requires Giant.  It is set in bgetvp and cleared in brelvp.
 - Create QUEUE_DIRTY_GIANT for dirty buffers that require giant.
 - In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and
   only if we think there are buffers in that queue.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 02:56:30 +00:00
Jeff Roberson
e44270a781 - Correct an assert in vop_rename_pre. fdvp may be locked if it is either
the target directory or file.  This case should fail in the filesystem
   anyway and perhaps kern_rename() should catch it.

Sponsored by:	Isilon Systems, Inc.
2006-03-19 20:14:46 +00:00
Tor Egge
791dd2fade Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
2006-03-08 23:43:39 +00:00
Tor Egge
3b582b4e72 Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must
be called without any vnode locks held.  Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
2006-03-02 22:13:28 +00:00
Tor Egge
b983aac762 Don't try to show marker nodes. 2006-03-02 21:31:15 +00:00
Jeff Roberson
eb2ea10590 - Move softdep from using a global worklist to per-mount worklists. This
has many positive effects including improved smp locking, reducing
   interdependencies between mounts that can lead to deadlocks, etc.
 - Add the softdep worklist and various counters to the ufsmnt structure.
 - Add a mount pointer to the workitem and remove mount pointers from the
   various structures derived from the workitem as they are now redundant.
 - Remove the poor-man's semaphore protecting softdep_process_worklist and
   softdep_flushworklist.  Several threads may now process the list
   simultaneously.
 - Add softdep_waitidle() to block the thread until all pending
   dependencies being operated on by other threads have been flushed.
 - Use softdep_waitidle() in unmount and snapshots to block either
   operation until the fs is stable.
 - Remove softdep worklist processing from the syncer and move it into the
   softdep_flush() thread.  This thread processes all softdep mounts
   once each second and when it is called via the new softdep_speedup()
   when there is a resource shortage.  This removes the softdep hook
   from the kernel and various hacks in header files to support it.

Reviewed by/Discussed with:	tegge, truckman, mckusick
Tested by:	kris
2006-03-02 05:50:23 +00:00
Jeff Roberson
a1db11fc40 - Release the mount ref once the vnode has been recycled rather than once
the last reference is dropped.  I forgot that vnodes can stick around
   for a very long time until processes discover that they are dead.  This
   means that a vnode reference is not sufficient to keep the mount
   referenced and even more code will be required to ref mount points.

Discovered by:	kris
2006-02-23 05:15:37 +00:00
Jeff Roberson
8a7cd2fdfb - Grab a mnt ref in vfs_busy() before dropping the interlock. This will
prevent the mount point from going away while we're waiting on the lock.
   The ref does not need to persist once we have the lock because the
   lock prevents the mount point from being unmounted.

MFC After:	1 week
2006-02-22 06:20:12 +00:00
Jeff Roberson
04f6d3effa - Add a ref count to the mount structure. Sleep for up to 3 seconds in
vfs_mount_destroy waiting for this ref to hit 0.  We don't print an
   error if we are rebooting as the root mount always retains some refernces
   by init proc.
 - Acquire a mnt ref for every vnode allocated to a mount point.  Drop this
   ref only once vdestroy() has been called and the mount has been freed.
 - No longer NULL the v_mount pointer in delmntque() so that we may release
   the ref after vgone() has been called.  This allows us to guarantee
   that the mount point structure will be valid until the last vnode has
   lost its last ref.
 - Fix a few places that rely on checking v_mount to detect recycling.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:19:50 +00:00
Jeff Roberson
b099db5881 - Solve a race where we could lose a call to VOP_INACTIVE. If vget() waiting
on a lock held the last usecount ref on a vnode and the lock failed we
   would not call INACTIVE.  Solve this by only holding a holdcnt to prevent
   the vnode from disappearing while we wait on vn_lock.  Other callers
   may now VOP_INACTIVE while we are waiting on the lock, however this race
   is acceptable, while losing INACTIVE is not.

Discussed with:	kan, pjd
Tested by:	kkenn
Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-01 00:30:05 +00:00
Kris Kennaway
d5e5528afe Back out r1.653; it turns out that the race (or at least the printf) is
actually not hard to trigger, and it can cause a lot of console spam.

Approved by:	kan
2006-01-28 03:06:35 +00:00
Robert Watson
6be2c41a22 Convert remaining functions in vfs_subr.c from K&R prototypes to ANSI C
prototypes, as the majority of new functions added have been in this
style.  Changing prototype style now results in gcc noticing that the
implementation of vn_pollrecord() has a 'short' argument instead of
'int' as prototyped in vnode.h, so correct that definition.  In practice
this didn't matter as only poll flags in the lower 16 bits are used.

MFC after:	1 week
2006-01-21 19:42:10 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Pawel Jakub Dawidek
e7736557d6 Print a warning when we miss vinactive() call, because of race in vget().
The race is very real, but conditions needed for triggering it are rather
hard to meet now.
When gjournal will be committed (where it is quite easy to trigger) we need
to fix it.

For now, verify if it is really hard to trigger.

Discussed with:	kan
2005-12-29 22:52:09 +00:00
Doug White
16e35dcc39 This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

This issue affects HEAD and RELENG_6 only.

PR:		88249
Submitted by:	"Devon H. O'Dell" <dodell@ixsystems.com>
MFC after:	3 days
2005-11-09 22:03:50 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Kris Kennaway
14cdc36456 mpsafevm has been stable and defaulted to 1 on sparc64 for over 6 months,
so we are ready for mpsafevfs=1 by default on sparc64 too.  I have been
running this on all my sparc64 machines for over 6 months, and have not
encountered MD problems.

MFC after:	1 week
2005-10-14 23:56:13 +00:00
Diomidis Spinellis
9f5c1d1955 Move execve's access time update functionality into a new
vfs_mark_atime() function, and use the new function for
performing efficient atime updates in mmap().

Reviewed by:	bde
MFC after:	2 weeks
2005-10-12 06:56:00 +00:00
Don Lewis
6c8b634f1d Un-staticize runningbufwakeup() and staticize updateproc.
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.

Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than than
checking the proc pointer vs. the known proc pointers for these two
threads.  A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.

Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk.  This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk.  The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.

Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.

Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.

See <http://www.holm.cc/stress/log/cons143.html>
2005-09-30 01:30:01 +00:00
Tor Egge
61ac14dab6 Break out of loop if next buffer pointer has become invalid while flushing
current buffer.

Reviewed by:	kan
2005-09-16 18:28:12 +00:00
Robert Watson
fd1a469ba5 In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported
kqueue filter type is requested on a vnode.

MFC after:	3 days
2005-09-12 19:22:37 +00:00
Jung-uk Kim
9ed448b20c use monotonic time_uptime' instead of time_second'
Approved by:	anholt (mentor)
Discussed on:	arch
2005-09-12 15:31:28 +00:00
Poul-Henning Kamp
2883ba6668 Introduce vfs_read_dirent() which can help VOP_READDIR() implementations
by handling all the cookie stuff.
2005-09-12 08:46:07 +00:00
Suleiman Souhlal
a6c109d658 Fix a typo in vop_rename_pre() where we ended up using vholdl()
instead of vhold(), even though the vnode interlock is unlocked.

MFC after:	3 days
2005-08-28 23:00:11 +00:00
Don Lewis
ad9f180121 Back out the removal of LK_NOWAIT from the VOP_LOCK() call in
vlrureclaim() in vfs_subr.c 1.636  because waiting for the vnode
lock aggravates an existing race condition.  It is also undesirable
according to the commit log for 1.631.

Fix the tiny race condition that remains by rechecking the vnode
state after grabbing the vnode lock and grabbing the vnode interlock.

Fix the problem of other threads being starved (which 1.636 attempted
to fix by removing LK_NOWAIT) by calling uio_yield() periodically
in vlrureclaim().  This should be more deterministic than hoping
that VOP_LOCK() without LK_NOWAIT will block, which may not happen
in this loop.

Reviewed by:	kan
MFC after:	5 days
2005-08-23 03:44:06 +00:00
Robert Watson
6cd8dee3c5 Silence "busy" warnings when unmounting devfs at system shutdown. This
is a workaround for non-symetric teardown of the file systems at
shutdown with respect to the mount order at boot.  The proper long term
fix is to properly detach devfs from the root mount before unmounting
each, and should be implemented, but since the problem is non-harmful,
this temporary band-aid will prevent false positive bug reports and
unnecessary error output for 6.0-RELEASE.

MFC after:	3 days
Tested by:	pav, pjd
2005-08-20 17:12:47 +00:00
Marcel Moolenaar
fd65baf8e2 Make mpsafe_vfs=1 the default on ia64. 2005-08-13 20:07:50 +00:00
Alexander Kabaev
45a0d1ed7a Do not drop the vnode interlock if vdropl is called on already doomed vnode.
vdropl callers expect it to return with interlock still being held.

MFC after:	2 days
2005-08-10 11:46:03 +00:00
Suleiman Souhlal
34cc826ae8 Holding a vnode doesn't prevent v_mount from disappearing (when the
vnode is inactivated), possibly leading to a NULL dereference when
checking if the mount wants knotes to be activated in the VOP hooks.
So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(),
if necessary, and check it when activating knotes.
Since the flags are not erased when a vnode is being held, we can safely
read them.

Reviewed by:	kris@
MFC after:	3 days
2005-08-06 01:42:04 +00:00
Jeff Roberson
40a495853a - Unlock before we call mac_destroy_vnode to prevent a lock order reversal.
Found by:	trhodes
2005-08-03 05:36:50 +00:00
Jeff Roberson
39b2406838 - Allow vnlru to drop giant if the filesystem does not require it. The
vnlru proc is extremely inefficient, potentially iteration over tens of
   thousands of vnodes without blocking.  Droping Giant allows other threads
   to preempt us although we should revisit the algorithm to fix the runtime
   problems especially since this may hold up all vnode allocations.
 - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim.  This provides
   a natural blocking point to help alleviate the situation described above
   although it may not technically be desirable.
 - yield after we make a pass on all mount points to prevent us from
   blocking other threads which require Giant.

MFC after:	2 weeks
2005-07-20 01:43:27 +00:00
Pawel Jakub Dawidek
c23c87bd93 Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp)
below KASSERT()s, which means there was no real problem here, we just
needed better locking for assertions.

OK'ed by:	jeff
Approved by:	re (scottl)
2005-07-05 15:57:55 +00:00
Suleiman Souhlal
571dcd15e2 Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
2005-07-01 16:28:32 +00:00
Jeff Roberson
b770ff6eb2 - Try to catch the wrong bufobj panics a little earlier. I believe they
are actually caused by a buf with both VNCLEAN and VNDIRTY set.  In
   the traces it is clear that the buf is removed from the dirty queue while
   it is actually on the clean queue which leaves the tail pointer set.
   Assert that both flags are not set in buf_vlist_add and buf_vlist_remove.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-18 18:17:03 +00:00
Jeff Roberson
114a1006a8 - Change holdcnt use around vnode recycling. We now always keep a holdcnt
ref while we're calling vgone().  This prevents transient refs from
   re-adding us to the free list.  Previously, a vfree() triggered via
   vinvalbuf() getting rid of all of a vnode's pages could place a partially
   destructed vnode on the free list where vtryrecycle() could find it.  The
   first call to vtryrecycle would hang up on the vnode lock, but when it
   failed it would place a now dead vnode onto the free list, and another
   call to vtryrecycle() would free an already free vnode.  There were many
   complications of having a zero ref count while freeing which can now go
   away.
 - Change vdropl() to release the interlock before returning.  All callers
   now respect this, so vdropl() directly frees VI_DOOMED vnodes once the
   last ref is dropped.  This means that we'll never have VI_DOOMED vnodes
   on the free list.
 - Seperate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and
   v_decr_useonly().  The incr/decr split is so that incr usecount can
   return with the interlock still held while decr drops the interlock so
   it can call vdropl() which will potentially free the vnode.  The calling
   function can't drop the lock of an already free'd node.  v_decr_useonly()
   drops a usecount without droping the hold count.  This is done so the
   usecount reaches zero in vput() before we recycle, however the holdcount
   is still 1 which prevents any new references from placing the vnode
   back on the free list.
 - Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget().  We
   wouldn't want vnlrureclaim() to bump the usecount since this has
   different semantics.  Also change vnlrureclaim() to do a NOWAIT on the
   vn_lock.  When this function runs we're usually in a desperate situation
   and we wouldn't want to wait for any specific vnode to be released.
 - Fix a bunch of misc comments to reflect the new behavior.
 - Add vhold() and vdrop() to vflush() for the same reasons that we do in
   vlrureclaim().  Previously we held no reference and a vnode could have
   been freed while we were waiting on the lock.
 - Get rid of vlruvp() and vfreehead().  Neither are used.  vlruvp() should
   really be rethought before it's reintroduced.
 - vgonel() always returns with the vnode locked now and never puts the
   vnode back on a free list.  The vnode will be freed as soon as the last
   reference is released.

Sponsored by:	Isilon Systems, Inc.
Debugging help from:	Kris Kennaway, Peter Holm
Approved by:	re (blanket vfs)
2005-06-16 04:41:42 +00:00
Jeff Roberson
12c2dcde40 - In reassignbuf() add many asserts to validate the head and tail pointers
of the clean and dirty lists.  This is in an attempt to catch the wrong
   bufobj problem sooner.
 - In vgonel() don't acquire an extra reference in the active case, the
   vnode lock and VI_DOOMED protect us from recursively cleaning.
 - Also in vgonel() clean up some stale comments.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-14 20:31:53 +00:00
Jeff Roberson
b930d85380 - Don't make vgonel() globally visible, we want to change its prototype
anyway and it's not used outside of vfs_subr.c.
 - Change vgonel() to accept a parameter which determines whether or not
   we'll put the vnode on the free list when we're done.
 - Use the new vgonel() parameter rather than VI_DOOMED to signal our
   intentions in vtryrecycle().
 - In vgonel() return if VI_DOOMED is already set, this vnode has already
   been reclaimed.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 06:26:55 +00:00
Jeff Roberson
d2ad9baac0 - Add KTR_VFS events to vdestroy, vtruncbuf, vinvalbuf, vfreehead.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:37 +00:00
Jeff Roberson
d6dbf760a6 - Assert that we're not in the name cache anymore in vdestroy().
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:48:09 +00:00
Jeff Roberson
9aa0eba464 - Add KTR_VFS tracing to track the life of vnodes. Eventually KTR_VFS
events could be added to cover other interesting details.
 - Add some VNASSERTs to discover places where we access vnodes after
   they have been uma_zfree'd before we try to free them again.
 - Add a few more VNASSERTs to vdestroy() to be certain that the vnode is
   really unused.

Sponsored by:	Isilon Systems, Inc.
2005-06-11 01:16:46 +00:00
Suleiman Souhlal
679985d03a Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
Jeff Roberson
fae89dce3e - Clear OWEINACT prior to calling VOP_INACTIVE to remove the possibility
of a vget causing another call to INACTIVE before we're finished.
2005-06-07 22:05:32 +00:00
Colin Percival
fd94099ec2 If we are going to
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.

Security: FreeBSD-SA-05:08.kmem
2005-05-06 02:50:00 +00:00
Jeff Roberson
059f090fa1 - A vnode may have made its way onto the free list while it was being
vgone'd.  We must remove it from the freelist before returning in
   vtryrecycle() or we may get a duplicate free.

Reported by:	kkenn
2005-05-03 10:56:00 +00:00
Christian S.J. Peron
02fe1744f1 Since it is not possible for curthread to be NULL in this context,
drop the check+initialization for a straight initialization. Also
assert that curthread will never be NULL just to be sure.

Discussed with:	rwatson, peter
MFC after:	1 week
2005-05-02 02:07:55 +00:00
Jeff Roberson
b2e2166483 - All buffers should either be clean or dirty. If neither of these flags
are set when we attempt to remove a buffer from a queue we should panic.
   Hopefully this will catch the source of the wrong bufobj panics.

Sponsored by:	Isilon Systems, Inc.
2005-05-01 12:00:36 +00:00
Jeff Roberson
b2183bfe05 - In vnlru_free() remove the vnode from the free list before we call
vtryrecycle().  We could sometimes get into situations where two threads
   could try to recycle the same vnode before this.
 - vtryrecycle() is now responsible for returning the vnode to the free list
   if it fails and someone else hasn't done it.
 - Make a new function vfreehead() which moves a vnode to the head of the
   free list and use it in vgone() to clean up that code a bit.

Sponsored by:	Isilon Systems, Inc.
Reported by:	pho, kkenn
2005-04-30 11:22:40 +00:00
Jeff Roberson
0dd02d67eb - Don't vgonel() via vgone() or vrecycle() if the vnode is already doomed.
This fixes forced unmounts via nullfs.

Reported by:	kkenn
Sponsored by:	Isilon Systems, Inc.
2005-04-27 10:03:21 +00:00
Jeff Roberson
6c317bc4cf - Stop setting vxthread, we've asserted that it was useless for several
weeks now.
2005-04-27 09:17:33 +00:00
Jeff Roberson
7d60dc524b - Disable code which allows getnewvnode() to fail. Many ffs_vget() callers
do not correctly deal with failures.  This presently risks deadlock
   problems if dependency processing is held up by failures to allocate
   a vnode, however, this is better than the situation with the failures.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:57:05 +00:00
Poul-Henning Kamp
bdb3564638 Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready
earlier.
2005-04-18 21:11:47 +00:00
Jeff Roberson
374df05fd3 - Change vop_lookup_post assertions to reflect recent vfs_lookup changes.
Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:57:53 +00:00
Jeff Roberson
539de9eda0 - Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk
uses it.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 15:17:06 +00:00
Jeff Roberson
070898b1b3 - Change the VOP_LOCK UPGRADE in vput() to do a LK_NOWAIT to avoid a
potential lock order reversal.  Also, don't unlock the vnode if this
   fails, lockmgr has already unlocked it for us.
 - Restructure vget() now that vn_lock() does all of VI_DOOMED checking
   for us and also handles the case where there is no real lock type.
 - If VI_OWEINACT is set, we need to upgrade the lock request to EXCLUSIVE
   so that we can call inactive.  It's not legal to vget a vnode that hasn't
   had INACTIVE called yet.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 09:28:32 +00:00
Jeff Roberson
d78e0ee9fd - Assert that the bufobj matches in flushbuflists. I still haven't gotten
to root cause on exactly how this happens.
 - If the assert is disabled, we presently try to handle this case, but the
   BUF_UNLOCK was missing.  Thus, if this condition ever hit we would leak
   a buf lock.

Many thanks to Peter Holm for all his help in finding this bug.  He really
put more effort into it than I did.
2005-04-06 06:49:46 +00:00
Jeff Roberson
2bbd6c9818 - Move NDFREE() from vfs_subr to vfs_lookup where namei() is. 2005-04-05 08:58:49 +00:00
Jeff Roberson
d1cc6041e6 - Add a missing unlock of the vnode_free_list_mtx.
Spotted by:	Antoine Brodin
2005-04-04 12:07:16 +00:00
Jeff Roberson
92b8231d4f - Instead of waiting forever to get a vnode in getnewvnode() wait for
one to become available for one second and then return ENFILE.  We
   can run out of vnodes, and there must be a hard limit because without
   one we can quickly run out of KVA on x86.  Presently the system can
   deadlock if there are maxvnodes directories in the namecache.  The
   original 4.x BSD behavior was to return ENFILE if we reached the max,
   but 4.x BSD did not have the vnlru proc so it was less profitable to
   wait.
2005-04-04 11:43:44 +00:00
Jeff Roberson
e451d879a1 - Disable vfs shared locks by default. They must be specifically enabled
on filesystems which safely support them.  It appears that many
   network filesystems specifically are not shared lock safe.

Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:22:45 +00:00
Jeff Roberson
f247a5240d - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
David Schultz
7ce7f713ee Eliminate v_id and v_ddid. The name cache now holds references to
vnodes whose names it caches, so we no longer need a `generation
number' to tell us if a referenced vnode is invalid.  Replace the use
of the parent's v_id in the hash function with the address of the
parent vnode.

Tested by:	Peter Holm
Glanced at by:	jeff, phk
2005-03-30 03:01:36 +00:00
Jeff Roberson
0fbc3b7df0 - Dont clear OWEINACT in vbusy(), we still owe an inactive call if someone
vhold()s us.
 - Avoid an extra mutex acquire and release in the common case of vgonel()
   by checking for OWEINACT at the start of the function.
 - Fix the case where we set OWEINACT in vput().  LK_EXCLUPGRADE drops our
   shared lock if it fails.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:02:48 +00:00
Jeff Roberson
cb34b95ba4 - Don't initial v_dd here, let cache_purge() do it for us.
Sponsored by:	Isilon Systems, Inc.
2005-03-29 09:59:34 +00:00
Jeff Roberson
9dcc5da318 - Move code that should probably be an assert above the main body of
vrele so that we can decrease the indentation of the real work and
   make things slightly more clear.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 11:18:47 +00:00
Jeff Roberson
d36f0a4ff8 - Adjust asserts in vop_lookup_post() to match the new post PDIRUNLOCK
vfs.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:25:25 +00:00
Poul-Henning Kamp
3b73a3c079 Remove another ';' after if().
Also spotted by:	bz
2005-03-27 07:53:13 +00:00
Poul-Henning Kamp
2d8dfb2836 Remove extra ; at end of if().
Found by:	bz
2005-03-27 07:52:12 +00:00
Jeff Roberson
228ea9d212 - Don't recycle vnodes anymore. Free them once they are dead. getnewvnode
now always allocates a new vnode.
 - Define a new function, vnlru_free, which frees vnodes from the free list.
   It takes as a parameter the number of vnodes to free, which is
   wantfreevnodes - freevnodes when called from vnlru_proc or 1 when
   called from getnewvnode().  For now, getnewvnode() still tries to reclaim
   a free vnode before creating a new one when we are near the limit.
 - Define a function, vdestroy, which handles the actual release of memory
   and teardown of locks, etc.  This could become a uma_dtor() routine.
 - Get rid of minvnodes.  Now wantfreevnodes is 1/4th the max vnodes.  This
   keeps more unreferenced vnodes around so that files which have only
   been stat'd are less likely to be kicked out of the system before we
   have a chance to read them, etc.  These vnodes may still be freed via
   the normal vnlru_proc() routines which may some day become a real lru.
2005-03-25 05:34:39 +00:00
Jeff Roberson
d830f82824 - Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:31:38 +00:00
Jeff Roberson
c167961e27 - If vput() is called with a shared lock it must upgrade to an exclusive
before it can call VOP_INACTIVE().  This must use the EXCLUPGRADE path
   because we may violate some lock order with another locked vnode if
   we drop and reacquire the lock.  If EXCLUPGRADE fails, we mark the
   vnode with VI_OWEINACT.  This case should be very rare.
 - Clear VI_OWEINACT in vinactive() and vbusy().
 - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:08:58 +00:00
Jeff Roberson
b172f6c5f9 - Now that there are no external users of vfree() make it static.
- Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so
   no one else attempts to grow a dependency on them.
 - Now that objects with pages hold the vnode we don't have to do unlocked
   checks for the page count in the vm object in VSHOULDFREE.  These three
   macros could simply check for holdcnt state transitions to determine
   whether the vnode is on the free list already, but the extra safety
   the flag affords us is probably worth the minimal cost.
 - The leafonly sysctl and code have been dead for several years now,
   remove the sysctl and the code that employed it from vtryrecycle().
 - vtryrecycle() also no longer has to check the object's page count as
   the object holds the vnode until it reaches 0.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 14:38:16 +00:00
Jeff Roberson
c178628d6e - Expose vholdl() so it may be used outside of vfs_subr.c 2005-03-15 13:43:10 +00:00
Jeff Roberson
8045557f2b - Increment the holdcnt once for each usecount reference. This allows us
to use only the holdcnt to determine whether a vnode may be recycled,
   simplifying the V* macros as well as vtryrecycle(), etc.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 09:25:19 +00:00
Jeff Roberson
159b454819 - We do not have to check the object's ref_count in VSHOULDFREE or
vtryrecycle().  All obj refs also ref the vnode.
 - Consistently use v_incr_usecount() to increment the usecount.  This will
   be more important later.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 08:30:31 +00:00
Jeff Roberson
8f13a540ed - Slightly rearrange vrele() to move the common case in one indentation
level.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:16:55 +00:00
Jeff Roberson
6fc16a838c - Rework vget() so we drop the usecount in two failure cases that were
missed by my last commit.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:11:19 +00:00
Jeff Roberson
6703c30bb5 - Remove vx_lock, vx_unlock, vx_wait, etc.
- Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do
   writing ops without suspending.  This could suspend the vlruproc which
   should not be a problem under normal circumstances.
 - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance
   where it was used.
 - Acquire a lock before calling vgone() as it now requires it.
 - Move the acquisition of the vnode interlock from vtryrecycle() to
   getnewvnode() so that if it fails we don't drop and reacquire the
   vnode_free_list_mtx.
 - Check for a usecount or holdcount at the end of vtryrecycle() in case
   someone grabbed a ref while we were recycling.  Abort the recycle, and
   on the final ref drop this vnode will be placed on the head of the free
   list.
 - Move the redundant VOP_INACTIVE protection code into the local
   vinactive() routine to avoid code bloat.
 - Keep the vnode lock held across calls to vgone() in several places.
 - vgonel() no longer uses XLOCK, instead callers must hold an exclusive
   vnode lock.  The VI_DOOMED flag is set to allow other threads to detect
   a vnode which is no longer valid.  This flag is set until the last
   reference is gone, and there are no chances for a new ref.  vgonel()
   holds this lock across the entire function, which greatly simplifies
   logic.
 _ Only vfree() in one place in vgone() not three.
 - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock
   in the LK_NOWAIT case.  In other cases, check after we have slept and
   acquired an exlusive lock.  This will simulate the old vx_wait()
   behavior.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:54:28 +00:00
Jeff Roberson
d9a9c2c22c - Enable SMP VFS by default on current. More users are needed to turn up
any remaining bugs.  Anyone inconvenienced by this can still disable it
   in the loader.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 10:05:43 +00:00
Jeff Roberson
d8a7c99a1c - Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK
has been set.  Assert that this is the case so that we catch filesystems
   who are using naked VOP_LOCKs in illegal cases.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 00:11:14 +00:00
Jeff Roberson
4c11620bb9 - Add a check for xlock in vop_lock_assert. Presently the xlock is
considered to be as good as an exclusive lock, although there is still a
   possibility of someone acquiring a VOP LOCK while xlock is held.

Sponsored by:	Isilon Systems, Inc.
2005-02-22 23:59:11 +00:00
Poul-Henning Kamp
767056c0e8 Zero the v_un container field to make sure everything is gone. 2005-02-22 18:56:18 +00:00
Poul-Henning Kamp
aa2f6ddc3f Reap more benefits from DEVFS:
List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent.  There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

	rename idestroy_dev() to destroy_devl() for consistency.

	Add LIST_ENTRY de_alias to struct devfs_dirent.
	Remove v_specnext from struct vnode.
	Change si_hlist to si_alist in struct cdev.
	String new devfs vnodes' devfs_dirent on si_alist when
	we create them and take them off in devfs_reclaim().

	Fix devfs_revoke() accordingly.  Also don't clear fields
	devfs_reclaim() will clear when called from vgone();

	Let devfs_reclaim() call dev_rel() instead of vgonel().

	Move the usecount tracking from dev_rel() to devfs_reclaim(),
	and let dev_rel() take a struct cdev argument instead of vnode.

	Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
	devfs_reclaim()) when they are no longer used.   (This
	should maybe happen in devfs_close() instead.)
2005-02-22 15:51:07 +00:00
Poul-Henning Kamp
7fc940b266 Remove vfinddev(), it is generally bogus when faced with jails and
chroot and has no legitimate use(r)s in the tree.
2005-02-22 14:11:47 +00:00
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
Poul-Henning Kamp
900b7e2648 Make sure to drop the VI_LOCK in vgonel();
Spotted by: Taku YAMAMOTO <taku@tackymt.homeip.net>
2005-02-18 11:13:56 +00:00
Poul-Henning Kamp
4d8ac58b05 Introduce vx_wait{l}() and use it instead of home-rolled versions. 2005-02-17 10:49:51 +00:00
Poul-Henning Kamp
58aac12894 Convert KASSERTS to VNASSERTS 2005-02-17 10:28:58 +00:00
Poul-Henning Kamp
1ba212823f Make various vnode related functions static 2005-02-10 12:28:58 +00:00
Poul-Henning Kamp
fe0198779c Don't pass NULL to vprint() 2005-02-10 08:55:08 +00:00
Jeff Roberson
68f2274d97 - Add a new assert in the getnewvnode(). Assert that the usecount is still
0 to detect getnewvnode() races.
 - Add the vnode address to a few panics near by to help in debugging.

Sponsored by:	Isilon Systems, Inc.
2005-02-08 23:27:10 +00:00
Poul-Henning Kamp
b9489a449c Access vmobject via the bufobj instead of the vnode 2005-02-07 10:04:06 +00:00
Poul-Henning Kamp
b348abd6cd Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what
was necessary.
2005-02-07 07:48:03 +00:00
Poul-Henning Kamp
d4eb29ba71 Remove unused argument to vrecycle() 2005-01-28 13:08:21 +00:00
Poul-Henning Kamp
1fdfaafb08 Integrate vclean() into vgonel().
Various associated polishing.
2005-01-28 13:00:03 +00:00
Poul-Henning Kamp
3fc8dd0653 Remove register keyword 2005-01-28 12:39:10 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Poul-Henning Kamp
b5b6ec5faa Eliminate the constant flags argument to vclean() 2005-01-24 22:22:02 +00:00
Poul-Henning Kamp
7c93282e42 Change vprint() to vn_printf() which takes varargs.
Add #define for vprint() to call vn_printf().
2005-01-24 13:58:08 +00:00
Poul-Henning Kamp
35764be39e Kill the VV_OBJBUF and test the v_object for NULL instead. 2005-01-24 13:13:57 +00:00
Jeff Roberson
d1fcf3bb31 - Add the tunable and sysctl for the mpsafevfs. It currently defaults
to off.
 - Protect access to mnt_kern_flag with the mointpoint mutex.
 - Remove some KASSERTs which are not legal checks without the appropriate
   locks held.
 - Use VCANRECYCLE() rather than rolling several slightly different
   checks together.
 - Return from vtryrecycle() with a recycled vnode rather than a locked
   vnode.  This simplifies some locking.
 - Remove several GIANT_REQUIRED lines.
 - Add a few KASSERTs to help with INACT debugging.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:41:01 +00:00
Poul-Henning Kamp
7bf38aeae7 Fix a bug I introduced in 1.561 which has caused considerable filesystem
unhappiness lately.

As far as I can tell, no files that have made it safely to disk
have been endangered, but stuff in transit has been in peril.

Pointy hat:	phk
2005-01-16 21:09:39 +00:00
Poul-Henning Kamp
7c0745eeae Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
Poul-Henning Kamp
e39db32ab0 Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.
2005-01-13 12:25:19 +00:00
Poul-Henning Kamp
6ef8480a88 Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.
2005-01-11 10:43:08 +00:00
Poul-Henning Kamp
6afa350d53 More vnode -> bufobj migration. 2005-01-11 10:16:39 +00:00
Poul-Henning Kamp
8d785753bd Give flushbuflist() a struct bufv as first argument and avoid home-rolling
TAILQ_FOREACH_SAFE().

Loose the error pointer argument and return any errors the normal way.

Return EAGAIN for the case where more work needs to be done.
2005-01-11 10:01:54 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Poul-Henning Kamp
0b3e4fe239 Since we do not support forceful unmount of DEVFS we can do away with
the partially implemented vnode-readoption code in vgonechrl().
2005-01-04 08:49:14 +00:00
Poul-Henning Kamp
e87047b437 We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.
2004-12-20 21:34:29 +00:00
Poul-Henning Kamp
20a92a18f1 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
Poul-Henning Kamp
f76fedd20b Improve vprint() a little bit: break long lines, reduce indent and tell
if the VI_LOCK() is held.
2004-12-03 12:09:34 +00:00
Poul-Henning Kamp
aec0fb7b40 Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
Poul-Henning Kamp
a752aa8f17 Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff. 2004-11-15 08:12:50 +00:00
Poul-Henning Kamp
11bcbee11b Move the bit of the syncer which deals with vnodes into a separate
function.
2004-11-14 15:24:38 +00:00
Poul-Henning Kamp
db442506db Eliminate vop_revoke() function now that devfs_revoke() does the entire job. 2004-11-13 23:38:13 +00:00
Poul-Henning Kamp
c5b846fe8e Slim vnodes by another four bytes by eliminating the (now) unused field
v_cachedid.
2004-11-10 07:31:06 +00:00
Poul-Henning Kamp
c13a4e8820 Remove vn_todev() 2004-11-10 07:17:28 +00:00
Poul-Henning Kamp
b797084e48 Remove vnode->v_cachedfs.
It was only used for the highly dangerous "export all vnodes with a sysctl"
function.
2004-11-09 22:51:03 +00:00
Poul-Henning Kamp
c569065139 Remove buf->b_dev field. 2004-11-04 07:59:57 +00:00
Poul-Henning Kamp
e0b687d33b Always initialize bo_private along with bo_ops in getnewvnode().
Spotted by:	tegge
2004-11-03 21:09:23 +00:00
Poul-Henning Kamp
996b2c82ca Loose vfs_mountedon() 2004-10-29 11:15:08 +00:00
Poul-Henning Kamp
e1f355fe4e Give the bufobj a private __bo_vnode for now to keep the syncer floating [1]
At some point later the syncer will unlearn about vnodes and the filesystems
method called by the syncer will know enough about what's in bo_private to
do the right thing.

[1] Ok, I know, but I couldn't resist the pun.
2004-10-29 09:33:32 +00:00
Poul-Henning Kamp
20eba72f53 Move the syncer linkage from vnode to bufobj.
This is not quite a perfect separation: the syncer still think it knows
that everything is a vnode.
2004-10-27 08:05:02 +00:00
Poul-Henning Kamp
5d9d81e7ea Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
Poul-Henning Kamp
156cb26583 Loose the v_dirty* and v_clean* alias macros.
Check the count field where we just want to know the full/empty state,
rather than using TAILQ_EMPTY() or TAILQ_FIRST().
2004-10-25 09:14:03 +00:00
Poul-Henning Kamp
ee1d0eb330 Remove vnode->v_bsize. This was a dead-end. 2004-10-25 07:50:59 +00:00
Poul-Henning Kamp
4dcd0ac4cf Collapse vnode->v_object and buf->b_object into bufobj->bo_object. 2004-10-25 06:02:57 +00:00
Poul-Henning Kamp
b792bebeea Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
Robert Watson
b0e86f6ac2 When MAC is enabled, warn if getnewvnode() is asked to produce a vnode
without a mountpoint.  In this scenario, there's no useful source for
a label on the vnode, since we can't query the mountpoint for the
labeling strategy or default label.
2004-10-22 11:04:58 +00:00
Poul-Henning Kamp
ff7c5a4880 Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it.  Here were those hacks that I have curs'd I know not how
oft.  Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?

Move the skeleton of specfs into devfs where it now belongs and
bury the rest.
2004-10-22 09:59:37 +00:00
Poul-Henning Kamp
494eb176e7 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
Poul-Henning Kamp
a76d8f4ec9 Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
Poul-Henning Kamp
1bca607b9f Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx.
Initialize the bo_mtx when we allocate a vnode i getnewvnode() For
now we point to the vnodes interlock mutex, that retains the exact
same locking sematics.

Move v_numoutput from vnode to bufobj.  Add renaming macro to
postpone code sweep.
2004-10-21 14:42:31 +00:00
Poul-Henning Kamp
67647b2312 Polish vtruncbuf() to improve readability and style a bit. 2004-10-21 14:13:54 +00:00
Poul-Henning Kamp
e163395619 Simplify buf_vlist_remove().
Now that we have encapsulated the splaytree related information
into a structure we can eliminate the half of this function.
2004-10-21 13:48:50 +00:00
Greg Lehey
57259f2864 vtryrecycle: Don't rely on type VBAD alone to mean that we don't need
to clean the vnode.  If v_data is set, we still need to
	     clean it.  This code change should catch all incidents of
	     the previous commit (INVARIANTS only).
2004-10-06 02:09:59 +00:00
Greg Lehey
f2154b33d2 getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning.
Discussion: this panic (or waning) only occurs when the kernel is
  compiled with INVARIANTS.  Otherwise the problem (which means that
  the vp->v_data field isn't NULL, and represents a coding error and
  possibly a memory leak) is silently ignored by setting it to NULL
  later on.

  Panicking here isn't very helpful: by this time, we can only find
  the symptoms.  The panic occurs long after the reason for "not
  cleaning" has been forgotten; in the case in point, it was the
  result of severe file system corruption which left the v_type field
  set to VBAD.  That issue will be addressed by a separate commit.
2004-10-06 02:06:11 +00:00
Poul-Henning Kamp
ba2851254f Fix a LOR relating to freeing cdevs. 2004-10-01 06:33:39 +00:00
Poul-Henning Kamp
70526ca6a5 Hold dev_lock and check for NULL devsw pointer when we determine
if a vnode is a disk.
2004-09-24 06:16:08 +00:00
Poul-Henning Kamp
a0e78d2eb0 Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.

Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.

Replace spechash_mtx use with dev_lock()/dev_unlock().
2004-09-23 07:17:41 +00:00
Poul-Henning Kamp
08dbd671ff Remove unused B_WRITEINPROG flag 2004-09-15 21:49:22 +00:00
Poul-Henning Kamp
1affa3adc8 Create simple function init_va_filerev() for initializing a va_filerev
field.

Replace three instances of longhaired initialization va_filerev fields.

Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.
2004-09-07 09:17:05 +00:00
Don Lewis
8ded654028 Don't attempt to trigger the syncer thread final sync code in the
shutdown_pre_sync state if the RB_NOSYNC flag is set.  This is the
likely cause of hangs after a system panic that are keeping crash
dumps from being done.

This is a MFC candidate for RELENG_5.

MFC after:	3 days
2004-08-20 19:21:47 +00:00
David E. O'Brien
78c37b0de8 s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g 2004-08-16 08:33:37 +00:00
John-Mark Gurney
ad3b9257c2 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
Robert Watson
87e83e7d4c In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However,
we may sleep when doing so; check that we didn't race with another thread
allocating storage for the vnode after allocation is made to a local
pointer, and only update the vnode pointer if it's still NULL.  Otherwise,
accept that another thread got there first, and release the local storage.

Discussed with:	jmg
2004-08-11 01:27:53 +00:00
Nate Lawson
c8c216d558 Skip the syncing disks loop if there are no dirty buffers. Remove a
variable used to flag the initial printf.

Submitted by:	truckman (earlier version)
2004-08-10 01:32:05 +00:00
David E. O'Brien
64298d52cc Put a cap on the auto-tuning of kern.maxvnodes.
Cap value chosen by:	scottl
2004-08-02 21:52:43 +00:00
Nate Lawson
b1c8139147 Minor message cleanup. 2004-07-30 01:30:05 +00:00
Poul-Henning Kamp
3dfe213e61 Convert the vfsconf list to a TAILQ.
Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.
2004-07-27 22:32:01 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Poul-Henning Kamp
cf95b5c381 Eliminate unused second argument to reassignbuf() and simplify it
accordingly.
2004-07-25 21:24:23 +00:00
Alfred Perlstein
05656b6e2b put several of the options for DEBUG_VFS_LOCKS under control of sysctls. 2004-07-21 07:13:14 +00:00
Alfred Perlstein
bb5faea34f Cleanup shutdown output. 2004-07-15 08:01:00 +00:00
Alfred Perlstein
da6303bacc Tidy up system shutdown. 2004-07-15 04:29:48 +00:00
Alfred Perlstein
f257b7a54b Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
Alfred Perlstein
7ae8ce5df1 Dump the actual bad values when this assertion is tripped. 2004-07-12 04:13:38 +00:00
Marcel Moolenaar
32240d082c Update for the KDB framework:
o  Call kdb_enter() instead of Debugger().
2004-07-10 21:47:53 +00:00
Alfred Perlstein
057589c485 fixup sysctl by fsid node 2004-07-08 06:11:36 +00:00
Alfred Perlstein
ea0104b032 Introduce vfs_suser(), used to test if a user should have special privs
for a mount.
2004-07-06 09:37:43 +00:00
Alfred Perlstein
c713aaaeca NFS mobility PHASE I, II & III (phase VI, and V pending):
Rebind the client socket when we experience a timeout.  This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
2004-07-06 09:12:03 +00:00
Don Lewis
27875d9c88 Unconditionally set last_work_seen while in the SYNCER_RUNNING state
so that last_work_seen has a reasonable value at the transition
to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened
to be zero at the time.

Initialize last_work_seen to zero as a safety measure in case the
syncer never ran in the SYNCER_RUNNING state.

Tested by:	phk
2004-07-05 21:32:01 +00:00
Don Lewis
faf1b66d1d Rework syncer termination code:
Speed up the syncer when shutting down by sleeping for a shorter
    period of time instead of cranking up rushjob and using the
    normal one second sleep.

    Skip empty worklist slots when shutting down to avoid lengthy
    intervals of inactivity.

    Give I/O more time to complete between steps by not speeding the
    syncer quite as much.

    Terminate the syncer after one full pass through the worklist
    plus one second with the worklist containing nothing but syncer
    vnodes.

    Print an indication of shutdown progress to the console.

Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist
to be monitored.
2004-07-05 01:07:33 +00:00
Poul-Henning Kamp
c555963fd1 Give synthetic root filesystem device vnodes a v_bsize of DEV_BSIZE. 2004-07-04 22:33:22 +00:00
Alfred Perlstein
2d1dca73ee Pass the operation in with the fsidctl.
Remove some fsidctls that we will not be using.
Correct prototypes for fs sysctls.
2004-07-04 20:21:58 +00:00
Poul-Henning Kamp
7f6599fec6 Make the last commit handle non-phk root devices better. 2004-07-04 19:42:25 +00:00