Commit Graph

72 Commits

Author SHA1 Message Date
kib
dfd7a7f46d - Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The
null_hashget() obtains the reference on the nullfs vnode, which must
  be dropped.

- Fix a wart which existed from the introduction of the nullfs
  caching, do not unlock lower vnode in the nullfs_reclaim_lowervp().
  It should be innocent, but now it is also formally safe.  Inform the
  nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on
  nullfs inode.

- Add a callback to the upper filesystems for the lower vnode
  unlinking. When inactivating a nullfs vnode, check if the lower
  vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC
  on the lower vnode, and reclaim upper vnode if so.  This allows
  nullfs to purge cached vnodes for the unlinked lower vnode, avoiding
  excessive caching.

Reported by:	G??ran L??wkrantz <goran.lowkrantz@ismobile.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-05-11 11:17:44 +00:00
kib
169c0a0a9f The current default size of the nullfs hash table used to lookup the
existing nullfs vnode by the lower vnode is only 16 slots.  Since the
default mode for the nullfs is to cache the vnodes, hash has extremely
huge chains.

Size the nullfs hashtbl based on the current value of
desiredvnodes. Use vfs_hash_index() to calculate the hash bucket for a
given vnode.

Pointy hat to:	    kib
Diagnosed and reviewed by:	peter
Tested by:    peter, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	5 days
2013-01-14 05:44:47 +00:00
kib
f74da69096 Fix reversed condition in the assertion.
Pointy hat to:	kib
MFC after:	13 days
2013-01-04 07:52:47 +00:00
kib
a7c71037df Add the "nocache" nullfs mount option, which disables the caching of
the free nullfs vnodes, switching nullfs behaviour to pre-r240285.
The option is mostly intended as the last-resort when higher pressure
on the vnode cache due to doubling of the vnode counts is not
desirable.

Note that disabling the cache costs more than 2x wall time in the
metadata-hungry scenarious.  The default is "cache".

Tested and benchmarked by:	pho (previous version)
MFC after:	2 weeks
2013-01-03 19:17:57 +00:00
kib
f31aa350da Remove the check and panic for an impossible condition. The NULL
lowervp vnode v_vnlock would cause panic due to NULL pointer
dereference much earlier.

MFC after:	1 week
2012-11-20 15:25:00 +00:00
attilio
b9a061c88d r16312 is not any longer real since many years (likely since when VFS
received granular locking) but the comment present in UFS has been
copied all over other filesystems code incorrectly for several times.

Removes comments that makes no sense now.

Reviewed by:	kib
MFC after:	3 days
2012-11-19 22:43:45 +00:00
kib
3ed1c80d25 Allow shared lookups for nullfs mounts, if lower filesystem supports
it.  There are two problems which shall be addressed for shared
lookups use to have measurable effect on nullfs scalability:

1. When vfs_lookup() calls VOP_LOOKUP() for nullfs, which passes lookup
operation to lower fs, resulting vnode is often only shared-locked. Then
null_nodeget() cannot instantiate covering vnode for lower vnode, since
insmntque1() and null_hashins() require exclusive lock on the lower.

Change the assert that lower vnode is exclusively locked to only
require any lock.  If null hash failed to find pre-existing nullfs
vnode for lower vnode and the vnode is shared-locked, the lower vnode
lock is upgraded.

2. Nullfs reclaims its vnodes on deactivation. This is due to nullfs
inability to detect reclamation of the lower vnode.  Reclamation of a
nullfs vnode at deactivation time prevents a reference to the lower
vnode to become stale.

Change nullfs VOP_INACTIVE to not reclaim the vnode, instead use the
VFS_RECLAIM_LOWERVP to get notification and reclaim upper vnode
together with the reclamation of the lower vnode.

Note that nullfs reclamation procedure calls vput() on the lowervp
vnode, temporary unlocking the vnode being reclaimed. This seems to be
fine for MPSAFE filesystems, but not-MPSAFE code often put partially
initialized vnode on some globally visible list, and later can decide
that half-constructed vnode is not needed.  If nullfs mount is created
above such filesystem, then other threads might catch such not
properly initialized vnode. Instead of trying to overcome this case,
e.g. by recursing the lower vnode lock in null_reclaim_lowervp(), I
decided to rely on nearby removal of the support for non-MPSAFE
filesystems.

In collaboration with:	pho
MFC after:	3 weeks
2012-09-09 19:20:23 +00:00
kib
cca832bf9d Do not expose unlocked unconstructed nullfs vnode on mount list.
Lock the native nullfs vnode lock before switching the locks.

Tested by:	pho
MFC after:	1 week
2012-03-02 09:48:46 +00:00
kib
9ea303ddb8 Document that null_nodeget() cannot take shared-locked lowervp due to
insmntque() requirements.

Tested by:	pho
MFC after:	1 week
2012-02-29 15:18:04 +00:00
kib
2ebb3c07b9 Move the code to destroy half-contructed nullfs vnode into helper
function null_destroy_proto() from null_insmntque_dtr(). Also
apply null_destroy_proto() in null_nodeget() when we raced and a vnode
is found in the hash, so the currently allocated protonode shall be
destroyed.

Lock the vnode interlock around reassigning the v_vnlock.

In fact, this path will not be exercised after several later commits,
since null_nodeget() cannot take shared-locked lowervp at all due to
insmntque() requirements.

Reported by:	rea
Tested by:	pho
MFC after:	1 week
2012-02-29 15:06:00 +00:00
kib
6fca263f3f Merge a split multi-line comment.
MFC after:	1 week
2012-02-29 14:43:27 +00:00
rea
558e00899f Subject: NULLFS: properly destroy node hash
Use hashdestroy() instead of naive free().

Approved by:	kib
MFC after:	2 weeks
2012-01-18 11:23:46 +00:00
dim
7958662e66 In sys/fs/nullfs/null_subr.c, in a KASSERT, output the correct vnode
pointer 'lowervp' instead of 'vp', which is uninitialized at that point.

Reviewed by:	kib
MFC after:	1 week
2012-01-05 17:06:04 +00:00
kib
e4b773d887 Do the vput() for the lowervp in the null_nodeget() for error case too.
Several callers of null_nodeget() did the cleanup itself, but several
missed it, most prominent being null_bypass(). Remove the cleanup from
the callers, now null_nodeget() handles lowervp free itself.

Reported and tested by:	pho
MFC after:	1 week
2012-01-03 21:09:07 +00:00
kib
0c5a3f31ef Document the state of the lowervp vnode for null_nodeget().
Tested by:	pho
MFC after:	1 week
2012-01-03 21:03:20 +00:00
kib
c537bf125e Use the plain panic calls, without additional printing around them.
The debugger and dumping support is adequate.

Tested by:	pho
MFC after:	1 week
2011-11-19 07:40:13 +00:00
kib
0e1f2d3b3e Do not drop vnode interlock in null_checkvp(). null_lock() verifies that
v_data is not-null before calling NULLVPTOLOWERVP(), and dropping the
interlock allows for reclaim to clean v_data and free the memory.

While there, remove unneeded semicolons and convert the infinite loops
to panics. I have a will to remove null_checkvp() altogether, or leave
it as a trivial stub, but not now.

Reported and tested by:	pho
2009-05-31 14:54:20 +00:00
des
66f807ed8b Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
jeff
30e699e885 - Simplify null_hashget() and null_hashins() by using vref() rather
than a complex series of steps involving vget() without a lock type
   to emulate the same thing.
2008-03-29 23:24:54 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
daichi
87bd60ac74 This changes give nullfs correctly work with latest unionfs.
Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:57:11 +00:00
tegge
214bc5723c Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
jeff
79ca2be05c - Assert that the lowervp is locked in null_hashget().
- Simplify the logic dealing with recycled vnodes in null_hashget() and
   null_hashins().  Since we hold the lower node locked in both cases
   the null node can not be undergoing recycling unless reclaim somehow
   called null_nodeget().  The logic that was in place was not safe and
   was essentially dead code.

MFC After:	1 week
2006-02-22 06:15:12 +00:00
rwatson
be4f357149 Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
jeff
69e9f89f88 - Clear VI_OWEINACT before calling vget() with no lock type. We know
the node is actually already locked, and VOP_INACTIVE is not desirable
   in this case.
2005-04-11 11:17:20 +00:00
jeff
57fd917aad - Assume that all lower filesystems now support proper locking. Assert
that they set v->v_vnlock.  This is true for all filesystems in the
   tree.
 - Remove all uses of LK_THISLAYER.  If the lower layer is locked, the
   null layer is locked.  We only use vget() to get a reference now.
   null essentially does no locking.  This fixes LOOKUP_SHARED with
   nullfs.
 - Remove the special LK_DRAIN considerations, I do not believe this is
   needed now as LK_DRAIN doesn't destroy the lower vnode's lock, and
   it's hardly used anymore.
 - Add one well commented hack to prevent the lowervp from going away
   while we're in it's VOP_LOCK routine.  This can only happen if we're
   forcibly unmounted while some callers are waiting in the lock.  In
   this case the lowervp could be recycled after we drop our last ref
   in null_reclaim().  Prevent this with a vhold().
2005-03-15 13:49:33 +00:00
jeff
0d9df2e12d - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.
 - VOP_INACTIVE should no longer drop the vnode lock.
 - The vnode lock is required around calls to vrecycle() and vgone().

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:18:25 +00:00
imp
fb0a393e99 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 18:10:42 +00:00
phk
59f305606c Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
marcel
32de0087b0 Update for the KDB framework:
o  Call kdb_enter() instead of Debugger().
o  Make debugging code conditional upon KDB instead of DDB.
2004-07-10 21:20:11 +00:00
imp
b49b7fe799 Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
tjr
35c71928a0 MFp4: Fix two bugs causing possible deadlocks or panics, and one nit:
- Emulate lock draining (LK_DRAIN) in null_lock() to avoid deadlocks
  when the vnode is being recycled.
- Don't allow null_nodeget() to return a nullfs vnode from the wrong
  mount when multiple nullfs's are mounted. It's unclear why these checks
  were removed in null_subr.c 1.35, but they are definitely necessary.
  Without the checks, trying to unmount a nullfs mount will erroneously
  return EBUSY, and forcibly unmounting with -f will cause a panic.
- Bump LOG2_SIZEVNODE up to 8, since vnodes are >256 bytes now. The old
  value (7) didn't cause any problems, but made the hash algorithm
  suboptimal.

These changes fix nullfs enough that a parallel buildworld succeeds.

Submitted by:	tegge (partially; LK_DRAIN)
Tested by:	kris
2003-06-17 08:52:45 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
mckusick
25230d4c6a Regularize the vop_stdlock'ing protocol across all the filesystems
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).

In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.

Sponsored by:	DARPA & NAI Labs.
2002-10-14 03:20:36 +00:00
jeff
f7e588347b - Use vrefcnt() where it is safe to do so instead of doing direct and
unlocked accesses to v_usecount.
 - Lock access to the buf lists in the various sync routines.  interlock
   locking could be avoided almost entirely in leaf filesystems if the
   fsync function had a generic helper.
2002-09-25 02:32:42 +00:00
njl
0590c43070 Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by:   phk
Reviewed by:    bde, rwatson (earlier version)
2002-09-14 09:02:28 +00:00
semenu
50d99cdfec Fix a race during null node creation between relookuping the hash and
adding vnode to hash. The fix is to use atomic hash-lookup-and-add-if-
not-found operation. The odd thing is that this race can't happen
actually because the lowervp vnode is locked exclusively now during the
whole process of null node creation. This must be thought as a step
toward shared lookups.

Also remove vp->v_mount checks when looking for a match in the hash,
as this is the vestige.

Also add comments and cosmetic changes.
2002-06-13 21:49:09 +00:00
semenu
b350b26afc Change null_hashlock into null_hashmtx, because there is no need for
lockmgr and this helps to vget() vnode from hash without a race.

Reviewed by:	bp
MFC after:	2 weeks
2002-06-13 20:18:50 +00:00
semenu
7e3005a305 Fix the "error" path (when dropping not fully initialized vnode).
Also move hash operations out of null_vnops.c and explicitly initialize
v_lock in null_node_alloc (to set wmesg).

Reviewed by:	bp
MFC after:	2 weeks
2002-06-13 18:25:06 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
ru
35437d86aa - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
  fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
  FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
  Makefiles.
2001-05-23 09:42:29 +00:00
markm
bcca5847d5 Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
phk
e87f7a15ad Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)
2001-02-04 13:13:25 +00:00
phk
94a5006c9a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
jasone
4e290e67b7 Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
bp
c2ae01d2e9 Fix vnode locking bugs in the nullfs.
Add correct support for v_object management, so mmap() operation should
work properly.
Add support for extattrctl() routine (submitted by semenu).

At this point nullfs can be considered as functional and much more stable.
In fact, it should behave as a "hard" "symlink" to underlying filesystem.

Reviewed in general by:		mckusick, dillon
Parts of logic obtained from:	NetBSD
2000-09-25 15:38:32 +00:00
bp
64ac0aa678 Various cleanups towards make nullfs functional (it is still broken
at this point):

Replace all '#ifdef DEBUG' with '#ifdef NULLFS_DEBUG' and add NULLFSDEBUG
macro.

Protect nullfs hash table with lockmgr.

Use proper order of operations when freeing mnt_data.

Return correct fsid in the null_getattr().

Add null_open() function to catch MNT_NODEV (obtained from NetBSD).

Add null_rename() to catch cross-fs rename operations (submitted by
Ustimenko Semen <semen@iclub.nsu.ru>)

Remove duplicate $FreeBSD$ tags.
2000-09-05 09:02:07 +00:00
bp
7106b8bf8a Get rid from the __P() macros.
Encouraged by:	peter
2000-09-05 07:54:39 +00:00
jake
961b97d434 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00