Commit Graph

144 Commits

Author SHA1 Message Date
Mohan Srinivasan
7d7d9e2242 Fixes up the handling of shared vnode lock lookups in the NFS client,
adds a FS type specific flag indicating that the FS supports shared
vnode lock lookups, adds some logic in vfs_lookup.c to test this flag
and set lock flags appropriately.

- amd on 6.x is a non-starter (without this change). Using amd under
  heavy load results in a deadlock (with cascading vnode locks all the
  way to the root) very quickly.
- This change should also fix the more general problem of cascading
  vnode deadlocks when an NFS server goes down.

Ideally, we wouldn't need these changes, as enabling shared vnode lock
lookups globally would work. Unfortunately, UFS, for example isn't
ready for shared vnode lock lookups, crashing pretty quickly.

This change is the result of discussions with Stephan Uphoff (ups@).

Reviewed by:	ups@
2006-09-13 18:39:09 +00:00
Mohan Srinivasan
f1cdf89911 Changes to make the NFS client MP safe.
Thanks to Kris Kennaway for testing and sending lots of bugs my way.
2006-05-19 00:04:24 +00:00
Jim Rees
cb156cc603 fix a problem with XID re-use when a server returns NFSERR_JUKEBOX.
Submitted by:	cel@citi.umich.edu
Fixed by:	rick@snowhite.cis.uoguelph.ca
Approved by:	alfred
MFC after:	3 weeks
2005-11-21 18:39:18 +00:00
Paul Saab
4321eae6b7 Make nfs_timer() MPSAFE. With this change, the bottom half of the NFS
client (the interface with the protocol stack and callouts) is
Giant-free.

Submitted by:	Mohan Srinivasan.
2005-07-19 21:27:25 +00:00
Jeff Roberson
c0f681c21d - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:14:56 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
Paul Saab
35ec46b7f2 Rewrite of the NFS client's reply handling. We now have NFS socket
upcalls which do RPC header parsing and match up the reply with the
request. NFS calls now sleep on the nfsreq structure. This enables
us to eliminate the NFS recvlock.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 21:11:15 +00:00
Paul Saab
ddc6c40075 2 fixes that improve on the consistency of the NFS client cache.
- Change the cached mtime to a 'struct timespec' from a
  time_t. Improving the precision of the cached mtime tightens up
  NFS' "close-to-open" consistency considerably.
- Always force an over-the-wire consistency check from nfs_open()
  (unless the file is marked modified). This further improves
  NFS' "close-to-open" consistency.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 19:18:00 +00:00
Paul Saab
b8d0fc9581 Add non-blocking versions of nfsm_dissect() and friends, for use from
socket callbacks or similar callers, from both the NFS client and the
server.
Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And
nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 17:33:52 +00:00
Poul-Henning Kamp
8b431c9576 For reasons unknown, the nfs locking code used a fifo to send requests to
userland and a dedicated system call to get replies.

The vnode-bypass of fifos broke this into a panic.

Ditch all the magic and create a device /dev/nfslock instead, and
use that for both directions apart from the shorter path, this is
also faster because the device driver runs Giant free using the
vnode bypass.

Noticed by:	marcel
2004-12-06 08:31:32 +00:00
Poul-Henning Kamp
aec0fb7b40 Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
Poul-Henning Kamp
494eb176e7 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
Poul-Henning Kamp
a76d8f4ec9 Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
Poul-Henning Kamp
c0f46dd1e4 Remove support for using NFS device nodes. 2004-09-28 08:50:01 +00:00
Poul-Henning Kamp
e3c5a7a4dd When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint.  If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

		MNT_ILOCK(mp);
	loop:
		for (vp = TAILQ_FIRST(&mp...);
		    (vp = nvp) != NULL;
		    nvp = TAILQ_NEXT(vp,...)) {
			if (vp->v_mount != mp)
				goto loop;
			MNT_IUNLOCK(mp);
			...
			MNT_ILOCK(mp);
		}
		MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

	MNT_ILOCK(vp->v_mount);
	...
	TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
	...
	MNT_IUNLOCK(vp->v_mount);
	...
	vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go.  This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
2004-07-04 08:52:35 +00:00
Poul-Henning Kamp
f3732fd15b Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
Peter Edwards
7c8ca9400e Let the NFS client notice a file's size changing as a modification.
This avoids presenting invalid data to the client's applications
when the file is modified, and then extended within the window of
the resolution of the modifcation timestamp.

Reviewed By:	iedowse
PR:		kern/64091
2004-04-14 23:23:55 +00:00
Marcel Moolenaar
5fa064a8d8 Unbreak build: s/TAILQ_ISEMPTY/TAILQ_EMPTY/g 2004-04-11 17:15:36 +00:00
Peter Edwards
1630ff08a3 Clean up properly when unloading NFS client module.
This includes a modified form of some code from Thomas Moestl (tmm@)
to properly clean up the UMA zone and the "nfsnodehashtbl" hash
table.

Reviewed By:	iedowse
PR:		16299
2004-04-11 13:30:20 +00:00
Warner Losh
2fcbca0d85 Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 05:00:01 +00:00
Jim Rees
f9955a5f53 only do nfs rpc callouts if there is work to do.
Submitted by:	kan
Approved by:	alfred
2004-03-25 21:48:09 +00:00
Poul-Henning Kamp
651b11eaf2 Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.
2004-03-11 16:33:11 +00:00
Alfred Perlstein
90abe7f294 Use function pointers to remove the depenancy cross dependancy on nfs4
and the nfs3 client.  Also fix some bugs that happen to be causing crashes
in both v3 and v4 introduced by the v4 import.

Submitted by: Jim Rees <rees@umich.edu>
Approved by: re
2003-11-22 02:21:49 +00:00
Alfred Perlstein
1bf8720450 University of Michigan's Citi NFSv4 kernel client code.
Submitted by: Jim Rees <rees@umich.edu>
2003-11-14 20:54:10 +00:00
Alexander Kabaev
ca430f2e92 Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff
2003-11-05 04:30:08 +00:00
Jeff Roberson
b21ad23fff - Check the XLOCK before we inspect the vnode. 2003-10-05 06:46:45 +00:00
Poul-Henning Kamp
2a3d23acc6 Name the vnode method vectors consistently with the rest of the filesystems.
This improves the output of src/tools/tools/vop_table
2003-09-12 16:44:40 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Mike Barcroft
2b7f24d210 Change iov_base's type from char *' to the standard void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.
2002-10-11 14:58:34 +00:00
Jeff Roberson
8926aed697 - Lock access to the buf lists.
- Use vrefcnt() where appropriate.
 - Add some locking asserts.
2002-09-25 02:38:43 +00:00
Matthew Dillon
3d8f797ac1 Convert old style (type foo *)0 casts to NULLs
PR:		kern/40360
Requested by:	Hiten PAndya via direct email
2002-07-11 17:54:58 +00:00
Jeff Roberson
ab426dc822 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 10:07:52 +00:00
Ian Dowse
9669bb479a Avoid passing the variable `tl' to functions that just use it for
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.

The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.

Reviewed by:	peter
2001-12-18 01:22:09 +00:00
Matthew Dillon
c72ccd014d Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after:	3 days
2001-10-23 01:21:29 +00:00
Peter Wemm
1290984b33 Make nfsm_dissect() have an obvious return value. 2001-09-27 22:40:38 +00:00
Peter Wemm
ea7fe289fe Tidy up nfsm_build usage. This is only partially finished. 2001-09-27 02:33:36 +00:00
Ian Dowse
1782e17d6f Add a missing dereference level. This caused nfsm_postop_attr_xx()
to try and extract node attributes from an RPC reply even if none
were present.

Reviewed by:	peter
2001-09-25 00:00:33 +00:00
Peter Wemm
247c65c27f Oops. Fix a missing indirection level. gcc didn't complain about it on
x86, but did complain about it on alpha (since int and pointer are
different sizes)
2001-09-20 03:45:51 +00:00
Peter Wemm
eb25edbda3 Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.
2001-09-18 23:32:09 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
bc2327c310 - Protect the mnt_vnode list with the mntvnode lock.
- Use queue(9) macros.
2001-06-28 04:10:07 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Bosko Milekic
2a0c503e7a * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
Kirk McKusick
d6514f21d7 In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
2000-11-14 08:00:39 +00:00
David Malone
dc6dd1259f Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

	The problem seems to originate with NFS's postop_attr
	information that is returned with a read or write RPC.
	Within a vm_fault context, the code cannot deal with
	vnode_pager_setsize() shrinking a vnode.

	The workaround in the patch below stops the nfsm_postop_attr()
	macro from ever shrinking a vnode. If the new size in the
	postop_attr information is smaller, then it just sets the
	nfsnode n_attrstamp to 0 to stop the wrong size getting
	used in the future. This change only affects postop_attr
	attributes; the nfsm_loadattr() macro works as normal.

	The change is implemented by adding a new argument to
	nfs_loadattrcache() called 'dontshrink'. When this is
	non-zero, nfs_loadattrcache() will never reduce the
	vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by:	dillon
Tested by:	Vadim Belman <voland@lflat.org>
2000-10-24 10:13:36 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00