Commit Graph

688 Commits

Author SHA1 Message Date
phk
849729c60b Don't clobber mnt_stat.f_mntonname 2004-12-07 14:26:39 +00:00
phk
4a639d6164 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
ps
1d9d717d90 Always issue wakeups() to the NFS requestors under the mutex
to close all potential cases of missed wakeups.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-07 03:39:52 +00:00
ps
eeccf3813d Rewrite of the NFS client's reply handling. We now have NFS socket
upcalls which do RPC header parsing and match up the reply with the
request. NFS calls now sleep on the nfsreq structure. This enables
us to eliminate the NFS recvlock.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 21:11:15 +00:00
ps
8eaa4f53e4 2 fixes that improve on the consistency of the NFS client cache.
- Change the cached mtime to a 'struct timespec' from a
  time_t. Improving the precision of the cached mtime tightens up
  NFS' "close-to-open" consistency considerably.
- Always force an over-the-wire consistency check from nfs_open()
  (unless the file is marked modified). This further improves
  NFS' "close-to-open" consistency.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 19:18:00 +00:00
ps
aa4aa62af0 Serialize NFS vinvalbuf operations by acquiring/upgrading to the
vnode EXCLUSIVE lock. This prevents threads from adding pages to
the vnode while an invalidation is in progress, closing potential
races. In the bioread() path, callers acquire the SHARED vnode lock
- so while an invalidate was in progress, it was possible to fault
in new pages onto the vnode causing the invalidation to take a while
or fail. We saw these races at Yahoo! with very large files+heavy
concurrent access. Forcing an upgrade to EXCLUSIVE lock before doing
the invalidation closes all these races.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 18:52:28 +00:00
ps
5feadd3eba Add non-blocking versions of nfsm_dissect() and friends, for use from
socket callbacks or similar callers, from both the NFS client and the
server.
Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And
nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 17:33:52 +00:00
ps
ebd6438ae1 - If all data has been committed to stable storage on the server, it
is safe to turn off the nfsnode's NMODIFIED flag.
- Move the check for signals to the top of the loop where we loop
  around the dirty buffers on the vnode, scheduling writes. This
  ensures that we'll break ouf of the flush operation on reception of
  a signal.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
2004-12-06 16:35:58 +00:00
rwatson
a98750ac1b Correct a typo in a comment. 2004-12-06 16:11:25 +00:00
phk
753d615ec0 For reasons unknown, the nfs locking code used a fifo to send requests to
userland and a dedicated system call to get replies.

The vnode-bypass of fifos broke this into a panic.

Ditch all the magic and create a device /dev/nfslock instead, and
use that for both directions apart from the shorter path, this is
also faster because the device driver runs Giant free using the
vnode bypass.

Noticed by:	marcel
2004-12-06 08:31:32 +00:00
rwatson
6b017b90b9 Convert GIANT_REQUIRED; in nfs_mountroot() to NET_ASSERT_GIANT(),
and annotate that nfs_mountroot assumes it is OK to step on the
values in the global NFSv3 diskless structure as the mountroot
function is called during a serialized part of the boot, before
any other NFS client activity occurs.

MFC after:	2 weeks
2004-12-05 22:53:17 +00:00
rwatson
22be685755 Convert a GIANT_REQUIRED; into a NET_ASSERT_GIANT();, as sockets are
now only conditionally protected by Giant based on debug.mpsafenet.
2004-12-05 22:50:09 +00:00
phk
6c14f71ef7 VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't.  Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way:  Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
2004-12-05 22:41:02 +00:00
marcel
8b42e21d12 Fix null-pointer indirect function calls introduced in the previous
commit. In the new world order, the transitive closure on the vector
operations is not precomputed. As such, it's unsafe to actually use
any of the function pointers in an indirect function call. They can
be null, and we need to use the default vector in that case.
This is mostly a quick fix for the four function pointers that are
ed explicitly. A more generic or scalable solution is likely to see
the light of day.

No pathos on: current@
2004-12-05 22:30:28 +00:00
phk
59f305606c Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
phk
cb64ed501e Remove redundant functions (repo-copied from nfsclient) for dealing with
fifos.
2004-12-01 20:18:56 +00:00
phk
ab549174e2 Scripted modification of vop_* prototypes to use typedefs. 2004-12-01 19:08:40 +00:00
phk
4eaab0b383 Add missing #include 2004-12-01 07:34:08 +00:00
ps
3601987765 Fix for a race between lookup and readdirplus, that causes
a deadlock (with NFS exclusive vnode locks enabled). Lookup
grabs the parent's lock and wants to lock child. Readdirplus
locks the child and wants to lock parent (for loading the attrs
for ".."). The fix is to not load the attrs for ".." in
readdirplus.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson
2004-12-01 06:51:07 +00:00
ps
531cb416ae Clean all dirty pages (dirtied by mmap'ed writes) in nfs_close().
This closes a major hole in close-to-open consistency support.
Added a new sysctl so that this can be disabled for single NFS
client applications with very large amounts of mmap'ed IO (for
performance).

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson
2004-12-01 06:48:54 +00:00
ps
69d7e65011 Fix for a (blocks) underrun bug where negative values were being
returned back to df from a statfs call. Causing df to print negative
values.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson
2004-12-01 06:42:21 +00:00
ps
2b85447398 Fix for a bug in nfs_mkdir() that called vrele() instead of vput()
in the error cases, causing panics.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson
2004-11-29 23:05:30 +00:00
jeff
9caab2e843 - Eliminate the acquisition and release of the bqlock in bremfree() by
setting the B_REMFREE flag in the buf.  This is done to prevent lock order
   reversals with code that must call bremfree() with a local lock held.
   This also reduces overhead by removing two lock operations per buf for
   fsync() and similar.
 - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock
   has been acquired so that we may remove ourself from the free-list.
 - Provide a bremfreef() function to immediately remove a buf from a
   free-list for use only by NFS.  This is done because the nfsclient code
   overloads the b_freelist queue for its own async. io queue.
 - Simplify the numfreebuffers accounting by removing a switch statement
   that executed the same code in every possible case.
 - getnewbuf() can encounter locked bufs on free-lists once Giant is removed.
   Remove a panic associated with this condition and delay asserts that
   inspect the buf until after it is locked.

Reviewed by:	phk
Sponsored by:	Isilon Systems, Inc.
2004-11-18 08:44:09 +00:00
phk
5eae02ee76 Detect root mount attempts on the flag, not on the NULL path. 2004-11-09 22:21:52 +00:00
phk
e5715b2cc1 Retire b_magic now, we have the bufobj containing the same hint. 2004-11-04 09:48:18 +00:00
phk
1b25a59886 Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
phk
52a089c526 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
phk
3833976d12 Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
pjd
5515efcf61 Add a missing newline character. 2004-10-14 19:00:44 +00:00
das
c32ecae436 nfsclient/nfs_bio.c has a PHOLD() without a PRELE(). Neither should
be necessary here.  Also, use killproc() instead of psignal().
2004-10-01 05:01:41 +00:00
phk
d3ceec948f Remove support for using NFS device nodes. 2004-09-28 08:50:01 +00:00
phk
5c67a82c63 Remove NFS4 vop method vector for devices: we are desupporing device nodes
on anything but DEVFS and in this case it was not even used (see below).

Put the NFS4 vop method for fifo's behind "#if 0" because it is unused.
Add a XXX comment to say that I think the unusedness is a bug.
2004-09-27 20:02:50 +00:00
phk
46bdd46105 style consistency. 2004-09-27 19:44:39 +00:00
phk
02df7323ee Remove unused B_WRITEINPROG flag 2004-09-15 21:49:22 +00:00
phk
9f1a2f23b2 Explicitly pass vnode to nfs_doio() and mountpoint to nfs_asyncio(). 2004-09-07 08:56:43 +00:00
rwatson
68779f8b5e In nfs_timer(), pass curthread rather than &thread0 into the protocol
send routine.  In IPv6 UDP, the thread will be passed to suser(), which
asserts that if a thread is used for a super user check, it be
curthread.  Many of these protocol entry points probably need to
accept credentials instead of threads.

MT5 candidate.

Noticed/tested by:	kuriyama
2004-08-25 01:23:38 +00:00
phk
2d868d02cf Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version.  This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space.  A few places abused
it to get hold of some credentials to pass around.  Effectively
it is unused.

Reorganize the root filesystem selection code.
2004-07-30 22:08:52 +00:00
phk
98d8f3741c Move a relic to its correct location(s): Put nfs diskless initialization
calls with the code they call.  (Yet another example of mindless copy&paste).
2004-07-28 21:54:57 +00:00
phk
075684f5fd Remove global variable rootdevs and rootvp, they are unused as such.
Add local rootvp variables as needed.

Remove checks for miniroot's in the swappartition.  We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.
2004-07-28 20:21:04 +00:00
phk
5297516e02 Eliminate unused second argument to reassignbuf() and simplify it
accordingly.
2004-07-25 21:24:23 +00:00
alfred
b4f778e20c Turn off SO_REUSEADDR and SO_REUSEPORT, they were causing EADDRINUSE
to be returned from the protocol stack.

Pointy hat to me for not groking what those options _really_ mean.
2004-07-13 05:42:59 +00:00
dwmalone
6ff1185c1d Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a
a better name. I have a kern_[sg]etsockopt which I plan to commit
shortly, but the arguments to these function will be quite different
from so_setsockopt.

Approved by:	alfred
2004-07-12 21:42:33 +00:00
alfred
8a1713aada Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
alfred
031e087d2c Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts.
Tune the timeout from 5 seconds to 12 seconds.
Provide a sysctl to show how many reconnects the NFS client has done.

Seems to fix IPv6 from: kuriyama
2004-07-12 06:22:42 +00:00
brian
aae31dbf32 Change the following environment variables to kernel options:
bootp -> BOOTP
    bootp.nfsroot -> BOOTP_NFSROOT
    bootp.nfsv3 -> BOOTP_NFSV3
    bootp.compat -> BOOTP_COMPAT
    bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit.  It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone
2004-07-08 22:35:36 +00:00
brian
2821a50eaa Change the following kernel options to environment variables:
BOOTP -> bootp
    BOOTP_NFSROOT -> bootp.nfsroot
    BOOTP_NFSV3 -> bootp.nfsv3
    BOOTP_COMPAT -> bootp.compat
    BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

    bootp="YES"
    bootp.nfsroot="YES"
    bootp.nfsv3="YES"
    bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.
2004-07-08 13:40:33 +00:00
rwatson
7e85c099fc Acquire socket lock in nfs_connect() connection/sleep loop to protect
socket state and avoid missed wakeups.
2004-07-06 16:55:41 +00:00
alfred
864fa13b59 use vfs_suser() to restrict access to the nfs mount's timeout. 2004-07-06 09:40:44 +00:00
alfred
8fd8b8c57f NFS mobility Phase VI:
Export NFS mount state via sysctl.
Export timeout via sysctl.
2004-07-06 09:23:17 +00:00
alfred
97a6f04270 NFS mobility PHASE I, II & III (phase VI, and V pending):
Rebind the client socket when we experience a timeout.  This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
2004-07-06 09:12:03 +00:00