Commit Graph

217 Commits

Author SHA1 Message Date
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Kirk McKusick
a0595d0249 Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
2002-03-17 01:25:47 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Matthew Dillon
9348f5e7a6 The vnode was not being vput()'d in the EEXIST mknod case on the nfs
server side.  This can lead to a system deadlock.

Reviewed by:    iedowse
Tested by:      Alexey G Misurenko <mag@caravan.ru>, iedowse
Bug found with help by: Alexey G Misurenko <mag@caravan.ru>
MFC at:         earliest convenience
2002-01-14 19:14:08 +00:00
Ian Dowse
8a919282a5 It is required by VOP_CREATE, VOP_MKNOD, VOP_SYMLINK and VOP_MKDIR
that va_mode of the supplied attributes is filled in with a valid
file mode (i.e not VNOVAL, and only ALLPERM bits set). However,
some NFS server op functions didn't guarantee this for all possible
request messages:

If a V3 client chose not include to a mode specification, we could
end up creating an ffs inode with mode 0177777, requiring a manual
fsck on the next reboot. Fix this by setting va_mode to 0 before
calling the VOP if a mode hasn't been supplied by the client.

In nfsrv_symlink(), S_IFMT bits supplied by a V2 client could end
up in the va_mode passed to VOP_SYMLINK with similar effects. We
now use the macro nfstov_mode() to correctly mask the bits.
2002-01-13 05:36:05 +00:00
Ian Dowse
5df3797ebf Fix a few NFSv2 issues that slipped in during the big cleanup. The
semantics of the nfsm_reply() macro were changed so that the caller
has to explicitly handle the V2 error case, whereas before,
nfsm_reply() did a `goto nfsmout' then. A few server ops (setattr,
readlink, create, mkdir) weren't updated to match, so errors in the
V2 case could cause protocol hangs and leaked mbufs.

Correct some comments that describe the old nfsm_reply behaviour.

[older, harmless nit] Remove the unnecessary `nfsmreply0' label in
nfsrv_create(), since for its users, the main `ereply' label does
the same thing.
2002-01-12 03:57:25 +00:00
Mike Smith
b3a39c8ae2 Rename some variables that end up shadowing their namesakes in the NFS client
code.

Reviewed by:	peter
2002-01-08 19:41:06 +00:00
Ian Dowse
9669bb479a Avoid passing the variable `tl' to functions that just use it for
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.

The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.

Reviewed by:	peter
2001-12-18 01:22:09 +00:00
Ian Dowse
eec7ff8aa6 When VOP_SYMLINK fails, the value of *vpp is junk, so we must NULL
out nd.ni_vp to prevent the resource cleanup code at the end of
nfsrv_symlink from trying to vrele it. This fixes a "vrele: negative
ref cnt" panic that can occur when a symlink is attempted on an NFS
filesystem with no free space. Found locally, but the symptoms
correspond to those in the PR referenced below.

PR:		kern/26878
MFC after:	3 days
2001-12-04 16:53:42 +00:00
Ian Dowse
4f6434bdde Now that nfsm_reply() does not usually set 'error' to 0, we need
to do it explicitly in nfsrv_noop so that the reply gets sent back
to the client. This fixes the generation of a selection of RPC
error replies (RPC_PROGMISMATCH, RPC_PROGUNAVAIL, RPC_PROCUNAVAIL
etc.) that are used by some clients to detect support for optional
protocols and features.

Reviewed by:	peter
Reported by:	Thomas Quinot <quinot@inf.enst.fr>
PR:		kern/31479
2001-10-25 19:07:56 +00:00
Peter Wemm
b9b0e19206 Unwind some more macros. NFSMADV() was kinda silly since it was right
next to equivalent m_len adjustments.  Move the nfsm_subs.h macros
into groups depending on which phase they are used in, since that
affects the error recovery requirements.  Collect some of the common error
checking into a single macro as preparation for unwinding some more.
Have nfs_rephead return a value instead of secretly modifying args.
Remove some unused function arguments that were being passed around.
Clarify nfsm_reply()'s error handling (I hope).
2001-09-28 04:37:08 +00:00
Peter Wemm
1290984b33 Make nfsm_dissect() have an obvious return value. 2001-09-27 22:40:38 +00:00
Peter Wemm
ea7fe289fe Tidy up nfsm_build usage. This is only partially finished. 2001-09-27 02:33:36 +00:00
Peter Wemm
eb25edbda3 Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.
2001-09-18 23:32:09 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Jeroen Ruigrok van der Werven
d7d97eb0aa Preceed/preceeding are not english words. Use precede and preceding. 2001-02-18 10:43:53 +00:00
Ian Dowse
27d9bb4e44 Fix some problems that were introduced in revision 1.97. Instead
of returning an error code to the caller, NFS server op routines
must themselves build an error reply and return 0 to the caller.

This is achieved by replacing the erroneous return statements with
code that jumps forward to the op function's reply code. We need
to be careful to ensure that the 'struct mount' pointer is NULL
though, so that the final vn_finished_write() call becomes a no-op.

Reviewed by:	mckusick, dillon
2001-02-09 13:24:06 +00:00
Bosko Milekic
2a0c503e7a * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Poul-Henning Kamp
b99c307a21 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
Matthew Dillon
60c959f40b Fix compilation warning on alpha when converting pointer to integer
to generate hash index.

Reviewed by:	 Andrew Gallatin <gallatin@cs.duke.edu>
1999-12-18 19:20:05 +00:00
Matthew Dillon
2cac06495e Have NFS use a snapshot of boottime instead of boottime itself to
generate the NFSv3 Version id.  boottime itself may change, sometimes
    once every tick if you are running xntpd, which really throws off
    clients.  Clients will tend to throw away what they believe to be
    stale data too often, and can get into long loops rewriting the same
    data over and over again because they believe the server has rebooted
    over and over again due to the changing version id.

Approved by:	jkh
1999-12-16 17:01:32 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Matthew Dillon
1e64c256dc Add a readahead heuristic to the NFS server side code. While the server
cannot unilaterally pass data to a client it can reduce the physical
    disk transaction overhead by reading larger blocks.  This results in
    better pipelining of requests/responses over the network and an almost
    100% increase in cpu efficiency on the server.  On a 100BaseTX network
    NFS read performance increases from 8.5 MBytes/sec to 10 MB/sec (maxed
    out), and cpu efficiency increases from 72% idle to 80% idle on the server.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
1999-12-13 17:34:45 +00:00
Matthew Dillon
5f3bfd608d Fix a number of server-side issues related to aborting badly formed
NFS packets, mainly initializing structure pointers to NULL which
    are conditionally freed prior to return.

PR:		kern/15249
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-12-12 07:06:39 +00:00
Eivind Eklund
dd8c04f4c7 Remove WILLRELE from VOP_SYMLINK
Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.
1999-11-13 20:58:17 +00:00
Eivind Eklund
edfe736df9 Remove WILLRELE from VOP_RENAME 1999-11-12 03:34:28 +00:00
Matthew Dillon
13e14363fe Make FreeBSD less conservative in determining when to return a cookie
error for a directory.  I have made this change after a great deal of
    review although I cannot be absolutely sure that this meets the spec.

    The issue devolves into whether changes in an underlying (UFS) directory
    can cause NFS directory blocks to be renumbered.  My read of the code
    indicates that NFS directory blocks will not be renumbered, which means
    that the cookies should still remain valid after a change is made to
    the underlying directory.  This being the case, a cookie error should
    not be returned when a change is made to the underlying directory and,
    instead, the NFS client should rely on mtime detection to invalidate and
    reload the directory.

    The use of mtime is problematic in of itself, due to insufficient
    resolution, which is why I believe the original conservative error
    handling was done.  Still, there have been dozens of bug reports by
    people needing solaris<->FreeBSD interoperability and these have to
    be accomodated.
1999-09-29 17:14:58 +00:00
Matthew Dillon
b5acbc8b9c Asynchronized client-side nfs_commit. NFS commit operations were
previously issued synchronously even if async daemons (nfsiod's) were
    available.  The commit has been moved from the strategy code to the doio
    code in order to asynchronize it.

    Removed use of lastr in preparation for removal of vnode->v_lastr.  It
    has been replaced with seqcount, which is already supported by the system
    and, in fact, gives us a better heuristic for sequential detection then
    lastr ever did.

    Made major performance improvements to the server side commit.  The
    server previously fsync'd the entire file for each commit rpc.  The
    server now bawrite()s only those buffers related to the offset/size
    specified in the commit rpc.

    Note that we do not commit the meta-data yet.  This works still needs
    to be done.

    Note that a further optimization can be done (and has not yet been done)
    on the client: we can merge multiple potential commit rpc's into a
    single rpc with a greater file offset/size range and greatly reduce
    rpc traffic.

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:57:57 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Bill Paul
9c9743b67b Correct the sanity test length calculation in nfsrv_readdirplus(): len is
being incremented by 4 bytes too few each time through the loop, which
allows more data into the mbuf chain that we really want. In the worst
case, when we're using 32K read/write sizes with a TCP client, this causes
readdirplus replies to sometimes exceed NFS_MAXPACKET which leads to a
panic. This problem cropped up for me using an IRIX 6.5.4 NFSv3 TCP client
with 32K read/write sizes, however supposedly it can be triggered by
WinNT NFS servers too. In theory, it can probably be triggered by any
NFS v3 implementation using TCP as long as it's using the maxiumum block
size.

Reviewed by: Matthew Dillon <dillon@backplane.com>
1999-07-29 21:42:57 +00:00
Alan Cox
3b5f11efe6 Clear error in nfsrv_create when we have a valid reply so that
that reply is actually transmitted.
Submitted by:	dillon
1999-07-28 08:20:49 +00:00
Poul-Henning Kamp
f008cfcc1a I have not one single time remembered the name of this function correctly
so obviously I gave it the wrong name.  s/umakedev/makeudev/g
1999-07-17 18:43:50 +00:00
Julian Elischer
3ba6a72322 Submitted by: "David E. Cross" <crossd@cs.rpi.edu>
Matt missed a line..
1999-06-30 04:29:13 +00:00
Julian Elischer
3d84d191cc Matt's NFS fixes.
Submitted by: Matt Dillon
Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin
  3.2 version to follow when tested
1999-06-23 04:44:14 +00:00
Peter Wemm
b903b04cc0 Various changes lifted from the OpenBSD cvs tree:
txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge.

nfs_serv.c: don't call nfsm_adj() with negative values, windows clients
could crash servers when doing a readdir of a large directory.

nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin
loop trying to bind().  Don't clobber a mbuf pointer or we get panics
on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf.

nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB.

Obtained from:  OpenBSD
1999-06-05 05:35:03 +00:00
Poul-Henning Kamp
bfbb9ce670 Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
        major()         umajor()
        minor()         uminor()
        makedev()       umakedev()
        dev2udev()      udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits.  This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference.  If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
1999-05-11 19:55:07 +00:00
Peter Wemm
dfd5dee1b0 Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.
1999-05-06 18:13:11 +00:00
Poul-Henning Kamp
75c1354190 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Doug Rabson
ce02431ffa * Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
1999-02-16 10:49:55 +00:00
Eivind Eklund
5fd7941bd3 Remove the if fixed in the last commit; bde quite correctly point out
that it can never fail.
1998-12-09 15:12:53 +00:00
Eivind Eklund
d27dddc9d5 Fix typo (; in "if (vp == NULL);"). 1998-12-08 23:11:24 +00:00
Peter Wemm
1f2edded90 vm_object_page_clean() last arg changed from TRUE to OBJPC_SYNC. I'm not
sure that this is necessary to be a sync write here since a VOP_FSYNC()
follows and it will schedule, sort and complete the writes that the
vm_object_page_clean() started (as I think I understand things).
1998-10-31 15:39:31 +00:00
Doug Rabson
ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
Peter Wemm
4152886f7a For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t;
int -> int32_t; u_short -> u_int16_t.  Also, use mode_t instead of u_short
for storing modes (mode_t is a u_int16_t).

Obtained from: NetBSD
1998-05-31 20:09:01 +00:00
Peter Wemm
261114d95c Cut-n-paste glitch 1998-05-31 19:43:34 +00:00
Peter Wemm
a422fed096 Hide whiteouts from NFS, since the protocol doesn't support them.
Obtained from:  NetBSD
1998-05-31 19:10:52 +00:00
Peter Wemm
c03d64df19 NetBSD has a comment that Solaris 2.5 doesn't do verifiers correctly,
we have weakened this test already for Digital Unix, so it may be enough
for Solaris.  It needs to be checked again.

Obtained from: NetBSD
1998-05-31 19:07:47 +00:00
Peter Wemm
dde4499fec Refuse READDIR / READDIRPLUS rpc's for non-directories
Obtained from: NetBSD
1998-05-31 17:54:18 +00:00
Peter Wemm
e8cf20c8db NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim
of this part of commits is to minimize unnecessary differences between
the other NFS's of similar origin.  Yes, there are gratuitous changes here
that the style folks won't like, but it makes the catch-up less difficult.
1998-05-31 17:27:58 +00:00
Peter Wemm
7c1c33a7dd When using NFSv3, use the remote server's idea of the maximum file size
rather than assuming 2^64.  It may not like files that big. :-)
On the nfs server, calculate and report the max file size as the point
that the block numbers in the cache would turn negative.
(ie: 1099511627775 bytes (1TB)).

One of the things I'm worried about however, is that directory offsets
are really cookies on a NFSv3 server and can be rather large, especially
when/if the server generates the opaque directory cookies by using a local
filesystem offset in what comes out as the upper 32 bits of the 64 bit
cookie.  (a server is free to do this, it could save byte swapping
depending on the native 64 bit byte order)

Obtained from:	NetBSD
1998-05-30 16:33:58 +00:00
Peter Wemm
4204769d9e Only ignore "owner" permissions selectively rather than always. In some
cases we ignore it (eg: read/write) to maintain chmod-after-open semantics
but in other cases we do care, eg: creating files, access() etc.  Never
ignore errors from VOP_ACCESS() on immutable files.

This apparently comes from BSDI (from Keith Bostic) via NetBSD.

PR:		5148
Submitted by:	Yoshiro MIHIRA <sanpei@yy.cs.keio.ac.jp>
1998-05-20 09:05:48 +00:00
Mike Smith
7be2d30077 In the words of the submitter:
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it.  This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp.  vop_rename
still releases 4 vnode arguments before it returns.  These remaining cases
will be corrected in the next set of patches.
---------

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-07 04:58:58 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
John Dyson
eaf13dd73a Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00
John Dyson
2be70f79f6 Lots of improvements, including restructring the caching and management
of vnodes and objects.  There are some metadata performance improvements
that come along with this.  There are also a few prototypes added when
the need is noticed.  Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.
1997-12-29 00:25:11 +00:00
Bruce Evans
675ea6f083 Unspammed nested include of <vm/vm_zone.h>. 1997-12-27 02:56:39 +00:00
Bruce Evans
55b211e3af Removed unused #includes. 1997-10-28 15:59:26 +00:00
John Dyson
99448ed11d Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half.  The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs.  Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
1997-09-21 04:24:27 +00:00
Poul-Henning Kamp
ec1b5c319d Remove a couple of stubborn NetBSD #if's. 1997-09-10 20:22:32 +00:00
Poul-Henning Kamp
07b2d0aaa3 unifdef -U__NetBSD__ -D__FreeBSD__ 1997-09-10 19:52:27 +00:00
Bruce Evans
4d1d4912ae Added used #include - don't depend on <sys/mbuf.h> including
<sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).
1997-09-02 01:19:47 +00:00
Garrett Wollman
57bf258e3d Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs.  (Socket buffers are the one exception.)  A number
of kernel APIs needed to get fixed in order to make this happen.  Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead.  Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
1997-08-16 19:16:27 +00:00
Doug Rabson
c4b3a97040 Allow NULL cookie verifiers for non-NULL offsets. This is needed for
Digital Unix boxes since they appear to always send null verifiers.
1997-07-22 15:35:15 +00:00
Doug Rabson
e775608178 Merge WebNFS changes from NetBSD.
Obtained from:	NetBSD
1997-07-16 09:06:30 +00:00
Bruce Evans
f361d28a2c Don't require superuser privileges for creating fifos. The v2 case was
broken when support for v3 was introduced in rev.1.16.  The v3 case has
always been broken in FreeBSD.

Should be in 2.2.

PR:		3838
1997-06-14 11:19:35 +00:00
Doug Rabson
d1e963a50e Implement the async mount option for NFSv3. This makes NFS pretend that all
writes sent to the server were synchronous and therefore no commits are
needed.  This is the same as the vfs.nfs.async variable on the server but
allows each client to choose whether to work this way.

Also make the vfs.nfs.async variable do the 'right' thing for NFSv3, i.e.
pretend that the write was synchronous.
1997-06-03 13:56:55 +00:00
Doug Rabson
0160dedc65 Implement a separate control for write gathering on NFSv3. This is turned
off for NFSv3 by default since write gathering seems to reduce performance
for NFSv3 by up to 60%.

Add sysctl knobs to control both variables.
1997-05-10 16:59:36 +00:00
Doug Rabson
5ae0f71815 Fix a nasty hang connected with write gathering. Also add debug print
statements to bits of the server which helped me find the hang.
1997-05-10 16:12:03 +00:00
Bruce Evans
b445591810 Removed #include of <ufs/ufs/dir.h>. Nfs no longer depends on any ufs
features, and the one thing that it depended on (DIRBLKSIZ) now has
conflicting spelling.
1997-03-29 12:40:20 +00:00
Peter Wemm
476b25e22e Use the correct (relative to the implementation) ordering of args in
the VOP_LINK() calls, Closes PR#3064

Submitted by: bde
1997-03-25 05:13:40 +00:00
Peter Wemm
289c56e81e The local fs interface does not allow link()/unlink() of directories,
do not allow a remote nfs client to cause local fs corruption either.
1997-03-25 05:08:28 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
Nate Williams
030e2e9ebb In sys/time.h, struct timespec is defined as:
/*
         * Structure defined by POSIX.4 to be like a timeval.
         */
        struct timespec {
                time_t  ts_sec;         /* seconds */
                long    ts_nsec;        /* and nanoseconds */
        };

        The correct names of the fields are tv_sec and tv_nsec.

Reminded by:	James Drobina <jdrobina@infinet.com>
1996-09-19 18:21:32 +00:00
David Greenman
0247363f69 Release an unneeded reference to a vnode that was gained in a VFS_VGET().
Fixes a readdirplus panic.

Submitted by:	Doug Rabson <dfr@render.com>
1996-09-05 07:58:04 +00:00
Bruce Evans
b71fec07db Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.
1996-09-03 14:25:27 +00:00
John Dyson
6476c0d204 Even though this looks like it, this is not a complex code change.
The interface into the "VMIO" system has changed to be more consistant
and robust.  Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.

This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.

Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls.  The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.

Reviewed by: davidg
1996-08-21 21:56:23 +00:00
Bruce Evans
39a3579443 Fixed a vnode reference leak in nfsrv_rename(). The target inode wasn't
released until the file system was unmounted.  This bug also affected
kern/vfs_syscalls.c but was fixed in rev.1.18 and rev.1.20 there.

Reviewed by:	davidg
1996-06-08 12:16:26 +00:00
Bruce Evans
21e0797227 Fixed nfs sysctls. They missed out on the fs -> vfs name changes from
Lite2.  This broke nfsstat.
1996-04-30 23:23:09 +00:00
Poul-Henning Kamp
99cb299316 Add an option NFS_NOSERVER which saves 100K in the install kernel (or
any other kernel that uses it).  Use with option NFS.
1996-01-13 23:27:58 +00:00
Poul-Henning Kamp
b8dce649f1 Staticize. 1995-12-17 21:14:36 +00:00
David Greenman
efeaf95a41 Untangled the vm.h include file spaghetti. 1995-12-07 12:48:31 +00:00
Poul-Henning Kamp
a98ca4699e Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.
1995-10-29 15:33:36 +00:00
David Greenman
c0c06a67d2 Added NFS_ASYNC kernel option. It only has an effect for NFSv2. 1995-08-24 11:39:31 +00:00
Doug Rabson
c3b2cc769c Some fixes found using gcc -Wall:
nfsm_rpchead() has been called with the wrong number of args and misplaced
args since someone added new args in the middle for nfsv3.

Here's another one that would be important on 64-bit systems.  VOP_READDIR
takes a `u_int **cookies' arg.

Submitted by:	Bruce Evans <bde@zeta.org.au>
1995-08-24 10:45:16 +00:00
David Greenman
75d8591e04 Fixed bug where vnode_pager_uncache() wasn't always called when it should
be. The result was that the file's space wouldn't be properly freed when
it was deleted.

Submitted by:	John Dyson
1995-08-06 11:55:25 +00:00
Doug Rabson
7faccad982 Slight changes to locking around VOP_READRIR.
Detect in nfsrv_readdirplus when a filesystem soes not support VFS_VGET and
return NFSERR_NOTSUPP so that the client will use ordinary readdir instead.
1995-08-03 12:14:16 +00:00
Doug Rabson
a2c06d4685 Lock the directory vnode before VOP_READDIR in nfsrv_readdirplus 1995-08-02 10:12:47 +00:00
David Greenman
4777741358 Removed my special-case hack for VOP_LINK and fixed the problem with the
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.
1995-08-01 18:51:02 +00:00
David Greenman
aa2cabb958 1) Converted v_vmdata to v_object.
2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs
   after vnode_pager_alloc() calls - the object is already guaranteed to be
   persistent.
3) Removed some gratuitous casts.
1995-06-28 12:01:13 +00:00
David Greenman
9879652657 Fixed VOP_LINK argument order botch. 1995-06-28 07:06:55 +00:00
Doug Rabson
a62dc40654 Changes to support version 3 of the NFS protocol.
The version 2 support has been tested (client+server) against FreeBSD-2.0,
IRIX 5.3 and FreeBSD-current (using a loopback mount).  The version 2 support
is stable AFAIK.
The version 3 support has been tested with a loopback mount and minimally
against an IRIX 5.3 server.  It needs more testing and may have problems.
I have patched amd to support the new variable length filehandles although
it will still only use version 2 of the protocol.

Before booting a kernel with these changes, nfs clients will need to at least
build and install /usr/sbin/mount_nfs.  Servers will need to build and
install /usr/sbin/mountd.

NFS diskless support is untested.

Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
1995-06-27 11:07:30 +00:00
Rodney W. Grimes
d3628763db Merge RELENG_2_0_5 into HEAD 1995-06-11 19:33:05 +00:00
Rodney W. Grimes
9b2e535452 Remove trailing whitespace. 1995-05-30 08:16:23 +00:00
David Greenman
77f53bcf27 Fixed some serious bugs that resulted in object reference counts not being
handled correctly. This would manifest itself as "object deallocated too
many times" panics and perhaps other strange inconsistencies on NFS servers.

Reviewed by:	me, of course
Submitted by:	John Dyson
1995-05-29 04:01:09 +00:00
David Greenman
50475e8bd3 Removed unnecessary call to vnode_pager_uncache(). We automatically clear
the VTEXT flag after all mappers have finished with the object.
1995-03-19 12:08:03 +00:00
David Greenman
45e3e324cc Changed some (incorrect) nfsrv_vput()'s back into regular vput()'s. This
fixes the last of the known NQNFS problems (until I find more, that is :-)).
1995-03-17 07:45:19 +00:00
David Greenman
ad21d87fd5 Woops, change a nfsrv_vput back into a nfsrv_vrele.
Submitted by:	John Dyson
1995-02-15 03:38:12 +00:00
David Greenman
6b03a7ffc5 Fixed three bugs related to the merged cache changes. The bugs likely would
make NFS servers flakey - probably the cause of freefall's recent hangs.

Submitted by:	John Dyson
1995-02-15 03:03:03 +00:00
David Greenman
0d94caffca These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme.  The scheme is almost fully compatible with the old filesystem
interface.  Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache.  Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code.  Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now.  Somehow in 2.0, some "enhancements"
broke the code.  This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code.  No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency.  Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache.  Add "bypass" support for sneaking in on
busy buffers.

Submitted by:	John Dyson and David Greenman
1995-01-09 16:06:02 +00:00
Poul-Henning Kamp
48fbb6cc7e Prototyping and general gcc-shutting up. Gcc has one warning now which looks
bad, I will get to it eventually, unless somebody beats me to it.
1994-10-02 17:27:07 +00:00
Doug Rabson
9abf4d6ee0 Make NFS ask the filesystems for directory cookies instead of making them
itself.
1994-09-28 16:45:22 +00:00
Garrett Wollman
c9b1d6048d More loadable VFS changes:
- Make a number of filesystems work again when they are statically compiled
  (blush)

- FIFOs are no longer optional; ``options FIFO'' removed from distributed
  config files.
1994-09-22 19:38:41 +00:00
David Greenman
1cdeb653a8 "bogus" fixes from 1.1.5 to work around some cache coherency problems. 1994-08-29 06:09:15 +00:00
David Greenman
3c4dd3568f Added $Id$ 1994-08-02 07:55:43 +00:00
Rodney W. Grimes
26f9a76710 The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by:	Rodney W. Grimes
Submitted by:	John Dyson and David Greenman
1994-05-25 09:21:21 +00:00
Rodney W. Grimes
df8bae1de4 BSD 4.4 Lite Kernel Sources 1994-05-24 10:09:53 +00:00