Commit Graph

342 Commits

Author SHA1 Message Date
archie
84bd80a4f9 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
archie
dcdc1187e3 Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for other cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Mike Spengler <mks@networkcs.com>
1998-12-04 22:54:57 +00:00
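The pattern described in the commit above can be sketched in userland C roughly as follows. This is only an illustration, not code from the commit: `format_devname()` and `copy_name()` are hypothetical helpers, and the truncation check assumes the C99 return value of snprintf().

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<name><unit>" into a fixed-size buffer.
 * Returns 0 on success, -1 if the result did not fit (dst is still
 * NUL-terminated in that case, because snprintf() always terminates).
 */
static int
format_devname(char *dst, size_t dstlen, const char *name, int unit)
{
	int n = snprintf(dst, dstlen, "%s%d", name, unit);

	if (n < 0 || (size_t)n >= dstlen)
		return (-1);
	return (0);
}

/*
 * Hypothetical helper: bounded copy that always NUL-terminates.
 * Bare strncpy() leaves dst unterminated when src is too long.
 */
static void
copy_name(char *dst, size_t dstlen, const char *src)
{
	if (dstlen == 0)
		return;
	strncpy(dst, src, dstlen - 1);
	dst[dstlen - 1] = '\0';
}
```

Callers pass sizeof(buf) rather than a literal such as 16, so the bound follows the declaration if the buffer size ever changes.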
dillon
e76e1ae431 Make bootp error message slightly more verbose 1998-12-03 20:28:23 +00:00
msmith
ced8fdc0cc Reimplement the NFS ACCESS RPC cache as an "accelerator" rather than a true
cache.  If the cached result lets us say "yes", then go with that.  If
we're not sure, or we think the answer might be "no", go to the wire to be
certain.    This avoids all of the possible false veto cases, and allows us
to key the cached value with just the UID for which the cached value holds,
reducing the bloat of the nfsnode structure from 104 bytes to just 12 bytes.

Since the "yes" case is by far the most common, this should still provide
a substantial performance improvement.  Also default the cache to on, with
a conservative timeout (2 seconds).  This improves performance if NFS is
loaded as a KLD module, as there's not (yet) code to parse an option out
of the module arguments to set it, and sysctl doesn't work (yet) for OIDs
in modules.

The 'accelerator' mode was suggested by Bjoern Groenvall (bg@sics.se)

Feedback on this would be appreciated as testing has been necessarily
limited by Comdex, and it would be valuable to have this in 2.2.8.
1998-11-15 20:36:18 +00:00
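A minimal sketch of the "accelerator" decision described in the commit above: the cache may only answer "yes", and anything else goes to the server. The structure layout, field names, and ACC_* bits are assumptions made for illustration and are not taken from the nfsnode structure itself.

```c
#include <stdint.h>

/* Hypothetical access bits, for illustration only. */
#define ACC_READ	0x01
#define ACC_WRITE	0x02
#define ACC_EXEC	0x04

/* Hypothetical cached entry, keyed by UID alone. */
struct access_accel {
	uint32_t uid;	/* UID the cached result holds for */
	uint32_t mode;	/* access bits the server granted */
	uint32_t stamp;	/* time in seconds when the result was cached */
};

/*
 * Returns 1 only when the cache can say "yes".  A return of 0 does not
 * mean "no"; it means the caller must send an ACCESS RPC to be certain.
 */
static int
accel_allows(const struct access_accel *c, uint32_t uid, uint32_t wanted,
    uint32_t now, uint32_t timeout)
{
	if (timeout == 0)			/* cache disabled */
		return (0);
	if (now - c->stamp >= timeout)		/* entry too old */
		return (0);
	if (c->uid != uid)			/* cached for someone else */
		return (0);
	if ((c->mode & wanted) != wanted)	/* might be "no": ask the server */
		return (0);
	return (1);
}
```

Because a miss never denies access, a stale or mismatched entry costs one extra RPC rather than a false veto.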
msmith
67d947c3dd Avoid a null pointer reference if the target of an NFS rename has been
sillyrenamed, or if the source vnode doesn't have an associated nfsnode.

Bug report from Andrew Gallatin <gallatin@cs.duke.edu>
1998-11-13 22:58:48 +00:00
dfr
4b2a9a875b Fix a panic in nfsrv_dorec() where a NULL pointer could be passed to
free() sometimes.

Reviewed by: Eric Haug <ejh@eas.slu.edu>
1998-11-13 09:44:12 +00:00
msmith
aebc5396ec Implement NFS ACCESS RPC result caching.
This yields startling performance increases for NFS clients for many
access profiles, due to the fact that ACCESS results are persistently
cached in the namecache in many cases.

Note that the code is somewhat conservative in that it requires an
exact credential match for a cache hit.  This bloats the nfsnode
structure by sizeof(struct ucred) (96 bytes).  Any less conservative
approach opens the possibility for a false veto in eg. setuid
applications.  Alternative suggestions would be welcomed.

The cache is normally disabled, to activate set the sysctl variable
vfs.nfs.access_cache_timeout to a nonzero value.  This is the time in
seconds that a cached entry will be considered valid; useful values appear
to be 2-10 seconds.  Performance of the cache can be monitored with the
vfs.nfs.access_cache_hits and vfs.nfs.access_cache_hits variables.
1998-11-13 02:39:09 +00:00
peter
c227178f58 Remove [apparently] bogus casts to u_long for the vnode_pager_setsize()
second argument.  np_size is a 64 bit int, so is the second arg.  This
might have caused needless 2G/4G file size problems.

I believe it was Bruce who queried this.
1998-11-09 07:00:14 +00:00
peter
cb150e060c Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
mckusick
cfd4c9364b In nfs_link(), check for a cross-device mount *before* looking
in the v_data field.
Obtained from: Charles Hannum, via Frank van der Linden <frank@wins.uva.nl>
1998-09-29 23:39:37 +00:00
mckusick
bc2396f099 Missing vput when cross-device link error is detected in nfs_link. 1998-09-29 23:29:48 +00:00
mckusick
eefdfcf4ef During truncation, have to notify the VM about the new size
of the NFS file *before* doing the nfs_vinvalbuf operation.
Otherwise some invalid data may show up in an mmap.
1998-09-29 23:28:32 +00:00
mckusick
068dbf8c1d Frank sez: 'It fixes a problem with servers that return 0 values
for some of the fsinfo RPC fields. It is strictly speaking not
wrong to do this, as the spec says that "it is expected that a
server will make a best effort at supporting all the attributes",
but pretty unusual. You guessed it, it's NT servers that do it.'
Obtained from: Frank van der Linden <frank@wins.uva.nl>
1998-09-29 23:15:53 +00:00
mckusick
43835ae8ed Do not need (or want) to take a reference on an NFS file that
is being deleted due to a forcible unmount. The problem is
that vgone calls vclean() which then calls nfs_inactive()
with VXLOCK set on the vnode. Nfs_inactive() was calling vget()
to get a reference on the vnode, which in turn hung on VXLOCK.
Nfs_inactive() now checks v_usecount to make sure that the vnode
is not coming from vclean() before it does a vget().
1998-09-29 23:15:25 +00:00
mckusick
21912d285b The code checks each fragment mark to see if it's valid; if the fragment
is less than NFS_MINPACKET or greater than NFS_MAXPACKET in size, it
barfs and, I think, drops the connection.

However, there's no guarantee that in a multi-fragment RPC, all the
fragments will be at least as large as NFS_MINPACKET.

In fact, with the version of "tclnfs" we have here, which supports NFS
over TCP, at least when built under SunOS 4.1.3 (i.e., with 4.1.3's
user-mode ONC RPC library), I can *repeatably* cause "tclnfs" to send a
request with more than one fragment, one of which is only 8 bytes long.
I just do a 3877-byte write to a file, at an offset of 0.

The check that "slp->ns_reclen" is greater than or equal to
NFS_MINPACKET serves no useful purpose - if the NFS server code can't
handle packets < NFS_MINPACKET bytes, it can't handle them over *any*
protocol, so the check has to be done above the RPC-over-TCP layer - and
should be removed.
Obtained from: Fix from Guy Harris, forwarded by Rick Macklem.
1998-09-29 22:33:05 +00:00
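For readers unfamiliar with RPC record marking over TCP, the following sketch shows how a fragment mark is decoded and why only an upper bound is checked per fragment. It assumes the standard ONC RPC layout (a 4-byte big-endian header whose top bit flags the last fragment and whose low 31 bits carry the length). The names `decode_mark()` and `frag_len_ok()` are hypothetical, and the maximum is passed in rather than guessing the value of NFS_MAXPACKET.

```c
#include <stdint.h>

/* Decoded fragment mark. */
struct frag_mark {
	uint32_t len;	/* fragment length in bytes */
	int	 last;	/* nonzero if this is the final fragment */
};

/* Decode the 4-byte big-endian record mark that precedes each fragment. */
static struct frag_mark
decode_mark(const unsigned char hdr[4])
{
	uint32_t raw = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
	    ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
	struct frag_mark m;

	m.last = (raw & 0x80000000u) != 0;
	m.len = raw & 0x7fffffffu;
	return (m);
}

/*
 * Per-fragment sanity check with no lower bound: a fragment of a
 * multi-fragment request may legitimately be very short.  Any minimum
 * size applies to the reassembled request, above this layer.
 */
static int
frag_len_ok(uint32_t len, uint32_t max_packet)
{
	return (len <= max_packet);
}
```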
mckusick
572fac0aa1 Mark directory buffers that have no valid data with B_INVAL
so that they are not put in the cache.
1998-09-29 22:01:10 +00:00
mckusick
970f96cae3 When adding data to a buffer, we need to clear the B_NEEDCOMMIT flag
which says that the data is on server but not committed.
1998-09-29 21:46:54 +00:00
bde
1fa06f3088 Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
1998-09-07 13:17:06 +00:00
bde
8e84c55866 Made unloading of the nfs LKM sort of work. This is mainly to test
detachment of vfs sysctls.  Unloading of vfs LKMs doesn't actually
work for any vfs, since it leaves garbage pointers to memory
allocation control structures.
1998-09-07 05:42:15 +00:00
bde
fe74c5117a Ignore the statically configured vfs type numbers and assign vfs
type numbers in vfs attach order (modulo incomplete reuse of old
numbers after vfs LKMs are unloaded).  This requires reinitializing
the sysctl tree (or at least the vfs subtree) for vfs's that support
sysctls (currently only nfs).  sysctl_order() already handled
reinitialization reasonably except it checked for annulled self
references in the wrong place.

Fixed sysctls for vfs LKMs.
1998-09-05 17:13:28 +00:00
bde
3db8c0c3b6 Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM.  Declare it in a header file.  Don't forget to use
it in non-Lite2 code.  Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.
1998-09-05 15:17:34 +00:00
dfr
f5985434e9 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
luoqi
00da9ece0c Check for NULL pointer before freeing a struct sockaddr. m_freem() can handle
NULL, but free() can't.
1998-09-01 02:31:52 +00:00
wollman
fc229edf09 Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process.  Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
1998-08-23 03:07:17 +00:00
bde
228c4dc2c1 Fixed printf format errors. 1998-08-18 00:32:50 +00:00
dfr
235f5214dd Protect all modifications to v_numoutput with splbio(). 1998-08-13 08:09:08 +00:00
bde
c0640859f8 Don't configure compatibility code for pre-Lite2 mount() calls by
default.  This code should go away soon.
1998-08-12 20:17:42 +00:00
peter
00be1faa66 If we get an ENOBUFS from the network, it's normally transient network
interface congestion (eg: nfs over a ppp link, etc).  Don't log these
for UDP mounts, and don't cause syscalls to fail with EINTR.
This stops the 'nfs send error 55' warnings.

If the error is because the system is really hosed, this is the least
of your problems...
1998-08-01 09:04:02 +00:00
bde
cd39e51e66 Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively.  Most of the longs should probably have been
u_longs, but this change is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
1998-07-15 02:32:35 +00:00
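As a small illustration of the cast style in the commit above: uintptr_t is the integer type sized to hold a pointer, so converting through it avoids the size-mismatch warning that a cast to u_long can produce on some platforms. The helper `is_aligned()` is hypothetical and assumes `align` is a power of two.

```c
#include <stdint.h>

/*
 * Hypothetical helper: test pointer alignment by converting the pointer
 * to uintptr_t, which has the same width as a pointer on every platform
 * that provides it.  'align' must be a power of two.
 */
static int
is_aligned(const void *p, uintptr_t align)
{
	return (((uintptr_t)p & (align - 1)) == 0);
}
```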
julian
5bfbd9138f VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00
jmg
329c2b429b fix buildworld hopefully before anyone complains...
NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion),
but in some cases it looks like nfs keeps a copy of the value in a struct

hash sizes are already ifdef'd KERNEL, so there isn't any userland impact
from them...
1998-06-30 11:19:22 +00:00
jmg
caa72af181 convert some nfs tunables to options, these are:
NFS_MINATTRTIMO         VREG attrib cache timeout in sec
NFS_MAXATTRTIMO
NFS_MINDIRATTRTIMO      VDIR attrib cache timeout in sec
NFS_MAXDIRATTRTIMO
NFS_GATHERDELAY         Default write gather delay (msec)
NFS_UIDHASHSIZ          Tune the size of nfssvc_sock with this
NFS_WDELAYHASHSIZ       and with this
NFS_MUIDHASHSIZ         Tune the size of nfsmount with this
NFS_NOSERVER            (already documented in LINT)
NFS_DEBUG               turn on NFS debugging

also, because NFS_ROOT is used by very different files, it has been
renamed to opt_nfsroot.h instead of the old opt_nfs.h....
1998-06-30 03:01:37 +00:00
bde
feb3b094a7 Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore,
code controlled by it did not even compile.)
1998-06-21 12:50:12 +00:00
bde
9beca6bfbc Avoid an egcs pessimization for 64-bit signed division on i386's.
Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit
signed divisions, but egcs optimizes them to a shift and fixup when
the divisor is a constant power of 2.  Unfortunately, it generates
a call to __cmpdi2() for the fixup, although all except possibly
ancient versions of gcc and egcs do ordinary 64-bit comparisons
inline.
1998-06-14 15:52:00 +00:00
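The "shift and fixup" mentioned in the commit above can be written out by hand as below. This is a sketch of the general technique, not the code egcs emits: `div_pow2()` is a hypothetical name, it assumes 0 <= k < 63, and it relies on `>>` being an arithmetic shift for negative values, which is implementation-defined in C but holds for gcc.

```c
#include <stdint.h>

/*
 * Divide x by 2^k, truncating toward zero like C's '/' operator.
 * A plain arithmetic shift rounds toward negative infinity, so a
 * negative dividend is first biased by (2^k - 1); that is the fixup.
 */
static int64_t
div_pow2(int64_t x, unsigned k)
{
	int64_t bias = (x < 0) ? (((int64_t)1 << k) - 1) : 0;

	return ((x + bias) >> k);
}
```

The comparison `x < 0` is the step for which, per the commit message, egcs called __cmpdi2() instead of comparing inline.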
dfr
92449ad5fa This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
in line with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
peter
7d519e2203 Make sure we do a nfs_fsinfo() in get/putpages before calling
readrpc/writerpc, since they assume it's already been done.  This could
break if the first read/write access to a nfs filesystem was an exec() or
mmap() instead of a read(), write() syscall.  (or statfs()).
nfs_getpages() could return an errno (EOPNOTSUPP) instead of a VM_PAGER_*
return code.  Some layout tweaks for the get/putpages code.
1998-06-01 11:32:53 +00:00
peter
b1ce849a05 Fix post-test pre-commit cleanup typo. 1998-06-01 11:07:16 +00:00
peter
d2ca298269 readlink() returns EINVAL rather than EPERM if called on a non-symlink. 1998-06-01 10:59:23 +00:00
peter
c0e6bda9ea Preset the maximum file size before we get to nfs_fsinfo(), based on
an (over?) conservative assumption about what the client can store in its
buffer cache using a signed 32-bit 512-byte block number index.  Otherwise
it's possible for some file access when maxfilesize = 0 (eg: /usr is nfs
mounted and doing an execve())
Pointed out by:	 bde

XXX It might make sense to do a preemptive nfs_fsinfo() call at mount time.
1998-06-01 10:01:31 +00:00
peter
f74a201183 For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t;
int -> int32_t; u_short -> u_int16_t.  Also, use mode_t instead of u_short
for storing modes (mode_t is a u_int16_t).

Obtained from: NetBSD
1998-05-31 20:09:01 +00:00
peter
692d47eec9 Support 'mount -u' remounts. This may require disconnecting and rebinding
the socket.  Certain mode changes are not allowed.

Obtained from:  NetBSD
1998-05-31 19:49:31 +00:00
peter
ff45fa255d xdr encode -1 properly.
Obtained from: NetBSD
1998-05-31 19:29:28 +00:00
peter
2987193b30 Fully fill in nfsv2 write rpc requests rather than leaving garbage.
Obtained from: NetBSD
1998-05-31 19:28:15 +00:00
peter
9a4322a2d0 Don't silently fail to set file flags.
Obtained from:  NetBSD
1998-05-31 19:24:19 +00:00
peter
c531dc87c6 Don't blindly accept the server's preferences if they are too small.
Obtained from:  NetBSD
1998-05-31 19:20:44 +00:00
peter
d46a21d4ce Prototype support for selectively allowing non-reserved ports on a per
export basis.  Needs userland support yet.

Obtained from:  NetBSD
1998-05-31 19:16:08 +00:00
peter
8eceafffb2 Don't pass a second copy of the uid/gid in with the v2/v3 sattr structures,
it just makes more work.  We pass a copy of the uid/gid with the
credentials.  (although, this may need to be revisited if a non AUTHUNIX
authentication method (such as NFSKERB) ever gets implemented).

Obtained from:  NetBSD
1998-05-31 19:00:19 +00:00
peter
434d53c8c0 Use the new SB_UPCALL flag.
Obtained from:  NetBSD (but I changed the flag clear order in case).
1998-05-31 18:46:06 +00:00
peter
f7f311c40b NFS_SMALLFH is defined in nfsproto.h, not sys/mount.h
Obtained from:  NetBSD
1998-05-31 18:32:23 +00:00
peter
624a676561 Don't let the user try "rmdir ."
Obtained from:  NetBSD
1998-05-31 18:30:42 +00:00
peter
cc573f4f48 Don't let the user try and unlink() a directory on a NFS server.
Obtained from:  NetBSD
1998-05-31 18:28:45 +00:00
peter
6bcfa0ac79 When a write rpc returns an error, break the loop.
Obtained from: NetBSD
1998-05-31 18:27:07 +00:00
peter
695d7683f9 Don't leak an mbuf when a write rpc returns zero bytes written.
Obtained from: NetBSD
1998-05-31 18:25:32 +00:00
peter
5c3c04f77a #ifdef a diagnostic printf
Obtained from:  NetBSD
1998-05-31 18:23:24 +00:00
peter
c16cfc54ff Don't try and free mrep twice on some error conditions.
Obtained from:  NetBSD
1998-05-31 18:19:43 +00:00
peter
5223619fcd #ifdef a diagnostic panic, plus another missed cosmetic change.
Obtained from:  NetBSD
1998-05-31 18:11:03 +00:00
peter
e6c2fb9a1e We have gained 2 more errno's, add them to the NFSv2 mapping table. 1998-05-31 18:09:18 +00:00
peter
63de139aff Missed a cosmetic change that the other BSD's have. 1998-05-31 18:08:09 +00:00
peter
2929d04ecf oops, nfs_msg() is called from client code too. 1998-05-31 18:06:07 +00:00
peter
a6a122d71b When we can't reconnect a socket, don't forget to unlock before retrying
or we can deadlock.

Obtained from:  NetBSD
1998-05-31 18:02:56 +00:00
peter
76064558f0 Don't log zero length reads, this can happen during normal operation.
Obtained from: NetBSD
1998-05-31 18:00:46 +00:00
peter
4183d70dd3 Consider readdir chunk sizes when tuning socket buffer reservations.
Obtained from:  NetBSD
1998-05-31 17:57:43 +00:00
peter
353bcc0c75 Some const's
Obtained from: NetBSD
1998-05-31 17:48:07 +00:00
peter
cbaa5d2256 NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim
of this part of commits is to minimize unnecessary differences between
the other NFS's of similar origin.  Yes, there are gratuitous changes here
that the style folks won't like, but it makes the catch-up less difficult.
1998-05-31 17:27:58 +00:00
peter
3b0d915f97 VOP_ABORTOP() appears to be called with the wrong vnode. The other callers
that I checked (eg: ufs_link()) do the ABORTOP on the directory rather than
the file itself.  After Michael Hancock's patches, the abortop doesn't seem
all that critical now since something else will free the pathname buffer.
1998-05-31 01:03:07 +00:00
peter
432cd8e2e7 When using NFSv3, use the remote server's idea of the maximum file size
rather than assuming 2^64.  It may not like files that big. :-)
On the nfs server, calculate and report the max file size as the point
that the block numbers in the cache would turn negative.
(ie: 1099511627775 bytes (1TB)).

One of the things I'm worried about however, is that directory offsets
are really cookies on a NFSv3 server and can be rather large, especially
when/if the server generates the opaque directory cookies by using a local
filesystem offset in what comes out as the upper 32 bits of the 64 bit
cookie.  (a server is free to do this, it could save byte swapping
depending on the native 64 bit byte order)

Obtained from:	NetBSD
1998-05-30 16:33:58 +00:00
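The 1099511627775-byte figure in the commit above follows from simple arithmetic, sketched here under the assumption (stated in the 1998-06-01 commit c0e6bda9ea) that the buffer cache indexes 512-byte blocks with a signed 32-bit block number. The function name `max_file_size()` and the macro `BLOCK_SIZE_ASSUMED` are illustrative only.

```c
#include <stdint.h>

#define BLOCK_SIZE_ASSUMED	512	/* bytes per cache block (assumption) */

/*
 * Largest file offset whose block number is still non-negative in a
 * signed 32-bit index: blocks 0..INT32_MAX give 2^31 blocks, and
 * 2^31 * 512 = 2^40 bytes, so the last valid byte offset is 2^40 - 1.
 */
static uint64_t
max_file_size(void)
{
	uint64_t nblocks = (uint64_t)INT32_MAX + 1;

	return (nblocks * BLOCK_SIZE_ASSUMED - 1);
}
```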
peter
adcdd6ebdb Convert a couple of large allocations to use zones rather than malloc
for better packing.  This means that we can choose better values for the
various hash entries without having to try and get it all to fit within
an artificial power of two limit for malloc's sake.
1998-05-24 14:41:56 +00:00
peter
fb32317974 s/flags/flag/ 1998-05-20 08:05:45 +00:00
peter
de5055805c A cleaner fix for PR#5102, clear nonsense flags at mount time rather than
in the core of nfs_bio.c at the 11th hour.

PR:		5102
1998-05-20 08:02:24 +00:00
peter
b4c1590661 Don't change argp->flags after it's been copied. 1998-05-20 07:59:21 +00:00
peter
8c1d4bd15f Allow control of the attribute cache timeouts at mount time.
We had run out of bits in the nfs mount flags, I have moved the internal
state flags into a separate variable.  These are no longer visible via
statfs(), but I don't know of anything that looks at them.
1998-05-19 07:11:27 +00:00
bde
544ba257e4 Get timespecs directly instead of via timevals. 1998-05-16 16:20:50 +00:00
bde
d87f73393d Don't abuse `+' to combine flags. 1998-05-16 16:03:10 +00:00
bde
886e4ee4a1 Backed out rev.1.76. It just added style bugs. 1998-05-16 15:21:29 +00:00
bde
f72c46e826 Get timespecs directly instead of via timevals. 1998-05-16 15:11:24 +00:00
peter
4005649c78 Add missing arg to vget().. Serves me right for committing a 2.2 patch to
-current without testing it there.. :-(

Submitted by: Michael Hancock <michaelh@cet.co.jp>
1998-05-13 07:49:08 +00:00
peter
bc2286faf9 Hold a reference to the vnode during the sillyrename cleanup. If we block
in nfs_vinvalbuf() or the nfs_removeit(), we can have the nfsnode reallocated
from underneath us (eg: replaced by a ufs 'struct inode') which can cause
disk corruption ('freeing free block' when di_db[5] gets trashed).
This is not a cheap fix, but it'll do until the nfsnodes get reference
counting and/or locking.

Apparently NetBSD have a similar fix (apparently from BSDI).

I wish all PR's had this much useful detail. :-)

PR: 6611
Submitted by: Stephen Clawson <sclawson@marker.cs.utah.edu>
1998-05-13 06:10:13 +00:00
peter
aa1955d6ea Move the *vpp initialization earlier so that it's set in all error cases.
This should stop the 'panic: leaf should not be empty' nfs panic.

PR: 1856
Submitted by: msaitoh@spa.is.uec.ac.jp
1998-05-13 05:47:09 +00:00
msmith
b4bbf78c74 In the words of the submitter:
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it.  This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp.  vop_rename
still releases 4 vnode arguments before it returns.  These remaining cases
will be corrected in the next set of patches.
---------

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-07 04:58:58 +00:00
msmith
54f95d7f2a As described by the submitter:
Reverse the VFS_VRELE patch.  Reference counting of vnodes does not need
to be done per-fs.  I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-06 05:29:41 +00:00
phk
b3b6700ad6 Use random() to find our initial xid. 1998-04-06 11:41:07 +00:00
phk
8396c1d18d Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't an atomic variable, so splfoo() protection was needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now use the new variable time_second instead.

gettime() changed to getmicrotime().

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeared time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passed &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
steve
6c05201b50 Don't allow the readdirplus routine to be used in NFS V2.
PR:		5102
Reviewed by:	msmith
Submitted by:	Dmitry Kohmanyuk <dk@farm.org>
1998-03-28 16:05:05 +00:00
bde
26647b5b26 Don't depend on <sys/mount.h> including <sys/socket.h>. 1998-03-28 12:04:40 +00:00
bde
878d31d269 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
tegge
1863fc8e05 Add a BOOTP_WIRED_TO option, for use on machines with multiple network
cards where the first detected card should not be used for bootp.
Submitted by:	Doug Ambrisko <ambrisko@whistle.com>
1998-03-14 04:13:56 +00:00
tegge
c7f60bf6a5 Update workaround for limitations in the arp code.
Adjust the RPC timeout message which occurred when the old workaround
broke to show the correct IP address.
1998-03-14 03:25:18 +00:00
julian
3da153eb72 Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  Whistle development tree
1998-03-08 09:59:44 +00:00
dyson
067e84884d This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifested themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed separately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitous buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups associated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistency problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify its status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and its descendants to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
msmith
14bb3dadc6 Trivial filesystem getpages/putpages implementations, set the second.
These should be considered the first steps in a work-in-progress.
Submitted by:	Terry Lambert <terry@freebsd.org>
1998-03-06 09:46:52 +00:00
msmith
0656734d76 The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE.  This is
initially only for file systems that implement the following ops
that do a WILLRELE:

	vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
	vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet.  VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
	ffs mfs nfs msdosfs devfs ext2fs

Custom
	union umapfs

Just EOPNOTSUPP
	fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers.  vput will be replaced by unlock in these cases.  Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer.  This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by:	phk, dyson, terry et al.
Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-03-01 22:46:53 +00:00
eivind
86354cd8fc Staticize. 1998-02-09 06:11:36 +00:00
eivind
15aa079292 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
dyson
95e3a3ee03 Fix an omission of a line from the previous commit to this file. The
problem appeared to be an NFS hang.
1998-02-05 16:40:57 +00:00
eivind
d8f3bc5b0e Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
bde
f3dd4c35fa Forward declare some structs so that this file is more self-sufficient. 1998-02-03 21:52:02 +00:00
bde
5756d7aa6c Moved declaration of `union nethostadr' outside of the KERNEL section,
to give pollution compatible with <nfs/nqfs.h>.  At least mount_nfs.c
previously had to #define KERNEL before including <nfs/nfs.h> to get
this pollution, but this gave other pollution.

Moved comment about NFSINT_SIGMASK to immediately before the code that
it applies to.
1998-02-01 21:23:29 +00:00
bde
814840a7b3 Forward declare more structs that are used in prototypes here - don't
depend on <sys/types.h> forward declaring common ones.

Added an underscore to `sin' in prototypes to avoid warnings for the
conflict with the ANSI sin().
1998-02-01 20:34:07 +00:00
tegge
f070ec86c5 Release the buffer when an error occurs while reading directory entries. 1998-01-31 01:27:18 +00:00
dyson
beea50f626 Various NFS fixes:
Make vfs_bio buffer mgmt work better.
	Buffers were being used after brelse.
	Make nfs_getpages work independently of other NFS
		interfaces.  This eliminates some difficult
		recursion problems and decreases pagefault
		overhead.
	Remove an erroneous vfs_unbusy_pages.
	Fix a reentrancy problem, with nfs_vinvalbuf when
		vnode is already being rundown.
	Reassignbuf wasn't being called when needed under
		certain circumstances.

	(Thanks to Bill Paul for help.)
1998-01-25 06:24:09 +00:00