Commit Graph

80 Commits

Author SHA1 Message Date
Garrett Wollman
a29f300e80 The long-awaited mega-massive-network-code- cleanup. Part I.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call.  Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken.  I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
1997-04-27 20:01:29 +00:00
Doug Rabson
9aa2858d44 Fix broken usage of nm_readdirsize and increase the socket buffers for UDP
to prevent possible socket overflows.

2.2 candidate.

PR:		kern/3304
Reviewed by:	Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
1997-04-22 17:38:01 +00:00
Doug Rabson
4ba14e3a10 Fix various bugs in the locking protocol, allowing proper shared locks
to be used.  This should fix the lock panics that people are seeing.
1997-04-04 17:49:35 +00:00
Bruce Evans
b445591810 Removed #include of <ufs/ufs/dir.h>. Nfs no longer depends on any ufs
features, and the one thing that it depended on (DIRBLKSIZ) now has
conflicting spelling.
1997-03-29 12:40:20 +00:00
Bruce Evans
00780cef44 Define our own version of DIRBLKSIZ instead of (ab)using ufs's value.
Use the same value of 512 (ufs actually uses DEV_BSIZE).  There are
too many versions of DIRBLKSIZ, one for ufs, one for ext2fs, one for
nfs, one for ibcs2, one for linux, one for applications, ... I think
nfs's DIRBLKSIZ needs to be a divisor of the directory blocks sizes
of all supported file systems.  There is also NFS_DIRBLKSIZ, which is
different from nfs's DIRBLKSIZ but is sometimes confused with it in
comments.

Removed a bogus #ifdef KERNEL that hid the tunable constants for nfs.
This came in undocumented with the Lite2 merge although it isn't in
Lite2.  It required more-bogus #define KERNEL's in fstat and pstat
to make the constants visible.

Restored a spelling fix from rev.1.17.

Removed duplicate #defines of all the the NFS mount option flags.
1997-03-29 12:34:33 +00:00
Guido van Rooij
394da4c167 Add code that will reject nfs requests in teh kernel from nonprivileged
ports. This option will be automatically set/cleraed when mount is run
without/with the -n option.
Reviewed by:	Doug Rabson
1997-03-27 20:01:07 +00:00
Peter Wemm
476b25e22e Use the correct (relative to the implementation) ordering of args in
the VOP_LINK() calls, Closes PR#3064

Submitted by: bde
1997-03-25 05:13:40 +00:00
Peter Wemm
289c56e81e The local fs interface does not allow link()/unlink() of directories,
do not allow a remote nfs client to cause local fs corruption either.
1997-03-25 05:08:28 +00:00
Bruce Evans
3c81694426 Fixed some invalid (non-atomic) accesses to `time', mostly ones of the
form `tv = time'.  Use a new function gettime().  The current version
just forces atomicicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.
1997-03-22 06:53:45 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
Doug Rabson
f438ae02f5 Improve the queuing algorithms used by NFS' asynchronous i/o. The
existing mechanism uses a global queue for some buffers and the
vp->b_dirtyblkhd queue for others.  This turns sequential writes into
randomly ordered writes to the server, affecting both read and write
performance.  The existing mechanism also copes badly with hung
servers, tending to block accesses to other servers when all the iods
are waiting for a hung server.

The new mechanism uses a queue for each mount point.  All asynchronous
i/o goes through this queue which preserves the ordering of requests.
A simple mechanism ensures that the iods are shared out fairly between
active mount points.  This removes the sysctl variable vfs.nfs.dwrite
since the new queueing mechanism removes the old delayed write code
completely.

This should go into the 2.2 branch.
1996-11-06 10:53:16 +00:00
Doug Rabson
f31dba4c5d This fixes a problem with the nfs socket handling code which happens
if a single process is performing a large number of requests (in this
case writing a large file).  The writing process could monopolise the
recieve lock and prevent any other processes from recieving their
replies.

It also adds a new sysctl variable 'vfs.nfs.dwrite' which controls the
behaviour which originally pointed out the problem.  When a process
writes to a file over NFS, it usually arranges for another process
(the 'iod') to perform the request.  If no iods are available, then it
turns the write into a 'delayed write' which is later picked up by the
next iod to do a write request for that file.  This can cause that
particular iod to do a disproportionate number of requests from a
single process which can harm performance on some NFS servers.  The
alternative is to perform the write synchronously in the context of
the original writing process if no iod is avaiable for asynchronous
writing.

The 'delayed write' behaviour is selected when vfs.nfs.dwrite=1 and
the non-delayed behaviour is selected when vfs.nfs.dwrite=0.  The
default is vfs.nfs.dwrite=1; if many people tell me that performance
is better if vfs.nfs.dwrite=0 then I will change the default.

Submitted by:	Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
1996-10-11 10:15:33 +00:00
Nate Williams
030e2e9ebb In sys/time.h, struct timespec is defined as:
/*
         * Structure defined by POSIX.4 to be like a timeval.
         */
        struct timespec {
                time_t  ts_sec;         /* seconds */
                long    ts_nsec;        /* and nanoseconds */
        };

        The correct names of the fields are tv_sec and tv_nsec.

Reminded by:	James Drobina <jdrobina@infinet.com>
1996-09-19 18:21:32 +00:00
David Greenman
0247363f69 Release an unneeded reference to a vnode that was gained in a VFS_VGET().
Fixes a readdirplus panic.

Submitted by:	Doug Rabson <dfr@render.com>
1996-09-05 07:58:04 +00:00
Bruce Evans
b71fec07db Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.
1996-09-03 14:25:27 +00:00
John Dyson
6476c0d204 Even though this looks like it, this is not a complex code change.
The interface into the "VMIO" system has changed to be more consistant
and robust.  Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.

This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.

Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls.  The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.

Reviewed by: davidg
1996-08-21 21:56:23 +00:00
Doug Rabson
09c6884729 Various fixes from frank@fwi.uva.nl (Frank van der Linden) via
rick@snowhite.cis.uoguelph.ca:

1. Clear B_NEEDCOMMIT in nfs_write to make sure that dirty data is
correctly send to the server.  If a buffer was dirtied when it was in
the B_DELWRI+B_NEEDCOMMIT state, the state of the buffer was left
unchanged and when the buffer was later cleaned, just a commit rpc was
made to the server to complete the previous write.  Clearing
B_NEEDCOMMIT ensures that another write is made to the server.

2. If a server returned a server (for whatever reason) returned an
answer to a write RPC that implied that fewer bytes than requested
were written, bad things would happen.

3. The setattr operation passed on the atime in stead of the mtime to
the server. The fix is trivial.

4. XIDs always started at 0, but this caused some servers (older DEC
OSF/1 3.0 so I've been told) who had very long-lasting XID caches to
get confused if, after a reboot of a BSD client, RPCs came in with a
XID that had in the past been used before from that client. Patch is
to use the current time in seconds as a starting point for XIDs. The
patch below is not perfect, because it requires the root fs to be
mounted first. This is because of the check BSD systems do, comparing
FS time to system time.

Reviewed by:	Bruce Evans, Terry Lambert.
Obtained from:  frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca
1996-07-16 10:19:45 +00:00
Garrett Wollman
2c37256e5a Modify the kernel to use the new pr_usrreqs interface rather than the old
pr_usrreq mechanism which was poorly designed and error-prone.  This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner.  This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out).  This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted).  Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.
1996-07-11 16:32:50 +00:00
Bruce Evans
8cd5acbce0 Don't truncate minor or major numbers in the nfsv3 client. 1996-06-23 17:19:25 +00:00
Poul-Henning Kamp
5b28a6011f Fix for NFS_NOSERVER
Poul mentioned that he thought this was some kind of timing problem, and
that started me thinking. After a little poking around, I found that
nfs_timer() was completely disabled when NFS_NOSERVER was #defined.
But after looking at nfs_timer(), it seemed like it was something
required by both the client and server code, and disabling it outright
just didn't seem to make any sense. Parts of it relate only to the
NFS server side code, so I disabled those, but I re-enabled the rest
of the function and made sure that it would be called from nfs_init()
(in nfs_subs.c).

With nfs_timer() re-enabled, everything seems to work again. The only
other changes I made were to #ifdef away some variable declarations
in the NFS_NOSERVER case so that gcc would stop complaining about
unused variables.

Reviewed by:	phk
Submitted by:	Bill Paul <wpaul@skynet.ctr.columbia.edu>
1996-06-14 11:13:21 +00:00
Bruce Evans
39a3579443 Fixed a vnode reference leak in nfsrv_rename(). The target inode wasn't
released until the file system was unmounted.  This bug also affected
kern/vfs_syscalls.c but was fixed in rev.1.18 and rev.1.20 there.

Reviewed by:	davidg
1996-06-08 12:16:26 +00:00
Bruce Evans
71d96b71c2 #include <sys/filedesc.h> explicitly instead of depending on it being
bogusly included by <sys/socketvar.h>.
1996-04-30 23:26:52 +00:00
Bruce Evans
21e0797227 Fixed nfs sysctls. They missed out on the fs -> vfs name changes from
Lite2.  This broke nfsstat.
1996-04-30 23:23:09 +00:00
Garrett Wollman
dc915e7cfc Kill XNS.
While we're at it, fix socreate() to take a process argument.  (This
was supposed to get committed days ago...)
1996-02-13 18:16:31 +00:00
Mike Pritchard
6c5e9bbdf5 Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.
1996-01-30 23:02:38 +00:00
John Dyson
bd7e5f992e Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
	overhead for merged cache.
Efficiency improvement for vfs_cluster.  It used to do alot of redundant
	calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files.  Additionally,
	fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources.  The pageout code
	will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
	page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
	thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
	that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
	case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
	buffers.
1996-01-19 04:00:31 +00:00
Poul-Henning Kamp
99cb299316 Add an option NFS_NOSERVER which saves 100K in the install kernel (or
any other kernel that uses it).  Use with option NFS.
1996-01-13 23:27:58 +00:00
Poul-Henning Kamp
b8dce649f1 Staticize. 1995-12-17 21:14:36 +00:00
David Greenman
efeaf95a41 Untangled the vm.h include file spaghetti. 1995-12-07 12:48:31 +00:00
Bruce Evans
dee6b0ab68 Completed function declarations and/or added prototypes and/or moved
prototypes to the right place.
1995-12-03 10:03:12 +00:00
Bruce Evans
55054f3540 Completed function declarations, added prototypes and removed redundant
declarations.
1995-11-21 15:51:39 +00:00
Bruce Evans
512fef80a9 Completed function declarations and/or added prototypes. 1995-11-21 12:55:26 +00:00
Bruce Evans
e4f937b07b Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs.  It's
convenient to have this visible but they are hard to maintain.  Some
are already different from the central declarations.  4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.
1995-11-14 05:16:37 +00:00
Joerg Wunsch
e046098fa9 Include a prerequisite header (so this is consistent again with the
NFSv2 state).
1995-10-31 21:17:59 +00:00
Poul-Henning Kamp
a98ca4699e Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.
1995-10-29 15:33:36 +00:00
David Greenman
c0c06a67d2 Added NFS_ASYNC kernel option. It only has an effect for NFSv2. 1995-08-24 11:39:31 +00:00
David Greenman
dcc84850a7 Killed redundant declarations of nfsm_rpchead(). 1995-08-24 11:04:04 +00:00
Doug Rabson
c3b2cc769c Some fixes found using gcc -Wall:
nfsm_rpchead() has been called with the wrong number of args and misplaced
args since someone added new args in the middle for nfsv3.

Here's another one that would be important on 64-bit systems.  VOP_READDIR
takes a `u_int **cookies' arg.

Submitted by:	Bruce Evans <bde@zeta.org.au>
1995-08-24 10:45:16 +00:00
Doug Rabson
27df97742b Add support for amd direct maps.
Reviewed by:	Thomas Graichen <graichen@sirius.physik.fu-berlin.de>
1995-08-24 10:17:39 +00:00
David Greenman
75d8591e04 Fixed bug where vnode_pager_uncache() wasn't always called when it should
be. The result was that the file's space wouldn't be properly freed when
it was deleted.

Submitted by:	John Dyson
1995-08-06 11:55:25 +00:00
Doug Rabson
7faccad982 Slight changes to locking around VOP_READRIR.
Detect in nfsrv_readdirplus when a filesystem soes not support VFS_VGET and
return NFSERR_NOTSUPP so that the client will use ordinary readdir instead.
1995-08-03 12:14:16 +00:00
Doug Rabson
a2c06d4685 Lock the directory vnode before VOP_READDIR in nfsrv_readdirplus 1995-08-02 10:12:47 +00:00
David Greenman
4777741358 Removed my special-case hack for VOP_LINK and fixed the problem with the
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.
1995-08-01 18:51:02 +00:00
Bruce Evans
28f8db1403 Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.
1995-07-29 11:44:31 +00:00
David Greenman
24aa09cd4f vnode_pager_alloc() never returns NULL, so don't check for it. 1995-07-20 09:43:12 +00:00
David Greenman
24a1cce34f NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
   haspage, and sync operations are supported. The haspage interface now
   provides information about clusterability. All pager routines now take
   struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
   confusion caused by pagers being both a data structure ("allocate a
   pager") and a collection of routines. The idea of a pager structure has
   escentially been eliminated. Objects now have types, and this type is
   used to index the appropriate pager. In most cases, items in the pager
   structure were duplicated in the object data structure and thus were
   unnecessary. In the few cases that remained, a un_pager structure union
   was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
   be removed. For instance, vm_object_enter(), vm_object_lookup(),
   vm_object_remove(), and the associated object hash list were some of the
   things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
   SMP locking primitives used in the VM system aren't likely the mechanism
   that we'll be adopting. Even if it were, the locking that was in the code
   was very inadequate and would have to be mostly re-done anyway. The
   locking in a uni-processor kernel was a no-op but went a long way toward
   making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
   thread support have been fixed to reflect the reality that we are really
   dealing with processes, not threads. The VM system didn't have complete
   thread support, so the comments and mis-named routines were just wrong.
   We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
   pager_alloc routines. Most of the pager_allocs have been rewritten and
   are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
   now tries harder to output an even number of pages before and after
   the requested page. This is sort of the reverse of the ideal pagein
   algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
   have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
    The fact that the vm_object data structure escentially had this
    backwards really confused things. The use of "shadow" and "backing
    object" throughout the code is now internally consistent and correct
    in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
    0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
    of objects to the "swap" type. The previous checks throughout the code
    for swp->pg_data != NULL were really ugly. This change also provides
    the rudiments for future backing of "anonymous" memory by something
    other than the swap pager (via the vnode pager, for example), and it
    allows the decision about which of these pagers to use to be made
    dynamically (although will need some additional decision code to do
    this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
    object" code has been removed. MAP_COPY was undocumented and non-
    standard. It was furthermore broken in several ways which caused its
    behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
    continue to work correctly, but via the slightly different semantics
    of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
    threads design can be worked around in other ways. Both #12 and #13
    were done to simplify the code and improve readability and maintain-
    ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
   this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
   information provided by the new haspage pager interface. This will
   substantially reduce the overhead by eliminating a large number of
   VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
   improved to provide both a "behind" and "ahead" indication of
   contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
   It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
   via a much more general mechanism that could also be used for disk
   striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
   fact that it makes calls into the swap pager and knows too much about
   how the swap pager operates really bothers me. It also doesn't allow
   for collapsing of non-swap pager objects ("unnamed" objects backed by
   other pagers).
1995-07-13 08:48:48 +00:00
David Greenman
06cb725951 Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places
that call vnode_pager_alloc() so that a failure return can be dealt with.
This fixes a panic seen on NFS clients when a file being opened is deleted
on the server before the open completes.
1995-07-09 06:58:03 +00:00
David Greenman
aa2cabb958 1) Converted v_vmdata to v_object.
2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs
   after vnode_pager_alloc() calls - the object is already guaranteed to be
   persistent.
3) Removed some gratuitous casts.
1995-06-28 12:01:13 +00:00