Commit Graph

648 Commits

Author SHA1 Message Date
Kris Kennaway
bf61e26696 Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from:	NetBSD
Reviewed by:	peter
MFC after:	2 weeks
2001-09-10 11:28:07 +00:00
Matthew Dillon
4e174404a3 Pushdown Giant for nfs syscalls (nfssvc()) 2001-08-31 22:39:36 +00:00
Andrey A. Chernov
f6bf1abc1b Stupid error from my side in prev. commit: || -> && 2001-08-23 18:02:29 +00:00
Andrey A. Chernov
e02faad5ca Implement l_len<0 per POSIX check.
Check for valid l_whence too.
2001-08-23 16:13:59 +00:00
Andrey A. Chernov
6c3f4fef64 Even better move: suppose that server is able to handle SEEK_END,
so check arguments for all but not SEEK_END case, leaving SEEK_END
handling for server
2001-08-23 14:21:26 +00:00
Andrey A. Chernov
e018907ed4 Apparently SEEK_END locking not supported by NFS. Previous variant
returns EINVAL in that case, change it to EOPNOTSUPP.
2001-08-23 14:09:16 +00:00
Andrey A. Chernov
fb2f187058 Move <machine/*> after <sys/*>
Pointed by:	bde
2001-08-23 13:27:58 +00:00
Andrey A. Chernov
e9d095afdc adv. lock:
detect off_t overflow _before_ it occurse and return EOVERFLOW instead of
EINVAL
2001-08-23 08:20:21 +00:00
Ian Dowse
02b31a0ee9 Fix a client-side memory leak in nfs_flush(). The code allocates
a temporary array to store struct buf pointers if the list doesn't
fit in a local array. Usually it frees the array when finished,
but if it jumps to the 'again' label and the new list does fit in
the local array then it can forget to free a previously malloc'd
M_TEMP memory.

Move the free() up a line so that it frees any previously allocated
memory whether or not it needs to malloc a new array.

Reviewed by:	dillon
2001-08-01 10:25:13 +00:00
Peter Wemm
7b141d5db3 Check the filehandle size when mounting.
Obtained from:  Constantine Sapuntzakis <csapuntz@openbsd.org>
2001-07-30 20:01:59 +00:00
John Baldwin
617e358cdf - Sort includes.
- Update vmmeter statistics for vnode pagein/pageouts in getpages/putpages.
2001-07-04 20:14:59 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
bc2327c310 - Protect the mnt_vnode list with the mntvnode lock.
- Use queue(9) macros.
2001-06-28 04:10:07 +00:00
Jake Burkholder
d389ead74f Unlock the process returned from pfind() if it does not return NULL.
This fixes a witness lock violation for nfssvc returning with locks
held.

Submitted by:	Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
PR:		kern/27776
2001-06-01 01:30:51 +00:00
Robert Watson
b1fc0ec1a7 o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
John Baldwin
ce70e0a964 Assert Giant is held by the caller rather than getting it and releasing
it in getpages/putpages.
2001-05-23 22:26:05 +00:00
Ruslan Ermilov
99d300a1ec - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
  fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
  FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
  Makefiles.
2001-05-23 09:42:29 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
Ian Dowse
0864ef1e8a Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by:	phk, bp
2001-05-16 18:04:37 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Poul-Henning Kamp
b7ebffbc08 Add a vop_stdbmap(), and make it part of the default vop vector.
Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.
2001-04-29 11:48:41 +00:00
Alfred Perlstein
f411fba5d3 Remove incorrect comment.
Submitted by: quinot@inf.enst.fr <quinot@inf.enst.fr>
PR: kern/26893
2001-04-29 03:10:24 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Alfred Perlstein
d8d5fa8805 vnode_pager_freepage() is really vm_page_free() in disguise,
nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
2001-04-19 06:18:23 +00:00
Alfred Perlstein
603c86672c Implement client side NFS locks.
Obtained from: BSD/os
Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
2001-04-17 20:45:23 +00:00
Poul-Henning Kamp
f84e29a06c This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.

This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.

The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine.  For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.
2001-04-17 08:56:39 +00:00
Peter Wemm
9d10eb0c0c Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to
enable easy access to the hash chain stats.  The raw prefixed versions
dump an integer array to userland with the chain lengths.  This cheats
and calls it an array of 'struct int' rather than 'int' or sysctl -a
faithfully dumps out the 128K array on an average machine.  The non-raw
versions return 4 integers: count, number of chains used, maximum chain
length, and percentage utilization (fixed point, multiplied by 100).
The raw forms are more useful for analyzing the hash distribution, while
the other form can be read easily by humans and stats loggers.
2001-04-11 00:39:20 +00:00
Robert Watson
2955f0b360 o Rather than arbitrarily construct a credential in the nfs_statfs()
VFS operation, make use of the calling process's credential.  This
  solution may not be ideal (there are a number of other possible
  proposals, including making use of the proc0 credential, adding a
  credential argument to the VFSOP, and switching from a hard-coded
  ucred to a hard-coded nfscred), it is simple and appears to
  work.  The arguments against using simply crget() are fairly
  strong: it is the only place in the code (other than a nearly
  identical invocation in ncp) where crget() is invoked, other than
  in the process credential creation code; as ucred becomes extensible,
  this use of crget() without appropriate context results in less and
  less meaningful credential data.  The implementation here will
  probably be tweaked as a result of experimentation and further
  exploration of the requirements.  In the mean-time, it allows
  progress to be made in ucred expansion for new security models without
  causing a crash every time df is used on an NFS mounted file system.

  This code has been interop tested against FreeBSD and Solaris NFS
  servers.  While using the process credentials should not introduce
  interop problems, please let me know if any turn out to exist.

Reviewed by:	freebsd-arch
2001-04-05 06:12:38 +00:00
Peter Wemm
439fea92c2 Use the same API as the example code.
Allow the initial hash value to be passed in, as the examples do.
Incrementally hash in the dvp->v_id (using the official api) rather than
add it.  This seems to help power-of-two predictable filename trees
where the filenames repeat on a power-of-two cycle and the directory trees
have power-of-two components in it.  The simple add then mask was causing
things like 12000+ entry collision chains while most other entries have
between 0 and 3 entries each.  This way seems to improve things.
2001-03-20 02:10:18 +00:00
Peter Wemm
6eb39ac8fc Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t.  This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this.  It is
around 45% faster with -march=pentiumpro on a p6 cpu.
2001-03-17 09:31:06 +00:00
Peter Wemm
be1d4058eb Dramatically improve the **lame** nfs_hash(). This is based on the
Fowler / Noll / Vo Hash (http://www.isthe.com/chongo/tech/comp/fnv/).

This improves hash coverage a *massive* amount.  We were seeing one
set of machines that were using 0.84% of their 131072 entry nfsnode
hash buckets with maximum chain lengths of up to ~500 entries.  The
machine was spending nearly 100% of its time in 'system'.
A test with this has pushed the coverage from a few perCent up to 91%
utilization with a max chain length of 11.

Submitted by:  David Filo
2001-03-17 05:43:01 +00:00
John Baldwin
19eb87d22a Grab the process lock while calling psignal and before calling psignal. 2001-03-07 03:37:06 +00:00
Adrian Chadd
f3a90da995 Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
  kernel variables in vfs_mount(). `data' remains a pointer into
  userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
  aren't too long.

* rework mount() and linux_mount() to take the userland parameters
  (besides data, as mentioned) and pass kernel variables to vfs_mount().
  (linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
  since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
  filesystem.  This variable is generally initialised with `path', and
  each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
2001-03-01 21:00:17 +00:00
Matthew Dillon
63692125a9 Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be
hit on the client side and prevent the server side from retiring writes.
Pipeline operations turned off for all READs (no big loss since reads are
usually synchronous) and for NFS writes, and left on for the default bwrite().
(MFC expected prior to 4.3 freeze)

Testing by: mjacob, dillon
2001-02-28 04:13:11 +00:00
Brian Feldman
c0511d3b58 Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00
Tor Egge
7d1af7b215 Enable use of DHCP extensions.
Reviewed by:	Per Kristian Hove <Per.Hove@math.ntnu.no>
2001-02-02 02:35:40 +00:00
Matthew Dillon
d2d00d11be NFS O_EXCL file create semantics temporarily uses file attributes to store
the file verifier.  The NFS client is supposed to do a SETATTR after a
successful O_EXCL open/create to clean up the attributes.  FreeBSD's
client code was generating a SETATTR rpc but was not generating an access
or modification time update within that rpc, leaving the file with a
broken access time that solaris chokes on (and it doesn't look very
nice when you ls -lua under FreeBSD either!).    Fixed.
2001-01-04 22:45:19 +00:00
Bosko Milekic
2a0c503e7a * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Poul-Henning Kamp
a52585d77e Simplify the tprintf() API.
Loose the special <sys/tprintf.h> #include file.
2000-11-26 20:35:21 +00:00
Matthew Dillon
279d722604 This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
Kirk McKusick
d6514f21d7 In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
2000-11-14 08:00:39 +00:00
Eivind Eklund
e3c4036b18 Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.
2000-11-01 17:57:24 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
Tor Egge
e4e7a9a4e9 Reduce kernel stack usage by not having large packets on the stack.
Supply correct size parameter to dhcpd.
Replace some magic numbers with macro names.
Handle more than one interface.
2000-10-29 01:19:32 +00:00
Tor Egge
5b93d1da3f Eliminate some bitrot (nonexisting member variable names).
Don't use curproc when a proc pointer is available.
2000-10-24 23:33:01 +00:00
Tor Egge
6d7518c134 Style fixes. 2000-10-24 22:40:18 +00:00
Tor Egge
f6ee793a3c Make RPC timeout message more readable.
Supply proc pointer to sosend.
2000-10-24 22:37:55 +00:00
David Malone
dc6dd1259f Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

	The problem seems to originate with NFS's postop_attr
	information that is returned with a read or write RPC.
	Within a vm_fault context, the code cannot deal with
	vnode_pager_setsize() shrinking a vnode.

	The workaround in the patch below stops the nfsm_postop_attr()
	macro from ever shrinking a vnode. If the new size in the
	postop_attr information is smaller, then it just sets the
	nfsnode n_attrstamp to 0 to stop the wrong size getting
	used in the future. This change only affects postop_attr
	attributes; the nfsm_loadattr() macro works as normal.

	The change is implemented by adding a new argument to
	nfs_loadattrcache() called 'dontshrink'. When this is
	non-zero, nfs_loadattrcache() will never reduce the
	vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by:	dillon
Tested by:	Vadim Belman <voland@lflat.org>
2000-10-24 10:13:36 +00:00
Boris Popov
c523a62949 Make nfs PDIRUNLOCK aware. Now it is possible to use nullfs mounts on top
of nfs mounts, but there can be side effects because nfs uses shared locks
for vnodes.
2000-10-15 08:06:32 +00:00
Boris Popov
823548e131 Add missed vop_stdunlock() for fifo's vnops (this affects only v2 mounts).
Give nfs's node lock its own name.
2000-10-15 08:01:28 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Boris Popov
67e871664b Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
Mike Smith
a77773909d Don't scan for the "right" network interface by shooting in the dark.
Assume that the nfs_diskless structure is correctly set up; the provider
ought to be getting it right.
2000-09-05 22:29:36 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
Paul Saab
fb27899f3b Correctly set the Maximum DHCP Message Size. bootpd now works
again as well as ISC dhcpd.
2000-06-13 09:32:09 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Poul-Henning Kamp
831e32f863 Include a RFC 1533 "Maximum DHCP Message Size" option in our request.
ISC DHCP will limit the reply length to 64 bytes for bootp replies
unless we explicitly tell it we can do more.  We tell it that we can do
1200 bytes.
2000-05-07 14:29:19 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Poul-Henning Kamp
87150cb06d s/biowait/bufwait/g
Prodded by: several.
2000-04-29 16:25:22 +00:00
Poul-Henning Kamp
3389ae9350 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
Poul-Henning Kamp
8177437d85 Complete the bio/buf divorce for all code below devfs::strategy
Exceptions:
        Vinum untouched.  This means that it cannot be compiled.
        Greg Lehey is on the case.

        CCD not converted yet, casts to struct buf (still safe)

        atapi-cd casts to struct buf to examine B_PHYS
2000-04-15 05:54:02 +00:00
Poul-Henning Kamp
c244d2de43 Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.
2000-04-02 15:24:56 +00:00
Matthew Dillon
8d1b3828fa Add a sysctl to specify the amount of UDP receive space NFS should
reserve, in maximal NFS packets.  Originally only 2 packets worth of
    space was reserved.  The default is now 4, which appears to greatly
    improve performance for slow to mid-speed machines on gigabit networks.

    Add documentation and correct some prior documentation.

Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu>
Approved by: jkh
2000-03-27 21:38:35 +00:00
Poul-Henning Kamp
b99c307a21 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
Poul-Henning Kamp
21144e3bf1 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
Peter Wemm
242c5536ea Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs.  Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by:	jkh
2000-02-13 03:32:07 +00:00
Matthew Dillon
34ddf54812 The alpha build cuases the 'nfsuid bloated' warning to occur. Well,
there is nothing we can do about it.  In fact, after further review
    there simply are not very many instances of the two structures NFS
    checks for 'bloat' so I've decided to simply rip the checks out entirely.

Submitted by:	 Andrew Gallatin <gallatin@cs.duke.edu>
2000-01-13 20:18:25 +00:00
Yoshinobu Inoue
fb59c426ff tcp updates to support IPv6.
also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
2000-01-09 19:17:30 +00:00
Matthew Dillon
c37c9620cd Enhance reassignbuf(). When a buffer cannot be time-optimally inserted
into vnode dirtyblkhd we append it to the list instead of prepend it to
    the list in order to maintain a 'forward' locality of reference, which
    is arguably better then 'reverse'.  The original algorithm did things this
    way to but at a huge time cost.

    Enhance the append interlock for NFS writes to handle intr/soft mounts
    better.

    Fix the hysteresis for NFS async daemon I/O requests to reduce the
    number of unnecessary context switches.

    Modify handling of NFS mount options.  Any given user option that is
    too high now defaults to the kernel maximum for that option rather then
    the kernel default for that option.

Reviewed by:	 Alfred Perlstein <bright@wintelcom.net>
2000-01-05 05:11:37 +00:00
Matthew Dillon
54986abd15 Fix at least one source of the continued 'NFS append race'. close()
was calling nfs_flush() and then clearing the NMODIFIED bit.  This is
    not legal since there might still be dirty buffers after the nfs_flush
    (for example, pending commits).  The clearing of this bit in turn prevented
    a necessary vinvalbuf() from occuring leaving left over dirty buffers
    even after truncating the file in a new operation.  The fix is to
    simply not clear NMODIFIED.

    Also added a sysctl vfs.nfs.nfsv3_commit_on_close which, if set to 1,
    will cause close() to do a stage 1 write AND a stage 2 commit
    synchronously.  By default only the stage 1 write is done synchronously.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
2000-01-05 00:32:18 +00:00
Peter Wemm
c447342094 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistant with the other
BSD's who made this change quite some time ago.  More commits to come.
1999-12-29 05:07:58 +00:00
Alfred Perlstein
20883b0f10 make getfh a standard syscall instead of dependant on having
NFSSERVER defined, useful for userland fileservers that want to
use a filehandle type interface to the filesystem.

Submitted by: Assar Westerlund assar@stacken.kth.se
PR: kern/15452
1999-12-21 20:21:12 +00:00
Robert Watson
91f37dcba1 Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by:	eivind
1999-12-19 06:08:07 +00:00
Brian Feldman
d25f3712b7 M_PREPEND-related cleanups (unregisterifying struct mbuf *s). 1999-12-19 01:55:37 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Matthew Dillon
b7303db36e Fix two problems: First, fix the append seek position race that can
occur due to np->n_size potentially changing if nfs_getcacheblk()
    blocks in nfs_write().

    Second, under -current we must supply the proper bufsize when obtaining
    buffers that straddle the EOF, but due to the fact that np->n_size can
    change out from under us it is possible that we may specify the wrong
    buffer size and wind up truncating dirty data written by another
    process.

    Both problems are solved by implementing nfs_rslock(), which allows us
    to lock around sensitive buffer cache operations such as those that
    occur when appending to a file.

    It is believed that this race is responsible for causing dirtyoff/dirtyend
    and (in stable) validoff/validend to exceed the buffer size.  Therefore
    we have now added a warning printf for the dirtyoff/end case in current.

    However, we have introduced a new problem which we need to fix at some
    point, and that is that soft or intr NFS mounts may become
    uninterruptable from the point of view of process A which is stuck waiting
    on rslock while process B is stuck doing the rpc.  To unstick process A,
    process B would have to be interrupted first.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
1999-12-14 19:07:54 +00:00
Matthew Dillon
4682c8eac9 Fix a timeout deadlock that can occur when the process holding the
receive lock hasn't yet managed to send its own request.

PR:		kern/15055
Submitted by:	Ian Dowse iedowse@maths.tcd.ie
1999-12-13 04:24:55 +00:00
Matthew Dillon
5f3bfd608d Fix a number of server-side issues related to aborting badly formed
NFS packets, mainly initializing structure pointers to NULL which
    are conditionally freed prior to return.

PR:		kern/15249
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-12-12 07:06:39 +00:00
Matthew Dillon
ea94c7b968 Synopsis of problem being fixed: Dan Nelson originally reported that
blocks of zeros could wind up in a file written to over NFS by a client.
    The problem only occurs a few times per several gigabytes of data.   This
    problem turned out to be bug #3 below.

    bug #1:

        B_CLUSTEROK must be cleared when an NFS buffer is reverted from
        stage 2 (ready for commit rpc) to stage 1 (ready for write).
        Reversions can occur when a dirty NFS buffer is redirtied with new
        data.

        Otherwise the VFS/BIO system may end up thinking that a stage 1
        NFS buffer is clusterable.  Stage 1 NFS buffers are not clusterable.

    bug #2:

        B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short
        buffers only occur near the EOF of the file).  Change to only set
        when the buffer is a full biosize (usually 8K).  This bug has no
        effect but should be fixed in -current anyway.  It need not be
        backported.

    bug #3:

        B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is
	typically only called by the update daemon).  nfs_flush()
        does a multi-pass loop but due to the lack of vnode locking it
        is possible for new buffers to be added to the dirtyblkhd list
        while a flush operation is going on.  This may result in nfs_flush()
        setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its
        stage 1 write, causing only the commit rpc to be made and thus
        causing the contents of the buffer to be thrown away (never sent to
        the server).

    The patch also contains some cleanup, which only applies to the commit
    into -current.

Reviewed by:	dg, julian
Originally Reported by: Dan Nelson <dnelson@emsphone.com>
1999-12-12 06:09:57 +00:00
Eivind Eklund
6bdfe06ad9 Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
  return value: LK_EXCLOTHER, when the lock is held exclusively by another
  process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
  locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with:	grog, mch, peter, phk
Reviewed by:	peter
1999-12-11 16:13:02 +00:00
Matthew Dillon
98733bd871 The symlink implementation could improperly return a NULL vp along with
a 0 error code.  The problem occured with NFSv2 mounts and also with
    any NFSv3 mount returning an EEXIST error (which is translated to 0
    prior to return).  The reply to the rpc only contains the file handle
    for the no-error case under NFSv3.  The error case under NFSv3 and
    all cases under NFSv2 do *not* return the file handle.  The fix is
    to do a secondary lookup to obtain the file handle and thus be able
    to generate a return vnode for the situations where the rpc reply
    does not contain the required information.

    The bug was originally introduced when VOP_SYMLINK semantics were
    changed for -CURRENT.  The NFS symlink implementation was not properly
    modified to go along with the change despite the fact that three
    people reviewed the code.  It took four attempts to get the current
    fix correct with five people.  Is NFS obfuscated?  Ha!

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
Testing and Discussion: "Viren R.Shah" <viren@rstcorp.com>, Eivind Eklund <eivind@FreeBSD.ORG>, Ian Dowse <iedowse@maths.tcd.ie>
1999-11-30 06:56:15 +00:00
Eivind Eklund
679106b15a Remap the error EEXISTS => 0 *before* using error to determine if we should
return a vp.
1999-11-27 18:14:41 +00:00
Matthew Dillon
b314ed9662 nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization
of element [4] in both, which goes beyond the end of the array, leaving
    [0], [1], [2], and [3].  This bug did not cause any problems since
    the overrun fields are initialized after the bogus array init but
    needs to be fixed anyway.

Submitted by:	 Ian Dowse <iedowse@maths.tcd.ie>
1999-11-22 04:50:09 +00:00
Eivind Eklund
b6335212d6 Fix VOP_MKNOD for loss of WILLRELE. I don't know how I could have missed
this in the first place :-(

Noticed by:	bde
1999-11-20 16:09:10 +00:00
Eivind Eklund
dd8c04f4c7 Remove WILLRELE from VOP_SYMLINK
Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.
1999-11-13 20:58:17 +00:00
Matthew Dillon
a6aa6d9137 Remove special case socket sharing code in order to allow nfsd to
bind IP addresses to udp/cltp sockets separately.

PR:		kern/13049
Reviewed by:	David Malone <dwmalone@maths.tcd.ie>, freebsd-current
1999-11-11 17:24:02 +00:00
Matthew Dillon
6b21e94604 Fix nfssvc_addsock() to not attempt to free a NULL socket structure
when returning an error.  Bug fix was extracted from the PR.  The PR
    is not yet entirely resolved by this commit.

PR:		kern/13049
Reviewed by:	Matt Dillon <dillon@freebsd.org>
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-11-08 19:10:16 +00:00
Mike Smith
b7017a8210 Call bootpc_init before we try to mount an NFS root, if we're configured
to use BOOTP for NFS root discovery.

The entire interface setup inside nfs_mountroot is evil, and should die.
1999-11-01 23:55:38 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Matthew Dillon
a5d3fe3f85 Move NFS access cache hits/misses into nfsstats structure so
/usr/bin/nfsstat can get to it easily.
1999-10-25 19:22:33 +00:00
Poul-Henning Kamp
3b6fb88590 Before we start to mess with the VFS name-cache clean things up a little bit:
Isolate the namecache in its own file, and give it a dedicated malloc type.
1999-10-03 12:18:29 +00:00
Marcel Moolenaar
16df98ecc6 Careless use of struct proc *p caused major problems. 'p' is allowed to
be NULL in this function (nfs_sigintr). Reorder the statements and guard
them all with a single if (p != NULL).

reported, reviewed and tested by: jdp
1999-09-29 20:12:39 +00:00
Marcel Moolenaar
2c42a14602 sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.
1999-09-29 15:03:48 +00:00
Matthew Dillon
8fdd2461b3 Add comment to clarify a commit rpc optimization already being performed. 1999-09-20 19:10:28 +00:00
Matthew Dillon
b5acbc8b9c Asynchronized client-side nfs_commit. NFS commit operations were
previously issued synchronously even if async daemons (nfsiod's) were
    available.  The commit has been moved from the strategy code to the doio
    code in order to asynchronize it.

    Removed use of lastr in preparation for removal of vnode->v_lastr.  It
    has been replaced with seqcount, which is already supported by the system
    and, in fact, gives us a better heuristic for sequential detection then
    lastr ever did.

    Made major performance improvements to the server side commit.  The
    server previously fsync'd the entire file for each commit rpc.  The
    server now bawrite()s only those buffers related to the offset/size
    specified in the commit rpc.

    Note that we do not commit the meta-data yet.  This works still needs
    to be done.

    Note that a further optimization can be done (and has not yet been done)
    on the client: we can merge multiple potential commit rpc's into a
    single rpc with a greater file offset/size range and greatly reduce
    rpc traffic.

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:57:57 +00:00
Alfred Perlstein
c24fda81c9 Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from:	NetBSD
1999-09-11 00:46:08 +00:00
Alfred Perlstein
5a5fccc8e7 All unimplemented VFS ops now have entries in kern/vfs_default.c that return
reasonable defaults.

This avoids confusing and ugly casting to eopnotsupp or making dummy functions.
Bogus casting of filesystem sysctls to eopnotsupp() have been removed.

This should make *_vfsops.c more readable and reduce bloat.

Reviewed by:	msmith, eivind
Approved by:	phk
Tested by:	Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
1999-09-07 22:42:38 +00:00
Poul-Henning Kamp
9626728875 remove unused variables. 1999-08-28 19:21:03 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Poul-Henning Kamp
dbafb3660f Simplify the handling of VCHR and VBLK vnodes using the new dev_t:
Make the alias list a SLIST.

        Drop the "fast recycling" optimization of vnodes (including
        the returning of a prexisting but stale vnode from checkalias).
        It doesn't buy us anything now that we don't hardlimit
        vnodes anymore.

        Rename checkalias2() and checkalias() to addalias() and
        addaliasu() - which takes dev_t and udev_t arg respectively.

        Make the revoke syscalls use vcount() instead of VALIASED.

        Remove VALIASED flag, we don't need it now and it is faster
        to traverse the much shorter lists than to maintain the
        flag.

        vfs_mountedon() can check the dev_t directly, all the vnodes
        point to the same one.

Print the devicename in specfs/vprint().

Remove a couple of stale LFS vnode flags.

Remove unimplemented/unused LK_DRAINED;
1999-08-26 14:53:31 +00:00
Peter Wemm
ac7cc2e469 Convert all the nfs macros to do { blah } while (0) to ensure it
works correctly in if/else etc.  egcs had probably picked up most of the
problems here before with "ambiguous braces" etc, but this should
increase the robustness a bit.  Based on an idea from Eivind Eklund.
1999-08-19 14:50:12 +00:00
Alan Cox
2c28a10540 Add the (inline) function vm_page_undirty for clearing the dirty bitmask
of a vm_page.

Use it.

Submitted by:	dillon
1999-08-17 04:02:34 +00:00
Dmitrij Tejblum
e868365294 nfs_getcacheblk() can return 0 if the mount is interruptible. It need to be
checked by the caller.

Broken in: rev. 1.70 (1999/05/02)
1999-08-12 18:04:39 +00:00
Poul-Henning Kamp
0ef1c82630 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
Peter Wemm
56ba093ddb Don't over-allocate and over-copy shorter NFSv2 filehandles and then
correct the pointers afterwards.

It's kinda bogus that we generate a 24 (?) byte filehandle (2 x int32
fsid and 16 byte VFS fhandle) and pad it out to 64 bytes for NFSv3 with
garbage.  The whole point of NFSv3's variable filehandle length was
to allow for shorter handles, both in memory and over the wire.  I plan
on taking a shot at fixing this shortly.
1999-08-04 14:41:39 +00:00
Mike Smith
98f8aa275b As described by the submitter:
I did some tcpdumping the other day and noticed that GETATTR calls
  were frequently followed by an ACCESS call to the same file. The
  attached patch changes nfs_getattr to fill the access cache as a side
  effect. This is accomplished by calling ACCESS rather than
  GETATTR. This implies a modest overhead of 4 bytes in the request and
  8 bytes in the response compared to doing a vanilla GETATTR.
...
  [The patch comprises two parts] The first
  is the "real" patch, the second counts misses and hits rather than
  fills and hits. The difference is subtle but important because both
  nfs_getattr and nfs_access now fill the cache. It also changes the
  default value of nfsaccess_cache_timeout to better match the attribute
  cache. IMHO, file timestamps change much more frequently than
  protection bits.

Submitted by:	Bjoern Groenvall <bg@sics.se>
Reviewed by:	dillon (partially)
1999-07-31 01:51:58 +00:00
Bill Paul
44fe63e5e7 Close PR #12651: the hash calculation routine has changed in other
parts of the kernel but was not updated in nfs_readdirplusrpc().
1999-07-30 04:51:35 +00:00
Bill Paul
f069164876 Fix two bugs in nfs_readdirplus(). The first is that in some cases,
vnodes are locked and never unlocked, which leads to processes starting
to wedge up after doing a mount -o nfsv3,tcp,rdirplus foo:/fs /fs; ls /fs.
The second is that sometimes cnp is accessed without having been
properly initialized: cnp->cn_nameptr points to an earlier name while
"len" contains the length of a current name of different size. This
leads to an attempt to dereference *(cn->cn_nameptr + len) which will
sometimes cause a page fault and a panic.

With these two fixes, client side readdirplus works correctly with
FreeBSD, IRIX 6.5.4 and Solaris 2.5.1 and 2.6 servers.

Submitted by: Matthew Dillon <dillon@backplane.com>
1999-07-30 04:02:04 +00:00
Poul-Henning Kamp
f008cfcc1a I have not one single time remembered the name of this function correctly
so obviously I gave it the wrong name.  s/umakedev/makeudev/g
1999-07-17 18:43:50 +00:00
Peter Wemm
8180de4a33 Fix warning. va_fsid is udev_t, which is int32_t. No need to use %lx. 1999-07-01 13:32:54 +00:00
Julian Elischer
c1f020038b Submitted by: Conrad Minshall <conrad@apple.com>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>

The following ugly hack to the exit path of nfs_readlinkrpc() circumvents
an Auspex bug: for symlinks longer than 112 (0x70) they return a 1024 byte
xdr string - the correct data with many nulls appended.  Without this fix
namei returns ENAMETOOLONG, at least it does on our source base and on
FreeBSD 3.0.  Note we do not (and should not) rely upon their null padding.
1999-06-30 02:53:51 +00:00
Peter Wemm
5fc491ef58 Fix a KASSERT() that was negated and lead to:
nfs_strategy: buffer 0xxxxx not locked
when you attempted to write and had INVARIANTS turned on.
1999-06-28 12:34:40 +00:00
Peter Wemm
e96c1fdc3f Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly
splbio()/splx()) are #included in time.
1999-06-27 11:44:22 +00:00
Kirk McKusick
67812eacd7 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
Julian Elischer
3d84d191cc Matt's NFS fixes.
Submitted by: Matt Dillon
Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin
  3.2 version to follow when tested
1999-06-23 04:44:14 +00:00
Matt Jacob
3d32dfbfdf Thanks to Bruce for noticing this.... compare against the *new* nfsnode's
mount point for seeing whether or not the new nfsnode is already in the
hash queue. We're pretty much guaranteed that the old nfsnode is already
in the hash queue. Wank! Infinite Loop! Looks like just a minor typo....
(ah the influence of fortran ... np && np2... why not nfsnode_the_first &&
nfsnode_the_second???)...
1999-06-19 19:33:44 +00:00
Kirk McKusick
f9c8cab591 Add a vnode argument to VOP_BWRITE to get rid of the last vnode
operator special case. Delete special case code from vnode_if.sh,
vnode_if.src, umap_vnops.c, and null_vnops.c.
1999-06-16 23:27:55 +00:00
Matt Jacob
e672bf9cd6 If we retry this operation from the top of this routine, we need to
make sure we've freed any allocated resources (to avoid a memory leak)
and and do the right thing with respect to the nfs node hash lock we'd
acquired.
1999-06-15 23:24:14 +00:00
Peter Wemm
b903b04cc0 Various changes lifted from the OpenBSD cvs tree:
txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge.

nfs_serv.c: don't call nfsm_adj() with negative values, windows clients
could crash servers when doing a readdir of a large directory.

nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin
loop trying to bind().  Don't clobber a mbuf pointer or we get panics
on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf.

nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB.

Obtained from:  OpenBSD
1999-06-05 05:35:03 +00:00
Peter Wemm
ba68ee9642 Fix a malloc race
Obtained from:  OpenBSD (csapuntz)
1999-06-05 05:26:36 +00:00
Peter Wemm
adbde675ee Don't mistake a non-async block that needs to be committed for an
interrupted write.

Obtained from: fvdl@NetBSD.org via OpenBSD.
1999-06-05 05:25:37 +00:00
Poul-Henning Kamp
bfbb9ce670 Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
        major()         umajor()
        minor()         uminor()
        makedev()       umakedev()
        dev2udev()      udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits.  This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference.  If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
1999-05-11 19:55:07 +00:00
Poul-Henning Kamp
b0eeea2042 remove b_proc from struct buf, it's (now) unused.
Reviewed by:	dillon, bde
1999-05-06 20:00:34 +00:00
Peter Wemm
dfd5dee1b0 Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.
1999-05-06 18:13:11 +00:00
Alan Cox
7f2f2dae43 All directory accesses must be made with NFS_DIRBLKSIZE chunks to avoid
confusing the directory read cookie cache.  The nfs_access implementation
for v2 mounts attempts to read from the directory if root is the user
so that root can't access cached files when the server remaps root
to some other user.

Submitted by:	Doug Rabson <dfr@nlsystems.com>
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-03 20:59:14 +00:00
Alan Cox
4221e284a3 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Dmitrij Tejblum
c1eefce941 Fixed printf format errors on alpha. 1999-04-24 11:29:48 +00:00
Peter Wemm
ae3d216ad8 Close a potential mbuf and/or mbuf cluster leak in the client-side NFS
statfs() code.  Free the whole chain, not just the first one.
1999-04-10 18:53:29 +00:00
Peter Wemm
8a0d8193f2 Hold nfsd's upages in-core with PHOLD rather than P_NOSWAP. 1999-04-06 03:07:54 +00:00
Julian Elischer
8d17e69460 Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
    vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
    ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>
1999-04-05 19:38:30 +00:00
Julian Elischer
4ef2094e45 Reviewed by: Many at differnt times in differnt parts,
including alan, john, me, luoqi, and kirk
Submitted by:	Matt Dillon <dillon@frebsd.org>

This change implements a relatively sophisticated fix to getnewbuf().
There were two problems with getnewbuf(). First, the writerecursion
can lead to a system stack overflow when you have NFS and/or VN
devices in the system. Second, the free/dirty buffer accounting was
completely broken. Not only did the nfs routines blow it trying to
manually account for the buffer state, but the accounting that was
done did not work well with the purpose of their existance: figuring
out when getnewbuf() needs to sleep.

The meat of the change is to kern/vfs_bio.c. The remaining diffs are
all minor except for NFS, which includes both the fixes for bp
interaction AND fixes for a 'biodone(): buffer already done' lockup.
Sys/buf.h also contains a chaining structure which is not used by
this patchset but is used by other patches that are coming soon.
This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF.
(sorry for the missing T matt)
1999-03-12 02:24:58 +00:00
Peter Wemm
803870b48d Untangle the nfs send and receive queue locking a little. One lock
routine was [ab]used for two different things, and you couldn't tell from
the wait channel which one had wedged.
Catch a few things missing from NFS_NOSERVER.
1999-02-25 00:03:51 +00:00
Doug Rabson
ef5253d801 Move the declaration of the vfs.nfs sysctl node outside an ifdef so that
it builds if NFS_NOSERVER is defined.

Spotted by: Bruce Evans <bde@zeta.org.au>
1999-02-18 09:19:41 +00:00
Bruce Evans
1f2e401efc Fixed bitrot in NFS_ACDEBUG option. 1999-02-17 13:59:29 +00:00
Doug Rabson
ce02431ffa * Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
1999-02-16 10:49:55 +00:00
Matthew Dillon
2a2ecc3027 General additional cleanup of VOP API for NFS ops - mainly NFS ignoring
the API for freeing up cnp's.  This cleanup should not effect nominal
    operation one way or the other since NFS VOPs just happen to be called
    with flags that match what it actually does to the NAMEI components it
    gets.  Still, if an NFS error occured, there was probably some memory
    leakage of NAMEI components with certain NFS VOP ops.
1999-02-13 09:47:30 +00:00
Matthew Dillon
5e9d4f1303 PR: kern/9970
Remove incorrect vput() in nfs_link()
1999-02-13 08:01:59 +00:00
Matthew Dillon
61da17a62c Flush delayed-write data out prior to issuing a rename rpc. This appears
to fix the problem w/ NFSV3 whereby a make installworld would get into
    high-network-bandwidth situations continuously trying to retry nfs writes
    that fail with a 'stale file handle' error.
1999-02-06 07:48:56 +00:00
Matthew Dillon
697457a133 Fix warnings related to -Wall -Wcast-qual 1999-01-28 17:32:05 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
Matthew Dillon
fe08c21a53 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile.

    This commit includes significant work to proper handle const arguments
    for the DDB symbol routines.
1999-01-27 23:45:44 +00:00
Matthew Dillon
cdb96ab470 Fix nasty bug in nfs_access(). A conditional was if (a = b) instead of
if (a == b).
1999-01-27 22:45:49 +00:00
Matthew Dillon
53b3bd0e25 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 22:45:13 +00:00
Matthew Dillon
831a80b0d5 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 22:42:27 +00:00
Matthew Dillon
1c7c3c6a86 This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
Eivind Eklund
f65b39828c Remove two cases of unused variable sp3. 1999-01-12 12:39:14 +00:00
Eivind Eklund
fb1167777a Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
Tim Vanderhoek
dea9268b70 Silence -Wtrigraph.
Submitted by:	Bradley Dunn <bradley@dunn.org>  (pr: kern/8817)
1998-12-30 00:37:44 +00:00
Doug Rabson
6cd60632a6 Fix for creating files on a Solaris 7 server with NFSv3 (the request was
slightly garbled but older servers seemed to understand it).

Reviewed by: David O'Brien <obrien@nuxi.ucdavis.edu>
1998-12-25 10:34:27 +00:00
Dmitrij Tejblum
85f118c801 Added 3 new errno values, requred by various standards: EOVERFLOW,
ECANCELED, EILSEQ.

Fixed ibcs2 and especially linux EIDRM and ENOMSG errno mapping.
Reviewed by:	Dan Nelson <dnelson@emsphone.com>
1998-12-14 18:54:04 +00:00
Dmitrij Tejblum
10db74e96d (Hopefully) fix support for "large" files. Mostly cast block numbers to off_t
before they multiplied to block sizes.
1998-12-14 17:51:30 +00:00
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Archie Cobbs
2127f26023 Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Mike Spengler <mks@networkcs.com>
1998-12-04 22:54:57 +00:00
Matthew Dillon
aeb728f0d5 Make bootp error message slightly more verbose 1998-12-03 20:28:23 +00:00
Mike Smith
ad6d02135b Reimplement the NFS ACCESS RPC cache as an "accelerator" rather than a true
cache.  If the cached result lets us say "yes", then go with that.  If
we're not sure, or we think the answer might be "no", go to the wire to be
certain.    This avoids all of the possible false veto cases, and allows us
to key the cached value with just the UID for which the cached value holds,
reducing the bloat of the nfsnode structure from 104 bytes to just 12 bytes.

Since the "yes" case is by far the most common, this should still provide
a substantial performance improvement.  Also default the cache to on, with
a conservative timeout (2 seconds).  This improves performance if NFS is
loaded as a KLD module, as there's not (yet) code to parse an option out
of the module arguments to set it, and sysctl doesn't work (yet) for OIDs
in modules.

The 'accelerator' mode was suggested by Bjoern Groenvall (bg@sics.se)

Feedback on this would be appreciated as testing has been necessarily
limited by Comdex, and it would be valuable to have this in 2.2.8.
1998-11-15 20:36:18 +00:00
Mike Smith
692c33253b Avoid a null pointer reference if the target of an NFS rename has been
sillrenamed, or if the source vnode doesn't have an associated nfsnode.

Bug report from Andrew Gallatin <gallatin@cs.duke.edu>
1998-11-13 22:58:48 +00:00
Doug Rabson
86442b5201 Fix a panic in nfsrv_dorec() where a NULL pointer could be passed to
free() sometimes.

Reviewed by: Eric Haug <ejh@eas.slu.edu>
1998-11-13 09:44:12 +00:00
Mike Smith
c5118de899 Implement NFS ACCESS RPC result caching.
This yields startling performance increases for NFS clients for many
access profiles, due to the fact that ACCESS results are persistently
cached in the namecache in many cases.

Note that the code is somewhat conservative in that it requires an
exact credential match for a cache hit.  This bloats the nfsnode
structure by sizeof(struct ucred) (96 bytes).  Any less conservative
approach opens the possibility for a false veto in eg. setuid
applications.  Alternative suggestions would be welcomed.

The cache is normally disabled, to activate set the sysctl variable
vfs.nfs.access_cache_timeout to a nonzero value.  This is the time in
seconds that a cached entry will be considered valid; useful values appear
to be 2-10 seconds.  Performance of the cache can be monitored with the
vfs.nfs.access_cache_hits and vfs.nfs.access_cache_hits variables.
1998-11-13 02:39:09 +00:00
Peter Wemm
dad00f4e9c Remove [apparently] bogus casts to u_long for the vnode_pager_setsize()
second argument.  np_size is a 64 bit int, so is the second arg.  This
might have caused needless 2G/4G file size problems.

I believe it was Bruce who queried this.
1998-11-09 07:00:14 +00:00
Peter Wemm
40c8cfe552 Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
Kirk McKusick
b5cf6d7984 In nfs_link(), check for a cross-device mount *before* looking
in the v_data field.
Obtained from: Charles Hannum, via Frank van der Linden <frank@wins.uva.nl>
1998-09-29 23:39:37 +00:00
Kirk McKusick
d25ad74791 Missing vput when cross-device link error is detected in nfs_link. 1998-09-29 23:29:48 +00:00
Kirk McKusick
b6b74f2f4e During truncation, have to notify the VM about the new size
of the NFS file *before* doing the nfs_vinvalbuf operation.
Otherwise some invalid data may show up in an mmap.
1998-09-29 23:28:32 +00:00
Kirk McKusick
e68e908bda Frank sez: 'It fixes a problem with servers that return 0 values
for some of the fsinfo RPC fields. It is strictly speaking not
wrong to do this, as the spec says that "it is expected that a
server will make a best effort at supporting all the attributes",
but pretty unusual. You guessed it, it's NT servers that do it.'
Obtained from: Frank van der Linden <frank@wins.uva.nl>
1998-09-29 23:15:53 +00:00
Kirk McKusick
35800d700a Do not need (or want) to take a reference on an NFS file that
is being deleted due to an forcible unmount. The problem is
that vgone calls vclean() which then calls calls nfs_inactive()
with VXLOCK set on the vnode. Nfs_inactive() was calling vget()
to get a reference on the vnode, which in turn hung on VXLOCK.
Nfs_inactive() now checks v_usecount to make sure that the vnode
is not coming from vclean() before it does a vget().
1998-09-29 23:15:25 +00:00
Kirk McKusick
96438eb911 The code checks each fragment mark to see if it's valid; if the fragment
is less than NFS_MINPACKET or greater than NFS_MAXPACKET in size, it
barfs and, I think, drops the connection.

However, there's no guarantee that in a multi-fragment RPC, all the
fragments will be at least as large as NFS_MINPACKET.

In fact, with the version of "tclnfs" we have here, which supports NFS
over TCP, at least when built under SunOS 4.1.3 (i.e., with 4.1.3's
user-mode ONC RPC library), I can *repeatably* cause "tclnfs" to send a
request with more than one fragment, one of which is only 8 bytes long.
I just do a 3877-byte write to a file, at an offset of 0.

The check that "slp->ns_reclen" is greater than or equal to
NFS_MINPACKET serves no useful purpose - if the NFS server code can't
handle packets < NFS_MINPACKET bytes, it can't handle them over *any*
protocol, so the check has to be done above the RPC-over-TCP layer - and
should be removed.
Obtained from: Fix from Guy Harris, forwarded by Rick Macklem.
1998-09-29 22:33:05 +00:00
Kirk McKusick
1cda241131 Mark directory buffers that have no valid data with B_INVAL
so that they are not put in the cache.
1998-09-29 22:01:10 +00:00
Kirk McKusick
113b88d241 When adding data to a buffer, we need to clear the B_NEEDCOMMIT flag
which says that the data is on server but not committed.
1998-09-29 21:46:54 +00:00
Bruce Evans
8994ca3ce9 Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
1998-09-07 13:17:06 +00:00
Bruce Evans
cae300be0f Made unloading of the nfs LKM sort of work. This is mainly to test
detachment of vfs sysctls.  Unloading of vfs LKMs doesn't actually
work for any vfs, since it leaves garbage pointers to memory
allocation control structures.
1998-09-07 05:42:15 +00:00
Bruce Evans
e99ea9ec2b Ignore the statically configured vfs type numbers and assign vfs
type numbers in vfs attach order (modulo incomplete reuse of old
numbers after vfs LKMs are unloaded).  This requires reinitializing
the sysctl tree (or at least the vfs subtree) for vfs's that support
sysctls (currently only nfs).  sysctl_order() already handled
reinitialization reasonably except it checked for annulled self
references in the wrong place.

Fixed sysctls for vfs LKMs.
1998-09-05 17:13:28 +00:00
Bruce Evans
500b04a257 Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM.  Declare it in a header file.  Don't forget to use
it in non-Lite2 code.  Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.
1998-09-05 15:17:34 +00:00
Doug Rabson
e69763a315 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
Luoqi Chen
4ef872a4c5 Check for NULL pointer before freeing a struct sockaddr. m_freem() can handle
NULL, buf free() can't.
1998-09-01 02:31:52 +00:00
Garrett Wollman
cfe8b629f1 Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process.  Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
1998-08-23 03:07:17 +00:00
Bruce Evans
18df27bda2 Fixed printf format errors. 1998-08-18 00:32:50 +00:00
Doug Rabson
7032ad107e Protect all modifications to v_numoutput with splbio(). 1998-08-13 08:09:08 +00:00
Bruce Evans
13950bd2ed Don't configure compatibility code for pre-Lite2 mount() calls by
default.  This code should go away soon.
1998-08-12 20:17:42 +00:00
Peter Wemm
c5fa8d1a2c If we get an ENOBUFS from the network, it's normally transient network
interface congestion (eg: nfs over a ppp link, etc).  Don't log these
for UDP mounts, and don't cause syscalls to fail with EINTR.
This stops the 'nfs send error 55' warnings.

If the error is because the system is really hosed, this is the least
of your problems...
1998-08-01 09:04:02 +00:00
Bruce Evans
a23d65bfc8 Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively.  Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
1998-07-15 02:32:35 +00:00
Julian Elischer
fd5d1124e2 VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00
John-Mark Gurney
56786ee91b fix buildworld hopefully be3fore anyone complains...
NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion),
but in some cases it looks like nfs keeps a copy of the value in a struct

hash sizes are already ifdef'd KERNEL, so there aren't userland inpact
from them...
1998-06-30 11:19:22 +00:00
John-Mark Gurney
df394affa2 convert some nfs tunables to options, these are:
NFS_MINATTRTIMO         VREG attrib cache timeout in sec
NFS_MAXATTRTIMO
NFS_MINDIRATTRTIMO      VDIR attrib cache timeout in sec
NFS_MAXDIRATTRTIMO
NFS_GATHERDELAY         Default write gather delay (msec)
NFS_UIDHASHSIZ          Tune the size of nfssvc_sock with this
NFS_WDELAYHASHSIZ       and with this
NFS_MUIDHASHSIZ         Tune the size of nfsmount with this
NFS_NOSERVER            (already documented in LINT)
NFS_DEBUG               turn on NFS debugging

also, because NFS_ROOT is used by very different files, it has been
renamed to opt_nfsroot.h instead of the old opt_nfs.h....
1998-06-30 03:01:37 +00:00
Bruce Evans
29c0cb37eb Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore,
code controlled by it did not even compile.)
1998-06-21 12:50:12 +00:00
Bruce Evans
4c4918c9e4 Avoid an egcs pessimization for 64-bit signed division on i386's.
Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit
signed divisions, but egcs optimizes them to a shift and fixup when
the divisor is a constant power of 2.  Unfortunately, it generates
a call to __cmpdi2() for the fixup, although all except possibly
ancient versions of gcc and egcs do ordinary 64-bit comparisons
inline.
1998-06-14 15:52:00 +00:00
Doug Rabson
ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
Peter Wemm
55b41976c1 Make sure we go a nfs_fsinfo() in get/putpages before calling
readrpc/writerpc, since they assume it's already been done.  This could
break if the first read/write access to a nfs filesystem was an exec() or
mmap() instead of a read(), write() syscall.  (or statfs()).
nfs_getpages() could return an errno (EOPNOTSUPP) instead of a VM_PAGER_*
return code.  Some layout tweaks for the get/putpages code.
1998-06-01 11:32:53 +00:00
Peter Wemm
c0c4b3be24 Fix post-test pre-commit cleanup typo. 1998-06-01 11:07:16 +00:00
Peter Wemm
881b695f32 readlink() returns EINVAL rather than EPERM if called on a non-symlink. 1998-06-01 10:59:23 +00:00
Peter Wemm
b3c6f3134f Preset the maximum file size before we get to nfs_fsinfo(), based on
an (over?) conservative assumption about what the client can store in it's
buffer cache using a signed 32-bit 512-byte block number index.  Otherwise
it's possible for some file access when maxfilesize = 0 (eg: /usr is nfs
mounted and doing an execve())
Pointed out by:	 bde

XXX It might make sense to do a preemptive nfs_fsinfo() call at mount time.
1998-06-01 10:01:31 +00:00
Peter Wemm
4152886f7a For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t;
int -> int32_t; u_short -> u_int16_t.  Also, use mode_t instead of u_short
for storing modes (mode_t is a u_int16_t).

Obtained from: NetBSD
1998-05-31 20:09:01 +00:00
Peter Wemm
75c6892c16 Support 'mount -u' remounts. This may require disconnecting and rebinding
the socket.  Certain mode changes are not allowed.

Obtained from:  NetBSD
1998-05-31 19:49:31 +00:00
Peter Wemm
5738e077a6 xdr encode -1 properly.
Obtained from: NetBSD
1998-05-31 19:29:28 +00:00
Peter Wemm
9a0248a5dd Fully fill in nfsv2 write rpc requests rather than leaving garbage.
Obtained from: NetBSD
1998-05-31 19:28:15 +00:00
Peter Wemm
ec26d608b6 Don't silently fail to set file flags.
Obtained from:  NetBSD
1998-05-31 19:24:19 +00:00
Peter Wemm
ccc2eb6a3a Don't blindly accept the server's preferences if they are too small.
Obtained from:  NetBSD
1998-05-31 19:20:44 +00:00
Peter Wemm
71c667c91b Prototype support for selectively allowing non-reserved ports on a per
export basis.  Needs userland support yet.

Obtained from:  NetBSD
1998-05-31 19:16:08 +00:00
Peter Wemm
13b9f88167 Don't pass a second copy of the uid/gid in with the v2/v3 sattr structures,
it just makes more work.  We pass a copy of the uid/gid with the
credentials.  (although, this may need to be revisited if a non AUTHUNIX
authentication method (such as NFSKERB) ever gets implemented).

Obtained from:  NetBSD
1998-05-31 19:00:19 +00:00
Peter Wemm
d0e443aa3a Use the new SB_UPCALL flag,
Obtained from:  NetBSD (but I changed the flag clear order in case).
1998-05-31 18:46:06 +00:00
Peter Wemm
b258e976a8 NFS_SMALLFH is defined in nfsproto.h, not sys/mount.h
Obtained from:  NetBSD
1998-05-31 18:32:23 +00:00
Peter Wemm
fe92746897 Don't let the user try "rmdir ."
Obtained from:  NetBSD
1998-05-31 18:30:42 +00:00
Peter Wemm
f82a64e18d Don't let the user try and unlink() a directory on a NFS server.
Obtained from:  NetBSD
1998-05-31 18:28:45 +00:00
Peter Wemm
124765333e When a write rpc returns an error, break the loop.
Obtained from: NetBSD
1998-05-31 18:27:07 +00:00
Peter Wemm
d6bad9e190 Don't leak an mbuf when a write rpc returns zero bytes written.
Obtained from: NetBSD
1998-05-31 18:25:32 +00:00
Peter Wemm
a710a27b67 #ifdef a diagnostic printf
Obtained from:  NetBSD
1998-05-31 18:23:24 +00:00
Peter Wemm
e9156323b8 Don't try and free mrep twice on some error conditions.
Obtained from:  NetBSD
1998-05-31 18:19:43 +00:00
Peter Wemm
6301c8c330 #ifdef a diagnostic panic, plus another missed costmetic change.
Obtained from:  NetBSD
1998-05-31 18:11:03 +00:00
Peter Wemm
1da42e389c We have gained 2 more errno's, add them to the NFSv2 mapping table. 1998-05-31 18:09:18 +00:00
Peter Wemm
946010a5a4 Missed a cosmetic change that the other BSD's have. 1998-05-31 18:08:09 +00:00
Peter Wemm
535fa8520e oops, nfs_msg() is called from client code too. 1998-05-31 18:06:07 +00:00
Peter Wemm
4a5f4c547e When we can't reconnect a socket, don't forget to unlock before retrying
or we can deadlock.

Obtained from:  NetBSD
1998-05-31 18:02:56 +00:00
Peter Wemm
6bea90a1ee Don't log zero length reads, this can happen during normal operation.
Obtained from: NetBSD
1998-05-31 18:00:46 +00:00
Peter Wemm
6c1a945540 Consider for readdir chunk sizes when tuning socket buffer reservations.
Obtained from:  NetBSD
1998-05-31 17:57:43 +00:00
Peter Wemm
c489c83e4c Some const's
Obtained from: NetBSD
1998-05-31 17:48:07 +00:00
Peter Wemm
e8cf20c8db NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim
of this part of commits is to minimize unnecessary differences between
the other NFS's of similar origin.  Yes, there are gratuitous changes here
that the style folks won't like, but it makes the catch-up less difficult.
1998-05-31 17:27:58 +00:00
Peter Wemm
9e8799c9cd VOP_ABORTUP() appears to be called with the wrong vnode. The other callers
that I checked (eg: ufs_link()) do the ABORTOP on the directory rather than
the file itself.  After Michael Hancock's patches, the abortop doesn't seem
all that critial now since something else will free the pathname buffer.
1998-05-31 01:03:07 +00:00
Peter Wemm
7c1c33a7dd When using NFSv3, use the remote server's idea of the maximum file size
rather than assuming 2^64.  It may not like files that big. :-)
On the nfs server, calculate and report the max file size as the point
that the block numbers in the cache would turn negative.
(ie: 1099511627775 bytes (1TB)).

One of the things I'm worried about however, is that directory offsets
are really cookies on a NFSv3 server and can be rather large, especially
when/if the server generates the opaque directory cookies by using a local
filesystem offset in what comes out as the upper 32 bits of the 64 bit
cookie.  (a server is free to do this, it could save byte swapping
depending on the native 64 bit byte order)

Obtained from:	NetBSD
1998-05-30 16:33:58 +00:00
Peter Wemm
0d7d0fcf29 Convert a couple of large allocations to use zones rather than malloc
for better packing.  This means that we can choose better values for the
various hash entries without having to try and get it all to fit within
an artificial power of two limit for malloc's sake.
1998-05-24 14:41:56 +00:00
Peter Wemm
b550c193c4 s/flags/flag/ 1998-05-20 08:05:45 +00:00
Peter Wemm
dfae73fd2e A cleaner fix for PR#5102, clear nonsense flags at mount time rather than
in the core of nfs_bio.c at the 11th hour.

PR:		5102
1998-05-20 08:02:24 +00:00
Peter Wemm
c578853467 Don't change argp->flags after it's been copied. 1998-05-20 07:59:21 +00:00
Peter Wemm
fe6c0d4599 Allow control of the attribute cache timeouts at mount time.
We had run out of bits in the nfs mount flags, I have moved the internal
state flags into a seperate variable.  These are no longer visible via
statfs(), but I don't know of anything that looks at them.
1998-05-19 07:11:27 +00:00
Bruce Evans
6fc500878a Get timespecs directly instead of via timevals. 1998-05-16 16:20:50 +00:00
Bruce Evans
7db3328337 Don't abuse `+' to combine flags. 1998-05-16 16:03:10 +00:00
Bruce Evans
ba692924a8 Backed out rev.1.76. It just added style bugs. 1998-05-16 15:21:29 +00:00
Bruce Evans
bf57f6f9b3 Get timespecs directly instead of via timevals. 1998-05-16 15:11:24 +00:00
Peter Wemm
e4a57cb44a Add missing arg to vget().. Serves me right for committing a 2.2 patch to
-current without testing it there.. :-(

Submitted by: Michael Hancock <michaelh@cet.co.jp>
1998-05-13 07:49:08 +00:00
Peter Wemm
9733f8ee44 Hold a reference to the vnode during the sillyrename cleanup. If we block
in nfs_vinvalbuf() or the nfs_removeit(), we can have the nfsnode reallocated
from underneath us (eg: replaced by a ufs 'struct inode') which can cause
disk corruption ('freeing free block' when di_db[5] gets trashed).
This is not a cheap fix, but it'll do until the nfsnodes get reference
counting and/or locking.

Apparently NetBSD have a similar fix (apparently from BSDI).

I wish all PR's had this much useful detail. :-)

PR: 6611
Submitted by: Stephen Clawson <sclawson@marker.cs.utah.edu>
1998-05-13 06:10:13 +00:00
Peter Wemm
3b5745a500 Move the *vpp initialization earlier so that it's set in all error cases.
This should stop the 'panic: leaf should not be empty' nfs panic.

PR: 1856
Submitted by: msaitoh@spa.is.uec.ac.jp
1998-05-13 05:47:09 +00:00
Mike Smith
7be2d30077 In the words of the submitter:
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it.  This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp.  vop_rename
still releases 4 vnode arguments before it returns.  These remaining cases
will be corrected in the next set of patches.
---------

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-07 04:58:58 +00:00
Mike Smith
79cc756d8b As described by the submitter:
Reverse the VFS_VRELE patch.  Reference counting of vnodes does not need
to be done per-fs.  I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-06 05:29:41 +00:00
Poul-Henning Kamp
2f5f6b74ca Use random() to find our initial xid. 1998-04-06 11:41:07 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
Steve Price
b9921401f1 Don't allow the readdirplus routine to be used in NFS V2.
PR:		5102
Reviewed by:	msmith
Submitted by:	Dmitry Kohmanyuk <dk@farm.org>
1998-03-28 16:05:05 +00:00
Bruce Evans
771b51ef7b Don't depend on <sys/mount.h> including <sys/socket.h>. 1998-03-28 12:04:40 +00:00
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
Tor Egge
8f7030a7cc Add a BOOTP_WIRED_TO option, for use on machines with multiple network
cards where the first detected card should not be used for bootp.
Submitted by:	Doug Ambrisko <ambrisko@whistle.com>
1998-03-14 04:13:56 +00:00
Tor Egge
8bd965cce4 Update workaround for limitations in the arp code.
Adjust the RPC timeout message which occured when the old workaround
broke to show the correct IP address.
1998-03-14 03:25:18 +00:00
Julian Elischer
b1897c197c Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  WHistle development tree
1998-03-08 09:59:44 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
Mike Smith
651ae11e2f Trivial filesystem getpages/putpages implementations, set the second.
These should be considered the first steps in a work-in-progress.
Submitted by:	Terry Lambert <terry@freebsd.org>
1998-03-06 09:46:52 +00:00
Mike Smith
34bdbbd0de The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE.  This is
initially only for file systems that implement the following ops
that do a WILLRELE:

	vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
	vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet.  VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
	ffs mfs nfs msdosfs devfs ext2fs

Custom
	union umapfs

Just EOPNOTSUPP
	fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers.  vput will be replaced by unlock in these cases.  Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer.  This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by:	phk, dyson, terry et. al.
Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-03-01 22:46:53 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
John Dyson
857fe6801a Fix an omission of a line from the previous commit to this file. The
problem appeared to be an NFS hang.
1998-02-05 16:40:57 +00:00