Commit Graph

506 Commits

Author SHA1 Message Date
alc
87da2c3cf3 - Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
   whether its caller has locked the vm_object.  (This is a temporary
   measure to bootstrap vm_object locking.)
2003-04-24 04:31:25 +00:00
jeff
49e05d204e - Lock bufs before inspecting their flags. 2003-03-13 07:05:22 +00:00
des
2756b6c964 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
jeff
9e4c9a6ce9 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
phk
a0169e1761 Don't use mbuf allocator flags for malloc(9). 2003-02-22 10:35:37 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
dillon
ccd5574cc6 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
dillon
ddf9ef103e Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
schweikh
d3367c5f5d Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
dillon
4ecb4d83e4 Abstract-out the constants for the sequential heuristic.
No operational changes.

MFC after:	1 day
2002-12-28 20:28:10 +00:00
iedowse
aeec108485 In the NFSv3 `fsinfo' procedure reply, don't claim that we support
32k read and write operations on datagram sockets when in fact we
reject requests larger than 16k. It must be the case that virtually
all clients use data sizes of 16k or less for UDP transport (FreeBSD's
client defaults to 8k and never exceeds 16k), as this bug has been
present ever since NFSv3 support was added.

Reported by:	Senthil <lihtnes78@netscape.net>
Reviewed by:	dillon
Approved by:	re
MFC-after:	1 week
2002-12-05 16:58:11 +00:00
rwatson
b8dd64f5ef Permit MAC policies to instrument the access control decisions for
system accounting configuration and for nfsd server thread attach.
Policies might use this to protect the integrity or confidentiality
of accounting data, limit the ability to turn on or off accounting,
as well as to prevent inappropriately labeled threads from becoming nfs
server threads.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-04 15:13:36 +00:00
jeff
b2bc94cc0d - Introduce a new macro, since that's what nfs loves, called
nfsm_srvpathsiz.  This macro plucks a length out of an rpc request and
   verifies that its size does not exceed NFS_MAXPATHLEN.  If it does
   it generates an ENAMETOOLONG response.
 - Use this macro, and the existing nfsm_srvnamsiz macro in two places
   where we deal with paths passed in by the client.

This fixes a linux interoperability bug.  Linux was sending oversized path
components which would cause us to ignore the request all together.  This
causes linux to hang indefinitly while it waits for a response.  This
could still happen in other cases where we error out with EBADRPC.

Sponsored by:	Isilon Systems, Inc.
Reviewed by:	alfred, fabbri@isilon.com, neal@isilon.com
2002-10-31 22:35:03 +00:00
rwatson
6fe81fed62 Set the NOMACCHECK flag for namei()'s generated by the NFS server code.
We currently don't enforce protections on NFS-originated VOP's.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 21:27:40 +00:00
rwatson
ddb3d2b4ee Correct a problem wherein NFS servers running NFSv2 would not return
certain classes of failure responses to the client during a failed
remove operation.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2002-10-03 21:50:37 +00:00
jeff
004a26186f - Use incore() instead of gbincore() so we don't have to acquire the
vnode interlock.
2002-09-25 02:39:39 +00:00
phk
63d87674c8 Use m_length() instead of home-rolled versions. 2002-09-18 19:44:14 +00:00
phk
d5c3cc895c Make the V2 errno translation more resistent to new errnos. 2002-08-21 19:28:44 +00:00
jeff
02517b6731 - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
peter
ec96aab91e Oops, another unused arg to nfssvc_nfsd(). *blush*
Submitted by:	jake
2002-07-24 23:10:34 +00:00
peter
f26018fef5 Fully exterminate nfsd_srvargs and nfsd_cargs. They were either unused
or giant NOP's.  There was a credential in srvargs that was giving
rwatson some heartburn. :-)
2002-07-24 22:27:35 +00:00
rwatson
2fe187e9c4 Stick a dark comment in about the fact that the NFS server code allocates
a ucred by itself as part of an nfs descriptor, then bzero's the ucred,
fails to initialize the mutex, etc.  This is very bad, but I don't have
time to fix it right now.  nfsd should instead hold a cred pointer,
and the credential should be properly initialized, probably from a
descendent of a kernel process credential.
2002-07-24 14:24:16 +00:00
ume
ad7629f7ee sync comment with reality. IN6P_BINDV6ONLY -> IN6P_IPV6_V6ONLY. 2002-07-22 15:55:50 +00:00
dillon
c57275f347 'recm' was not being unconditionally cleared for each loop, leading to
system lockups (infinite loops) when a zero-length RPC is received.
Linux clients will sometimes send zero-length RPC requests.

Reorganize the use of recm in the loop.

Cc: security@freebsd.org
Submitted by:	Mike Junk <junk@isilon.com>
MFC after:	3 days
2002-07-17 01:07:08 +00:00
alfred
df766765ba Add IPv6 support.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
2002-07-15 19:40:23 +00:00
dillon
0b74a2da00 Convert old style (type foo *)0 casts to NULLs
PR:		kern/40360
Requested by:	Hiten PAndya via direct email
2002-07-11 17:54:58 +00:00
dillon
da4e111a55 Replace the global buffer hash table with per-vnode splay trees using a
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on.  Extensive testing has appeared to have shown no
increase in overhead.

Disadvantages
    Dirties more cache lines during lookups.

    Not as fast as a hash table lookup (but still N log N and optimal
    when there is locality of reference).

Advantages
    vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
    syncer operate more efficiently.

    I get to rip out all the old hacks (some of which were mine) that tried
    to keep the v_dirtyblkhd tailq sorted.

    The per-vnode splay tree should be easier to lock / SMPng pushdown on
    vnodes will be easier.

    This commit along with another that Alan is working on for the VM page
    global hash table will allow me to implement ranged fsync(), optimize
    server-side nfs commit rpcs, and implement partial syncs by the
    filesystem syncer (aka filesystem syncer would detect that someone is
    trying to get the vnode lock, remembers its place, and skip to the
    next vnode).

Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.

Suggested by: alc
2002-07-10 17:02:32 +00:00
tanimura
e6fa9b9e92 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
tanimura
92d8381dd5 Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
trhodes
28d42899b7 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
iedowse
cb60904294 Limit to the maximum allowed reply size the amount of data that
nfsrv_readdir and nfsrv_readdirplus can return. A client request
containing an over-large `count' field could trigger the "Bad nfs
svc reply" panic in nfs_syscalls.c.

Spotted while trying to reproduce kern/37304, which turned out to
be fixed in FreeBSD a long time ago.

MFC after:	1 week
2002-04-21 16:14:54 +00:00
jhb
dc2e474f79 Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
jeff
5cc8ffe0d4 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 10:07:52 +00:00
mckusick
14dd08fd15 Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
2002-03-17 01:25:47 +00:00
jhb
3706cd3509 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
dillon
45630603d0 The vnode was not being vput()'d in the EEXIST mknod case on the nfs
server side.  This can lead to a system deadlock.

Reviewed by:    iedowse
Tested by:      Alexey G Misurenko <mag@caravan.ru>, iedowse
Bug found with help by: Alexey G Misurenko <mag@caravan.ru>
MFC at:         earliest convenience
2002-01-14 19:14:08 +00:00
iedowse
a2d0f7e01a It is required by VOP_CREATE, VOP_MKNOD, VOP_SYMLINK and VOP_MKDIR
that va_mode of the supplied attributes is filled in with a valid
file mode (i.e not VNOVAL, and only ALLPERM bits set). However,
some NFS server op functions didn't guarantee this for all possible
request messages:

If a V3 client chose not include to a mode specification, we could
end up creating an ffs inode with mode 0177777, requiring a manual
fsck on the next reboot. Fix this by setting va_mode to 0 before
calling the VOP if a mode hasn't been supplied by the client.

In nfsrv_symlink(), S_IFMT bits supplied by a V2 client could end
up in the va_mode passed to VOP_SYMLINK with similar effects. We
now use the macro nfstov_mode() to correctly mask the bits.
2002-01-13 05:36:05 +00:00
iedowse
641697c62b Fix a few NFSv2 issues that slipped in during the big cleanup. The
semantics of the nfsm_reply() macro were changed so that the caller
has to explicitly handle the V2 error case, whereas before,
nfsm_reply() did a `goto nfsmout' then. A few server ops (setattr,
readlink, create, mkdir) weren't updated to match, so errors in the
V2 case could cause protocol hangs and leaked mbufs.

Correct some comments that describe the old nfsm_reply behaviour.

[older, harmless nit] Remove the unnecessary `nfsmreply0' label in
nfsrv_create(), since for its users, the main `ereply' label does
the same thing.
2002-01-12 03:57:25 +00:00
iedowse
e41e7ac3ee The macro nfsm_reply() is supposed to allocate a reply in all cases,
but since the nfs cleanup, it hasn't done so in the case where
`error' is EBADRPC. Callers of this macro expect it to initialise
*mrq, and the `nfsmout' exit point expects a reply to be allocated
if error == 0. When nfsm_reply() was called with error = EBADRPC,
whatever junk was in *mrq (often a stale pointer to an old reply
mbuf) would be assumed to be a valid reply and passed to pru_sosend(),
causing a crash sooner or later.

Fix this by allocating a reply even in the EBADRPC case like we
used to do. This bug was specific to -current.
2002-01-11 22:22:39 +00:00
msmith
7f06d73491 Rename some variables that end up shadowing their namesakes in the NFS client
code.

Reviewed by:	peter
2002-01-08 19:41:06 +00:00
iedowse
6e9f1df98f Avoid passing the variable `tl' to functions that just use it for
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.

The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.

Reviewed by:	peter
2001-12-18 01:22:09 +00:00
iedowse
e16d92f4df When VOP_SYMLINK fails, the value of *vpp is junk, so we must NULL
out nd.ni_vp to prevent the resource cleanup code at the end of
nfsrv_symlink from trying to vrele it. This fixes a "vrele: negative
ref cnt" panic that can occur when a symlink is attempted on an NFS
filesystem with no free space. Found locally, but the symptoms
correspond to those in the PR referenced below.

PR:		kern/26878
MFC after:	3 days
2001-12-04 16:53:42 +00:00
dillon
86ed17d675 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00
peter
4048edc3f3 Fix a leftover client comment, long line fix. 2001-11-15 23:49:02 +00:00
iedowse
6eb4ba0b09 Now that nfsm_reply() does not usually set 'error' to 0, we need
to do it explicitly in nfsrv_noop so that the reply gets sent back
to the client. This fixes the generation of a selection of RPC
error replies (RPC_PROGMISMATCH, RPC_PROGUNAVAIL, RPC_PROCUNAVAIL
etc.) that are used by some clients to detect support for optional
protocols and features.

Reviewed by:	peter
Reported by:	Thomas Quinot <quinot@inf.enst.fr>
PR:		kern/31479
2001-10-25 19:07:56 +00:00
peter
562ebdfbed Unwind some more macros. NFSMADV() was kinda silly since it was right
next to equivalent m_len adjustments.  Move the nfsm_subs.h macros
into groups depending on which phase they are used in, since that
affects the error recovery requirements.  Collect some of the common error
checking into a single macro as preparation for unwinding some more.
Have nfs_rephead return a value instead of secretly modifying args.
Remove some unused function arguments that were being passed around.
Clarify nfsm_reply()'s error handling (I hope).
2001-09-28 04:37:08 +00:00
peter
d1e3264e13 Oops. I forgot to cvs rm this before. There is a common nfsproto.h.
This was a repo copy leftover.
2001-09-28 04:31:23 +00:00
peter
2854bb2840 Make nfsm_dissect() have an obvious return value. 2001-09-27 22:40:38 +00:00
peter
bc122022f9 Tidy up nfsm_build usage. This is only partially finished. 2001-09-27 02:33:36 +00:00
peter
863296308c Wrap a module around the init code so that we have somethign do do a
modfind(2) on, and declare a version so that loader/kldload etc
can locate it (using kldxref's linker.hints file if needed).
2001-09-20 05:13:43 +00:00
peter
85182a8d78 Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.
2001-09-18 23:32:09 +00:00
peter
2392b3448b Sync some differences that were different between the copies of the files
that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.
2001-09-15 04:41:56 +00:00
julian
5596676e6c KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
kris
bd6f9cb9b6 Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from:	NetBSD
Reviewed by:	peter
MFC after:	2 weeks
2001-09-10 11:28:07 +00:00
dillon
6b8714e0aa Pushdown Giant for nfs syscalls (nfssvc()) 2001-08-31 22:39:36 +00:00
dillon
e028603b7e With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
jhb
ab91beada7 - Protect the mnt_vnode list with the mntvnode lock.
- Use queue(9) macros.
2001-06-28 04:10:07 +00:00
alfred
a3f0842419 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
markm
bcca5847d5 Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
grog
4b9d9cbaac Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
grog
1f5de30718 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
alfred
f0669d6c9e Implement client side NFS locks.
Obtained from: BSD/os
Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
2001-04-17 20:45:23 +00:00
peter
670e711dd1 Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t.  This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this.  It is
around 45% faster with -march=pentiumpro on a p6 cpu.
2001-03-17 09:31:06 +00:00
green
18d474781f Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00
asmodai
3065478332 Preceed/preceeding are not english words. Use precede and preceding. 2001-02-18 10:43:53 +00:00
iedowse
769baae5ab Fix some problems that were introduced in revision 1.97. Instead
of returning an error code to the caller, NFS server op routines
must themselves build an error reply and return 0 to the caller.

This is achieved by replacing the erroneous return statements with
code that jumps forward to the op function's reply code. We need
to be careful to ensure that the 'struct mount' pointer is NULL
though, so that the final vn_finished_write() call becomes a no-op.

Reviewed by:	mckusick, dillon
2001-02-09 13:24:06 +00:00
bmilekic
4b6a7bddad * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
dwmalone
dd75d1d73b Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
phk
7101ba5caa Simplify the tprintf() API.
Loose the special <sys/tprintf.h> #include file.
2000-11-26 20:35:21 +00:00
dillon
15a44d16ca This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
mckusick
42ecbdff70 In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
2000-11-14 08:00:39 +00:00
phk
94a5006c9a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
dwmalone
23aacadb39 Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

	The problem seems to originate with NFS's postop_attr
	information that is returned with a read or write RPC.
	Within a vm_fault context, the code cannot deal with
	vnode_pager_setsize() shrinking a vnode.

	The workaround in the patch below stops the nfsm_postop_attr()
	macro from ever shrinking a vnode. If the new size in the
	postop_attr information is smaller, then it just sets the
	nfsnode n_attrstamp to 0 to stop the wrong size getting
	used in the future. This change only affects postop_attr
	attributes; the nfsm_loadattr() macro works as normal.

	The change is implemented by adding a new argument to
	nfs_loadattrcache() called 'dontshrink'. When this is
	non-zero, nfs_loadattrcache() will never reduce the
	vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by:	dillon
Tested by:	Vadim Belman <voland@lflat.org>
2000-10-24 10:13:36 +00:00
jasone
769e0f974d Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
mckusick
acc66855bf This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
mckusick
a3d0c189ea Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
jake
961b97d434 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
jake
d93fbc9916 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
phk
36c3965ff9 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
phk
10914aa708 Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
phk
6be1308ad1 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
dillon
d7295a1a39 Add a sysctl to specify the amount of UDP receive space NFS should
reserve, in maximal NFS packets.  Originally only 2 packets worth of
    space was reserved.  The default is now 4, which appears to greatly
    improve performance for slow to mid-speed machines on gigabit networks.

    Add documentation and correct some prior documentation.

Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu>
Approved by: jkh
2000-03-27 21:38:35 +00:00
phk
5df766a0f8 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
phk
a246e10f55 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
peter
a5441090de Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs.  Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by:	jkh
2000-02-13 03:32:07 +00:00
dillon
9bba72b2a3 The alpha build cuases the 'nfsuid bloated' warning to occur. Well,
there is nothing we can do about it.  In fact, after further review
    there simply are not very many instances of the two structures NFS
    checks for 'bloat' so I've decided to simply rip the checks out entirely.

Submitted by:	 Andrew Gallatin <gallatin@cs.duke.edu>
2000-01-13 20:18:25 +00:00
shin
3bdc213839 tcp updates to support IPv6.
also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
2000-01-09 19:17:30 +00:00
dillon
c6689c797d Enhance reassignbuf(). When a buffer cannot be time-optimally inserted
into vnode dirtyblkhd we append it to the list instead of prepend it to
    the list in order to maintain a 'forward' locality of reference, which
    is arguably better then 'reverse'.  The original algorithm did things this
    way to but at a huge time cost.

    Enhance the append interlock for NFS writes to handle intr/soft mounts
    better.

    Fix the hysteresis for NFS async daemon I/O requests to reduce the
    number of unnecessary context switches.

    Modify handling of NFS mount options.  Any given user option that is
    too high now defaults to the kernel maximum for that option rather then
    the kernel default for that option.

Reviewed by:	 Alfred Perlstein <bright@wintelcom.net>
2000-01-05 05:11:37 +00:00
peter
d53e4c1d80 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistant with the other
BSD's who made this change quite some time ago.  More commits to come.
1999-12-29 05:07:58 +00:00
alfred
4f1be4b332 make getfh a standard syscall instead of dependant on having
NFSSERVER defined, useful for userland fileservers that want to
use a filehandle type interface to the filesystem.

Submitted by: Assar Westerlund assar@stacken.kth.se
PR: kern/15452
1999-12-21 20:21:12 +00:00
green
99a0254a52 M_PREPEND-related cleanups (unregisterifying struct mbuf *s). 1999-12-19 01:55:37 +00:00
dillon
00382428de Fix compilation warning on alpha when converting pointer to integer
to generate hash index.

Reviewed by:	 Andrew Gallatin <gallatin@cs.duke.edu>
1999-12-18 19:20:05 +00:00
dillon
db701263c1 Have NFS use a snapshot of boottime instead of boottime itself to
generate the NFSv3 Version id.  boottime itself may change, sometimes
    once every tick if you are running xntpd, which really throws off
    clients.  Clients will tend to throw away what they believe to be
    stale data too often, and can get into long loops rewriting the same
    data over and over again because they believe the server has rebooted
    over and over again due to the changing version id.

Approved by:	jkh
1999-12-16 17:01:32 +00:00
eivind
87724eb673 Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
dillon
bab004e729 Add a readahead heuristic to the NFS server side code. While the server
cannot unilaterally pass data to a client it can reduce the physical
    disk transaction overhead by reading larger blocks.  This results in
    better pipelining of requests/responses over the network and an almost
    100% increase in cpu efficiency on the server.  On a 100BaseTX network
    NFS read performance increases from 8.5 MBytes/sec to 10 MB/sec (maxed
    out), and cpu efficiency increases from 72% idle to 80% idle on the server.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
1999-12-13 17:34:45 +00:00
dillon
b01e2d1796 PR: kern/15222
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-12-13 17:07:03 +00:00
dillon
275dfe6756 Fix a timeout deadlock that can occur when the process holding the
receive lock hasn't yet managed to send its own request.

PR:		kern/15055
Submitted by:	Ian Dowse iedowse@maths.tcd.ie
1999-12-13 04:24:55 +00:00
dillon
08e8d78b50 Fix a number of server-side issues related to aborting badly formed
NFS packets, mainly initializing structure pointers to NULL which
    are conditionally freed prior to return.

PR:		kern/15249
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-12-12 07:06:39 +00:00
dillon
1da2183061 Synopsis of problem being fixed: Dan Nelson originally reported that
blocks of zeros could wind up in a file written to over NFS by a client.
    The problem only occurs a few times per several gigabytes of data.   This
    problem turned out to be bug #3 below.

    bug #1:

        B_CLUSTEROK must be cleared when an NFS buffer is reverted from
        stage 2 (ready for commit rpc) to stage 1 (ready for write).
        Reversions can occur when a dirty NFS buffer is redirtied with new
        data.

        Otherwise the VFS/BIO system may end up thinking that a stage 1
        NFS buffer is clusterable.  Stage 1 NFS buffers are not clusterable.

    bug #2:

        B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short
        buffers only occur near the EOF of the file).  Change to only set
        when the buffer is a full biosize (usually 8K).  This bug has no
        effect but should be fixed in -current anyway.  It need not be
        backported.

    bug #3:

        B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is
	typically only called by the update daemon).  nfs_flush()
        does a multi-pass loop but due to the lack of vnode locking it
        is possible for new buffers to be added to the dirtyblkhd list
        while a flush operation is going on.  This may result in nfs_flush()
        setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its
        stage 1 write, causing only the commit rpc to be made and thus
        causing the contents of the buffer to be thrown away (never sent to
        the server).

    The patch also contains some cleanup, which only applies to the commit
    into -current.

Reviewed by:	dg, julian
Originally Reported by: Dan Nelson <dnelson@emsphone.com>
1999-12-12 06:09:57 +00:00
dillon
913c1ce734 nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization
of element [4] in both, which goes beyond the end of the array, leaving
    [0], [1], [2], and [3].  This bug did not cause any problems since
    the overrun fields are initialized after the bogus array init but
    needs to be fixed anyway.

Submitted by:	 Ian Dowse <iedowse@maths.tcd.ie>
1999-11-22 04:50:09 +00:00
eivind
4ce73d7096 Remove WILLRELE from VOP_SYMLINK
Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.
1999-11-13 20:58:17 +00:00
eivind
21fff7b1c2 Remove WILLRELE from VOP_RENAME 1999-11-12 03:34:28 +00:00
dillon
ac372bbf0e Remove special case socket sharing code in order to allow nfsd to
bind IP addresses to udp/cltp sockets separately.

PR:		kern/13049
Reviewed by:	David Malone <dwmalone@maths.tcd.ie>, freebsd-current
1999-11-11 17:24:02 +00:00
dillon
2fb940a103 Fix nfssvc_addsock() to not attempt to free a NULL socket structure
when returning an error.  Bug fix was extracted from the PR.  The PR
    is not yet entirely resolved by this commit.

PR:		kern/13049
Reviewed by:	Matt Dillon <dillon@freebsd.org>
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-11-08 19:10:16 +00:00
dillon
47a282162e Move NFS access cache hits/misses into nfsstats structure so
/usr/bin/nfsstat can get to it easily.
1999-10-25 19:22:33 +00:00
phk
322edeeaa9 Before we start to mess with the VFS name-cache clean things up a little bit:
Isolate the namecache in its own file, and give it a dedicated malloc type.
1999-10-03 12:18:29 +00:00
marcel
a77102446a Careless use of struct proc *p caused major problems. 'p' is allowed to
be NULL in this function (nfs_sigintr). Reorder the statements and guard
them all with a single if (p != NULL).

reported, reviewed and tested by: jdp
1999-09-29 20:12:39 +00:00
dillon
af7bbb9a33 Make FreeBSD less conservative in determining when to return a cookie
error for a directory.  I have made this change after a great deal of
    review although I cannot be absolutely sure that this meets the spec.

    The issue devolves into whether changes in an underlying (UFS) directory
    can cause NFS directory blocks to be renumbered.  My read of the code
    indicates that NFS directory blocks will not be renumbered, which means
    that the cookies should still remain valid after a change is made to
    the underlying directory.  This being the case, a cookie error should
    not be returned when a change is made to the underlying directory and,
    instead, the NFS client should rely on mtime detection to invalidate and
    reload the directory.

    The use of mtime is problematic in of itself, due to insufficient
    resolution, which is why I believe the original conservative error
    handling was done.  Still, there have been dozens of bug reports by
    people needing solaris<->FreeBSD interoperability and these have to
    be accomodated.
1999-09-29 17:14:58 +00:00
marcel
d5e8d714b9 sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.
1999-09-29 15:03:48 +00:00
dillon
581716d4df Asynchronized client-side nfs_commit. NFS commit operations were
previously issued synchronously even if async daemons (nfsiod's) were
    available.  The commit has been moved from the strategy code to the doio
    code in order to asynchronize it.

    Removed use of lastr in preparation for removal of vnode->v_lastr.  It
    has been replaced with seqcount, which is already supported by the system
    and, in fact, gives us a better heuristic for sequential detection then
    lastr ever did.

    Made major performance improvements to the server side commit.  The
    server previously fsync'd the entire file for each commit rpc.  The
    server now bawrite()s only those buffers related to the offset/size
    specified in the commit rpc.

    Note that we do not commit the meta-data yet.  This works still needs
    to be done.

    Note that a further optimization can be done (and has not yet been done)
    on the client: we can merge multiple potential commit rpc's into a
    single rpc with a greater file offset/size range and greatly reduce
    rpc traffic.

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 05:57:57 +00:00
alfred
b9136a6115 Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from:	NetBSD
1999-09-11 00:46:08 +00:00
phk
d311a0563b remove unused variables. 1999-08-28 19:21:03 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
phk
591c94d4c6 Simplify the handling of VCHR and VBLK vnodes using the new dev_t:
Make the alias list a SLIST.

        Drop the "fast recycling" optimization of vnodes (including
        the returning of a prexisting but stale vnode from checkalias).
        It doesn't buy us anything now that we don't hardlimit
        vnodes anymore.

        Rename checkalias2() and checkalias() to addalias() and
        addaliasu() - which takes dev_t and udev_t arg respectively.

        Make the revoke syscalls use vcount() instead of VALIASED.

        Remove VALIASED flag, we don't need it now and it is faster
        to traverse the much shorter lists than to maintain the
        flag.

        vfs_mountedon() can check the dev_t directly, all the vnodes
        point to the same one.

Print the devicename in specfs/vprint().

Remove a couple of stale LFS vnode flags.

Remove unimplemented/unused LK_DRAINED;
1999-08-26 14:53:31 +00:00
peter
d4c0c0bd4a Convert all the nfs macros to do { blah } while (0) to ensure it
works correctly in if/else etc.  egcs had probably picked up most of the
problems here before with "ambiguous braces" etc, but this should
increase the robustness a bit.  Based on an idea from Eivind Eklund.
1999-08-19 14:50:12 +00:00
phk
e938d317d5 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
peter
fce0d9b321 Don't over-allocate and over-copy shorter NFSv2 filehandles and then
correct the pointers afterwards.

It's kinda bogus that we generate a 24 (?) byte filehandle (2 x int32
fsid and 16 byte VFS fhandle) and pad it out to 64 bytes for NFSv3 with
garbage.  The whole point of NFSv3's variable filehandle length was
to allow for shorter handles, both in memory and over the wire.  I plan
on taking a shot at fixing this shortly.
1999-08-04 14:41:39 +00:00
wpaul
a0ef521585 Correct the sanity test length calculation in nfsrv_readdirplus(): len is
being incremented by 4 bytes too few each time through the loop, which
allows more data into the mbuf chain that we really want. In the worst
case, when we're using 32K read/write sizes with a TCP client, this causes
readdirplus replies to sometimes exceed NFS_MAXPACKET which leads to a
panic. This problem cropped up for me using an IRIX 6.5.4 NFSv3 TCP client
with 32K read/write sizes, however supposedly it can be triggered by
WinNT NFS servers too. In theory, it can probably be triggered by any
NFS v3 implementation using TCP as long as it's using the maxiumum block
size.

Reviewed by: Matthew Dillon <dillon@backplane.com>
1999-07-29 21:42:57 +00:00
alc
1f3845a859 Clear error in nfsrv_create when we have a valid reply so that
that reply is actually transmitted.
Submitted by:	dillon
1999-07-28 08:20:49 +00:00
phk
6c373ff516 I have not one single time remembered the name of this function correctly
so obviously I gave it the wrong name.  s/umakedev/makeudev/g
1999-07-17 18:43:50 +00:00
julian
b242252948 Submitted by: "David E. Cross" <crossd@cs.rpi.edu>
Matt missed a line..
1999-06-30 04:29:13 +00:00
peter
6d9ab211eb Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly
splbio()/splx()) are #included in time.
1999-06-27 11:44:22 +00:00
mckusick
5b58f2f951 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
julian
7dc5713bf7 Matt's NFS fixes.
Submitted by: Matt Dillon
Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin
  3.2 version to follow when tested
1999-06-23 04:44:14 +00:00
peter
21732dea0c Various changes lifted from the OpenBSD cvs tree:
txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge.

nfs_serv.c: don't call nfsm_adj() with negative values, windows clients
could crash servers when doing a readdir of a large directory.

nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin
loop trying to bind().  Don't clobber a mbuf pointer or we get panics
on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf.

nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB.

Obtained from:  OpenBSD
1999-06-05 05:35:03 +00:00
phk
7e26ca1d1a Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
        major()         umajor()
        minor()         uminor()
        makedev()       umakedev()
        dev2udev()      udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits.  This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference.  If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
1999-05-11 19:55:07 +00:00
phk
f57a01ebfc remove b_proc from struct buf, it's (now) unused.
Reviewed by:	dillon, bde
1999-05-06 20:00:34 +00:00
peter
73556bfee1 Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.
1999-05-06 18:13:11 +00:00
alc
5cb08a2652 The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS.  These hacks have caused no
end of trouble, especially when combined with mmap().  I've removed
them.  Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write.  NFS does, however,
optimize piecemeal appends to files.  For most common file operations,
you will not notice the difference.  The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations.  NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write.  There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault.  This
is not correct operation.  The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid.  A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid.  This operation is
necessary to properly support mmap().  The zeroing occurs most often
when dealing with file-EOF situations.  Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten.  B_CACHE operation is now
formally defined in comments and more straightforward in
implementation.  B_CACHE for VMIO buffers is based on the validity of
the backing store.  B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa).  biodone() is now responsible for setting B_CACHE
when a successful read completes.  B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated.  VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE.  This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount.  These have been fixed.  getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made.  A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain.  The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
1999-05-02 23:57:16 +00:00
phk
ca21a25f17 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
phk
16e3fbd2c1 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
dt
9b8660ce53 Fixed printf format errors on alpha. 1999-04-24 11:29:48 +00:00
peter
2cb5b6d7d5 Untangle the nfs send and receive queue locking a little. One lock
routine was [ab]used for two different things, and you couldn't tell from
the wait channel which one had wedged.
Catch a few things missing from NFS_NOSERVER.
1999-02-25 00:03:51 +00:00
dfr
0f4e134c5f Move the declaration of the vfs.nfs sysctl node outside an ifdef so that
it builds if NFS_NOSERVER is defined.

Spotted by: Bruce Evans <bde@zeta.org.au>
1999-02-18 09:19:41 +00:00
bde
9cf8505943 Fixed bitrot in NFS_ACDEBUG option. 1999-02-17 13:59:29 +00:00
dfr
22ceb237f0 * Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
1999-02-16 10:49:55 +00:00
dillon
dbf5cd2b57 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 22:42:27 +00:00
dillon
df24433bbe This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
eivind
ffaaca5874 Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
hoek
5e720f3594 Silence -Wtrigraph.
Submitted by:	Bradley Dunn <bradley@dunn.org>  (pr: kern/8817)
1998-12-30 00:37:44 +00:00
dfr
1cee4d444d Fix for creating files on a Solaris 7 server with NFSv3 (the request was
slightly garbled but older servers seemed to understand it).

Reviewed by: David O'Brien <obrien@nuxi.ucdavis.edu>
1998-12-25 10:34:27 +00:00
dt
7212f6ac0c Added 3 new errno values, requred by various standards: EOVERFLOW,
ECANCELED, EILSEQ.

Fixed ibcs2 and especially linux EIDRM and ENOMSG errno mapping.
Reviewed by:	Dan Nelson <dnelson@emsphone.com>
1998-12-14 18:54:04 +00:00
eivind
56b8c7c844 Remove the if fixed in the last commit; bde quite correctly point out
that it can never fail.
1998-12-09 15:12:53 +00:00
eivind
d74ab3f9f8 Fix typo (; in "if (vp == NULL);"). 1998-12-08 23:11:24 +00:00
archie
60d13c7a9d The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
dfr
f8c57dfec2 Fix a panic in nfsrv_dorec() where a NULL pointer could be passed to
free() sometimes.

Reviewed by: Eric Haug <ejh@eas.slu.edu>
1998-11-13 09:44:12 +00:00
peter
4dde2ece8b Remove [apparently] bogus casts to u_long for the vnode_pager_setsize()
second argument.  np_size is a 64 bit int, so is the second arg.  This
might have caused needless 2G/4G file size problems.

I believe it was Bruce who queried this.
1998-11-09 07:00:14 +00:00
peter
052ed056ae vm_object_page_clean() last arg changed from TRUE to OBJPC_SYNC. I'm not
sure that this is necessary to be a sync write here since a VOP_FSYNC()
follows and it will schedule, sort and complete the writes that the
vm_object_page_clean() started (as I think I understand things).
1998-10-31 15:39:31 +00:00
peter
8ef35acf90 Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.
1998-10-31 15:31:29 +00:00
mckusick
a037faba69 The code checks each fragment mark to see if it's valid; if the fragment
is less than NFS_MINPACKET or greater than NFS_MAXPACKET in size, it
barfs and, I think, drops the connection.

However, there's no guarantee that in a multi-fragment RPC, all the
fragments will be at least as large as NFS_MINPACKET.

In fact, with the version of "tclnfs" we have here, which supports NFS
over TCP, at least when built under SunOS 4.1.3 (i.e., with 4.1.3's
user-mode ONC RPC library), I can *repeatably* cause "tclnfs" to send a
request with more than one fragment, one of which is only 8 bytes long.
I just do a 3877-byte write to a file, at an offset of 0.

The check that "slp->ns_reclen" is greater than or equal to
NFS_MINPACKET serves no useful purpose - if the NFS server code can't
handle packets < NFS_MINPACKET bytes, it can't handle them over *any*
protocol, so the check has to be done above the RPC-over-TCP layer - and
should be removed.
Obtained from: Fix from Guy Harris, forwarded by Rick Macklem.
1998-09-29 22:33:05 +00:00
bde
3adc4cd6e2 Made unloading of the nfs LKM sort of work. This is mainly to test
detachment of vfs sysctls.  Unloading of vfs LKMs doesn't actually
work for any vfs, since it leaves garbage pointers to memory
allocation control structures.
1998-09-07 05:42:15 +00:00
bde
a84a2dedfc Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM.  Declare it in a header file.  Don't forget to use
it in non-Lite2 code.  Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.
1998-09-05 15:17:34 +00:00
luoqi
6579ddcc26 Check for NULL pointer before freeing a struct sockaddr. m_freem() can handle
NULL, buf free() can't.
1998-09-01 02:31:52 +00:00
wollman
a76fb5eefa Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process.  Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
1998-08-23 03:07:17 +00:00
peter
6adfc2e8bc If we get an ENOBUFS from the network, it's normally transient network
interface congestion (eg: nfs over a ppp link, etc).  Don't log these
for UDP mounts, and don't cause syscalls to fail with EINTR.
This stops the 'nfs send error 55' warnings.

If the error is because the system is really hosed, this is the least
of your problems...
1998-08-01 09:04:02 +00:00
bde
863d5c8b68 Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively.  Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
1998-07-15 02:32:35 +00:00
kato
b262a6b558 Moved `#ifndef NFS_NOSERVER' after including nfs.h. 1998-07-02 12:41:42 +00:00
jmg
c8ef0cb9cd fix buildworld hopefully be3fore anyone complains...
NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion),
but in some cases it looks like nfs keeps a copy of the value in a struct

hash sizes are already ifdef'd KERNEL, so there aren't userland inpact
from them...
1998-06-30 11:19:22 +00:00
jmg
0e50288276 convert some nfs tunables to options, these are:
NFS_MINATTRTIMO         VREG attrib cache timeout in sec
NFS_MAXATTRTIMO
NFS_MINDIRATTRTIMO      VDIR attrib cache timeout in sec
NFS_MAXDIRATTRTIMO
NFS_GATHERDELAY         Default write gather delay (msec)
NFS_UIDHASHSIZ          Tune the size of nfssvc_sock with this
NFS_WDELAYHASHSIZ       and with this
NFS_MUIDHASHSIZ         Tune the size of nfsmount with this
NFS_NOSERVER            (already documented in LINT)
NFS_DEBUG               turn on NFS debugging

also, because NFS_ROOT is used by very different files, it has been
renamed to opt_nfsroot.h instead of the old opt_nfs.h....
1998-06-30 03:01:37 +00:00
bde
193dd07396 Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore,
code controlled by it did not even compile.)
1998-06-21 12:50:12 +00:00
bde
a90040b583 Avoid an egcs pessimization for 64-bit signed division on i386's.
Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit
signed divisions, but egcs optimizes them to a shift and fixup when
the divisor is a constant power of 2.  Unfortunately, it generates
a call to __cmpdi2() for the fixup, although all except possibly
ancient versions of gcc and egcs do ordinary 64-bit comparisons
inline.
1998-06-14 15:52:00 +00:00
dfr
1d5f38ac22 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
peter
19ad2aa63b For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t;
int -> int32_t; u_short -> u_int16_t.  Also, use mode_t instead of u_short
for storing modes (mode_t is a u_int16_t).

Obtained from: NetBSD
1998-05-31 20:09:01 +00:00
peter
401c250cc4 Support 'mount -u' remounts. This may require disconnecting and rebinding
the socket.  Certain mode changes are not allowed.

Obtained from:  NetBSD
1998-05-31 19:49:31 +00:00
peter
5080277e0e Cut-n-paste glitch 1998-05-31 19:43:34 +00:00
peter
21746bb862 Prototype support for selectively allowing non-reserved ports on a per
export basis.  Needs userland support yet.

Obtained from:  NetBSD
1998-05-31 19:16:08 +00:00
peter
7966818099 Hide whiteouts from NFS, since the protocol doesn't support them.
Obtained from:  NetBSD
1998-05-31 19:10:52 +00:00
peter
49e79dfe9e NetBSD has a comment that Solaris 2.5 doesn't do verifiers correctly,
we have weakened this test already for Digital Unix, so it may be enough
for Solaris.  It needs to be checked again.

Obtained from: NetBSD
1998-05-31 19:07:47 +00:00
peter
a21fad22e0 Don't pass a second copy of the uid/gid in with the v2/v3 sattr structures,
it just makes more work.  We pass a copy of the uid/gid with the
credentials.  (although, this may need to be revisited if a non AUTHUNIX
authentication method (such as NFSKERB) ever gets implemented).

Obtained from:  NetBSD
1998-05-31 19:00:19 +00:00
peter
985cae8566 Use the new SB_UPCALL flag,
Obtained from:  NetBSD (but I changed the flag clear order in case).
1998-05-31 18:46:06 +00:00
peter
c50a18d361 Don't try and free mrep twice on some error conditions.
Obtained from:  NetBSD
1998-05-31 18:19:43 +00:00
peter
66a3e6b96c #ifdef a diagnostic panic, plus another missed costmetic change.
Obtained from:  NetBSD
1998-05-31 18:11:03 +00:00
peter
32e00d316d We have gained 2 more errno's, add them to the NFSv2 mapping table. 1998-05-31 18:09:18 +00:00
peter
87e3e1a54b Missed a cosmetic change that the other BSD's have. 1998-05-31 18:08:09 +00:00
peter
e410a1b026 oops, nfs_msg() is called from client code too. 1998-05-31 18:06:07 +00:00
peter
35835ef239 When we can't reconnect a socket, don't forget to unlock before retrying
or we can deadlock.

Obtained from:  NetBSD
1998-05-31 18:02:56 +00:00
peter
7f449d8699 Don't log zero length reads, this can happen during normal operation.
Obtained from: NetBSD
1998-05-31 18:00:46 +00:00
peter
7246bc5193 Consider for readdir chunk sizes when tuning socket buffer reservations.
Obtained from:  NetBSD
1998-05-31 17:57:43 +00:00
peter
2b239be950 Refuse READDIR / READDIRPLUS rpc's for non-directories
Obtained from: NetBSD
1998-05-31 17:54:18 +00:00
peter
cbeeaf83f2 Some const's
Obtained from: NetBSD
1998-05-31 17:48:07 +00:00
peter
e58631da3c NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim
of this part of commits is to minimize unnecessary differences between
the other NFS's of similar origin.  Yes, there are gratuitous changes here
that the style folks won't like, but it makes the catch-up less difficult.
1998-05-31 17:27:58 +00:00
peter
aa33f20993 When using NFSv3, use the remote server's idea of the maximum file size
rather than assuming 2^64.  It may not like files that big. :-)
On the nfs server, calculate and report the max file size as the point
that the block numbers in the cache would turn negative.
(ie: 1099511627775 bytes (1TB)).

One of the things I'm worried about however, is that directory offsets
are really cookies on a NFSv3 server and can be rather large, especially
when/if the server generates the opaque directory cookies by using a local
filesystem offset in what comes out as the upper 32 bits of the 64 bit
cookie.  (a server is free to do this, it could save byte swapping
depending on the native 64 bit byte order)

Obtained from:	NetBSD
1998-05-30 16:33:58 +00:00
peter
6d06da8101 Convert a couple of large allocations to use zones rather than malloc
for better packing.  This means that we can choose better values for the
various hash entries without having to try and get it all to fit within
an artificial power of two limit for malloc's sake.
1998-05-24 14:41:56 +00:00
peter
ef0bb32854 Only ignore "owner" permissions selectively rather than always. In some
cases we ignore it (eg: read/write) to maintain chmod-after-open semantics
but in other cases we do care, eg: creating files, access() etc.  Never
ignore errors from VOP_ACCESS() on immutable files.

This apparently comes from BSDI (from Keith Bostic) via NetBSD.

PR:		5148
Submitted by:	Yoshiro MIHIRA <sanpei@yy.cs.keio.ac.jp>
1998-05-20 09:05:48 +00:00
peter
1777a04b11 Allow control of the attribute cache timeouts at mount time.
We had run out of bits in the nfs mount flags, I have moved the internal
state flags into a seperate variable.  These are no longer visible via
statfs(), but I don't know of anything that looks at them.
1998-05-19 07:11:27 +00:00
bde
76e69cb4f4 Get timespecs directly instead of via timevals. 1998-05-16 15:11:24 +00:00
msmith
964ce778b1 In the words of the submitter:
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it.  This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp.  vop_rename
still releases 4 vnode arguments before it returns.  These remaining cases
will be corrected in the next set of patches.
---------

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-07 04:58:58 +00:00
phk
fe94bc8288 Use random() to find our initial xid. 1998-04-06 11:41:07 +00:00
phk
9b703b1455 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
eivind
d7a6ab2803 Staticize. 1998-02-09 06:11:36 +00:00
eivind
4547a09753 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
eivind
c552a9a1c3 Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
bde
ffbb93a37a Added #include of <sys/queue.h> so that this file is more "self"-sufficent. 1998-02-03 22:19:35 +00:00
bde
742edae5eb Forward declare some structs so that this file is more self-sufficient. 1998-02-03 21:52:02 +00:00
bde
a1f8745634 Moved declaration of `union nethostadr' outside of the KERNEL section,
to give pollution compatible with <nfs/nqfs.h>.  At least mount_nfs.c
previously had to #define KERNEL before including <nfs/nfs.h> to get
this pollution, but this gave other pollution.

Moved comment about NFSINT_SIGMASK to immediately before the code that
it applies to.
1998-02-01 21:23:29 +00:00
dyson
2aacd1ab4f Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00
dyson
cd67bb82fe Lots of improvements, including restructring the caching and management
of vnodes and objects.  There are some metadata performance improvements
that come along with this.  There are also a few prototypes added when
the need is noticed.  Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.
1997-12-29 00:25:11 +00:00
bde
3c1b6940fc Unspammed nested include of <vm/vm_zone.h>. 1997-12-27 02:56:39 +00:00
bde
52dbbb4e05 Added a used include.
Fixed a gratuitous ANSIism and nearby KNF violations.
1997-12-20 00:25:01 +00:00
bde
a323d6696a Don't call malloc(..., M_WAITOK) at splnet(). Doing so is often
a mistake (since softnet interrupts may occur if malloc() waits),
and doing it harmlessly but unnecessarily here interfered with
detection of the mistaken cases.
1997-11-24 14:18:00 +00:00
phk
4d26888936 Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by:	-Wunused
1997-11-07 08:53:44 +00:00
phk
4c8218a5c7 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
1997-11-06 19:29:57 +00:00
bde
fb826377ff Removed unused #includes. 1997-10-28 15:59:26 +00:00
bde
974f2bea15 Don't #include <nfs/nfs.h> in <nfs/nfs_node.h> if KERNEL is defined.
Fixed everything that depended on the nested include.
1997-10-28 14:06:25 +00:00
phk
190f3b183f Always initialize the syscall vectors for our "private" syscalls (not
just in the LKM case).
Plug nqnfs_vop_lease_check directly into the default_vnodeop_p table.
1997-10-26 20:13:52 +00:00
phk
36e7a51ea1 Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
phk
645e7b2ab6 Distribute and statizice a lot of the malloc M_* types.
Substantial input from:	bde
1997-10-11 18:31:40 +00:00
dyson
e64b1984f9 Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half.  The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs.  Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
1997-09-21 04:24:27 +00:00
phk
7ef2e93b99 Remove a couple of stubborn NetBSD #if's. 1997-09-10 20:22:32 +00:00
phk
4d0f72e0e2 unifdef -U__NetBSD__ -D__FreeBSD__ 1997-09-10 19:52:27 +00:00
bde
a6e315b69d Added used #include - don't depend on <sys/mbuf.h> including
<sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).
1997-09-02 01:19:47 +00:00
wollman
4542c1cf5d Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs.  (Socket buffers are the one exception.)  A number
of kernel APIs needed to get fixed in order to make this happen.  Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead.  Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
1997-08-16 19:16:27 +00:00
bde
9195bd1ec7 Removed unused #includes. 1997-08-02 14:33:27 +00:00
dfr
331e950c31 Correct some dumb mistakes in the WebNFS stuff.
Submitted by:	bde
1997-07-22 15:35:57 +00:00
dfr
29f6f6602d Allow NULL cookie verifiers for non-NULL offsets. This is needed for
Digital Unix boxes since they appear to always send null verifiers.
1997-07-22 15:35:15 +00:00
dfr
b627718fbd Merge WebNFS changes from NetBSD.
Obtained from:	NetBSD
1997-07-16 09:06:30 +00:00
tegge
fdf5be50ae Clear nfs_iodwant[myiod] when the nfsiod process exits due to a signal. 1997-06-25 21:07:26 +00:00
bde
672c312961 Don't require superuser privileges for creating fifos. The v2 case was
broken when support for v3 was introduced in rev.1.16.  The v3 case has
always been broken in FreeBSD.

Should be in 2.2.

PR:		3838
1997-06-14 11:19:35 +00:00
dfr
99eae7b7b1 Various fixes from NetBSD:
Use u_int for rpc procedure numbers.
	Some fixes to NQNFS.
	A rare NULL pointer dereference.
	Ignore NFSMNT_NOCONN for TCP mounts.

Obtained from:	NetBSD
1997-06-03 17:22:47 +00:00
dfr
dc78066f3d Implement the async mount option for NFSv3. This makes NFS pretend that all
writes sent to the server were synchronous and therefore no commits are
needed.  This is the same as the vfs.nfs.async variable on the server but
allows each client to choose whether to work this way.

Also make the vfs.nfs.async variable do the 'right' thing for NFSv3, i.e.
pretend that the write was synchronous.
1997-06-03 13:56:55 +00:00
dfr
d7e320b30e Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff
and b_validend.  The changes to vfs_bio.c are a bit ugly but hopefully
can be tidied up later by a slight redesign.

PR:		kern/2573, kern/2754, kern/3046 (possibly)
Reviewed by:	dyson
1997-05-19 14:36:56 +00:00
dfr
0ba9709268 Don't keep addresses in mbuf chains. This should simplify the next round
of network changes from Garret.

Reviewed by:	Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
1997-05-13 17:25:44 +00:00
dfr
824696dc98 Implement a separate control for write gathering on NFSv3. This is turned
off for NFSv3 by default since write gathering seems to reduce performance
for NFSv3 by up to 60%.

Add sysctl knobs to control both variables.
1997-05-10 16:59:36 +00:00
dfr
ff5630dff6 Fix a nasty hang connected with write gathering. Also add debug print
statements to bits of the server which helped me find the hang.
1997-05-10 16:12:03 +00:00
dfr
840e81b7a5 Allow NULL rpcs on non-privileged ports at all times to work around broken
clients.

PR:		kern/3298
Submitted by:	Tor Egge <Tor.Egge@idi.ntnu.no>
1997-04-30 09:51:37 +00:00
wollman
6afbf203bd The long-awaited mega-massive-network-code- cleanup. Part I.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call.  Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken.  I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
1997-04-27 20:01:29 +00:00
dfr
5a956ea098 Fix broken usage of nm_readdirsize and increase the socket buffers for UDP
to prevent possible socket overflows.

2.2 candidate.

PR:		kern/3304
Reviewed by:	Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
1997-04-22 17:38:01 +00:00
dfr
290a0d9360 Fix various bugs in the locking protocol, allowing proper shared locks
to be used.  This should fix the lock panics that people are seeing.
1997-04-04 17:49:35 +00:00
bde
5cfbec5a24 Removed #include of <ufs/ufs/dir.h>. Nfs no longer depends on any ufs
features, and the one thing that it depended on (DIRBLKSIZ) now has
conflicting spelling.
1997-03-29 12:40:20 +00:00
bde
bc8606ec4b Define our own version of DIRBLKSIZ instead of (ab)using ufs's value.
Use the same value of 512 (ufs actually uses DEV_BSIZE).  There are
too many versions of DIRBLKSIZ, one for ufs, one for ext2fs, one for
nfs, one for ibcs2, one for linux, one for applications, ... I think
nfs's DIRBLKSIZ needs to be a divisor of the directory blocks sizes
of all supported file systems.  There is also NFS_DIRBLKSIZ, which is
different from nfs's DIRBLKSIZ but is sometimes confused with it in
comments.

Removed a bogus #ifdef KERNEL that hid the tunable constants for nfs.
This came in undocumented with the Lite2 merge although it isn't in
Lite2.  It required more-bogus #define KERNEL's in fstat and pstat
to make the constants visible.

Restored a spelling fix from rev.1.17.

Removed duplicate #defines of all the the NFS mount option flags.
1997-03-29 12:34:33 +00:00
guido
8db0f5f4fd Add code that will reject nfs requests in teh kernel from nonprivileged
ports. This option will be automatically set/cleraed when mount is run
without/with the -n option.
Reviewed by:	Doug Rabson
1997-03-27 20:01:07 +00:00
peter
5dc5a09bf1 Use the correct (relative to the implementation) ordering of args in
the VOP_LINK() calls, Closes PR#3064

Submitted by: bde
1997-03-25 05:13:40 +00:00
peter
6c63b328f9 The local fs interface does not allow link()/unlink() of directories,
do not allow a remote nfs client to cause local fs corruption either.
1997-03-25 05:08:28 +00:00
bde
0bc1781701 Fixed some invalid (non-atomic) accesses to `time', mostly ones of the
form `tv = time'.  Use a new function gettime().  The current version
just forces atomicicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.
1997-03-22 06:53:45 +00:00
peter
94b6d72794 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
dyson
10f666af84 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
jkh
808a36ef65 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
dfr
4c8f7388e5 Improve the queuing algorithms used by NFS' asynchronous i/o. The
existing mechanism uses a global queue for some buffers and the
vp->b_dirtyblkhd queue for others.  This turns sequential writes into
randomly ordered writes to the server, affecting both read and write
performance.  The existing mechanism also copes badly with hung
servers, tending to block accesses to other servers when all the iods
are waiting for a hung server.

The new mechanism uses a queue for each mount point.  All asynchronous
i/o goes through this queue which preserves the ordering of requests.
A simple mechanism ensures that the iods are shared out fairly between
active mount points.  This removes the sysctl variable vfs.nfs.dwrite
since the new queueing mechanism removes the old delayed write code
completely.

This should go into the 2.2 branch.
1996-11-06 10:53:16 +00:00
dfr
de60fb9205 This fixes a problem with the nfs socket handling code which happens
if a single process is performing a large number of requests (in this
case writing a large file).  The writing process could monopolise the
recieve lock and prevent any other processes from recieving their
replies.

It also adds a new sysctl variable 'vfs.nfs.dwrite' which controls the
behaviour which originally pointed out the problem.  When a process
writes to a file over NFS, it usually arranges for another process
(the 'iod') to perform the request.  If no iods are available, then it
turns the write into a 'delayed write' which is later picked up by the
next iod to do a write request for that file.  This can cause that
particular iod to do a disproportionate number of requests from a
single process which can harm performance on some NFS servers.  The
alternative is to perform the write synchronously in the context of
the original writing process if no iod is avaiable for asynchronous
writing.

The 'delayed write' behaviour is selected when vfs.nfs.dwrite=1 and
the non-delayed behaviour is selected when vfs.nfs.dwrite=0.  The
default is vfs.nfs.dwrite=1; if many people tell me that performance
is better if vfs.nfs.dwrite=0 then I will change the default.

Submitted by:	Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
1996-10-11 10:15:33 +00:00
nate
45c85d421d In sys/time.h, struct timespec is defined as:
/*
         * Structure defined by POSIX.4 to be like a timeval.
         */
        struct timespec {
                time_t  ts_sec;         /* seconds */
                long    ts_nsec;        /* and nanoseconds */
        };

        The correct names of the fields are tv_sec and tv_nsec.

Reminded by:	James Drobina <jdrobina@infinet.com>
1996-09-19 18:21:32 +00:00
dg
5ca5efce79 Release an unneeded reference to a vnode that was gained in a VFS_VGET().
Fixes a readdirplus panic.

Submitted by:	Doug Rabson <dfr@render.com>
1996-09-05 07:58:04 +00:00
bde
51ff523803 Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.
1996-09-03 14:25:27 +00:00
dyson
966cbc5d29 Even though this looks like it, this is not a complex code change.
The interface into the "VMIO" system has changed to be more consistant
and robust.  Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.

This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.

Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls.  The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.

Reviewed by: davidg
1996-08-21 21:56:23 +00:00
dfr
90a643ddb4 Various fixes from frank@fwi.uva.nl (Frank van der Linden) via
rick@snowhite.cis.uoguelph.ca:

1. Clear B_NEEDCOMMIT in nfs_write to make sure that dirty data is
correctly send to the server.  If a buffer was dirtied when it was in
the B_DELWRI+B_NEEDCOMMIT state, the state of the buffer was left
unchanged and when the buffer was later cleaned, just a commit rpc was
made to the server to complete the previous write.  Clearing
B_NEEDCOMMIT ensures that another write is made to the server.

2. If a server returned a server (for whatever reason) returned an
answer to a write RPC that implied that fewer bytes than requested
were written, bad things would happen.

3. The setattr operation passed on the atime in stead of the mtime to
the server. The fix is trivial.

4. XIDs always started at 0, but this caused some servers (older DEC
OSF/1 3.0 so I've been told) who had very long-lasting XID caches to
get confused if, after a reboot of a BSD client, RPCs came in with a
XID that had in the past been used before from that client. Patch is
to use the current time in seconds as a starting point for XIDs. The
patch below is not perfect, because it requires the root fs to be
mounted first. This is because of the check BSD systems do, comparing
FS time to system time.

Reviewed by:	Bruce Evans, Terry Lambert.
Obtained from:  frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca
1996-07-16 10:19:45 +00:00
wollman
36ac1a0263 Modify the kernel to use the new pr_usrreqs interface rather than the old
pr_usrreq mechanism which was poorly designed and error-prone.  This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner.  This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out).  This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted).  Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.
1996-07-11 16:32:50 +00:00
bde
a106d0e3dd Don't truncate minor or major numbers in the nfsv3 client. 1996-06-23 17:19:25 +00:00
phk
ea659e907a Fix for NFS_NOSERVER
Poul mentioned that he thought this was some kind of timing problem, and
that started me thinking. After a little poking around, I found that
nfs_timer() was completely disabled when NFS_NOSERVER was #defined.
But after looking at nfs_timer(), it seemed like it was something
required by both the client and server code, and disabling it outright
just didn't seem to make any sense. Parts of it relate only to the
NFS server side code, so I disabled those, but I re-enabled the rest
of the function and made sure that it would be called from nfs_init()
(in nfs_subs.c).

With nfs_timer() re-enabled, everything seems to work again. The only
other changes I made were to #ifdef away some variable declarations
in the NFS_NOSERVER case so that gcc would stop complaining about
unused variables.

Reviewed by:	phk
Submitted by:	Bill Paul <wpaul@skynet.ctr.columbia.edu>
1996-06-14 11:13:21 +00:00
bde
10d2a11249 Fixed a vnode reference leak in nfsrv_rename(). The target inode wasn't
released until the file system was unmounted.  This bug also affected
kern/vfs_syscalls.c but was fixed in rev.1.18 and rev.1.20 there.

Reviewed by:	davidg
1996-06-08 12:16:26 +00:00
bde
4284463ff9 #include <sys/filedesc.h> explicitly instead of depending on it being
bogusly included by <sys/socketvar.h>.
1996-04-30 23:26:52 +00:00