Commit Graph

354 Commits

Author SHA1 Message Date
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Jeff Roberson
17661e5ac4 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
Poul-Henning Kamp
1c7d84ed3c Don't use mbuf allocator flags for malloc(9). 2003-02-22 10:35:37 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Matthew Dillon
45587e2514 Abstract-out the constants for the sequential heuristic.
No operational changes.

MFC after:	1 day
2002-12-28 20:28:10 +00:00
Ian Dowse
2f07688e82 In the NFSv3 `fsinfo' procedure reply, don't claim that we support
32k read and write operations on datagram sockets when in fact we
reject requests larger than 16k. It must be the case that virtually
all clients use data sizes of 16k or less for UDP transport (FreeBSD's
client defaults to 8k and never exceeds 16k), as this bug has been
present ever since NFSv3 support was added.

Reported by:	Senthil <lihtnes78@netscape.net>
Reviewed by:	dillon
Approved by:	re
MFC-after:	1 week
2002-12-05 16:58:11 +00:00
Robert Watson
e5e820fd1f Permit MAC policies to instrument the access control decisions for
system accounting configuration and for nfsd server thread attach.
Policies might use this to protect the integrity or confidentiality
of accounting data, limit the ability to turn on or off accounting,
as well as to prevent inappropriately labeled threads from becoming nfs
server threads.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-04 15:13:36 +00:00
Jeff Roberson
24b50116ed - Introduce a new macro, since that's what nfs loves, called
nfsm_srvpathsiz.  This macro plucks a length out of an rpc request and
   verifies that its size does not exceed NFS_MAXPATHLEN.  If it does
   it generates an ENAMETOOLONG response.
 - Use this macro, and the existing nfsm_srvnamsiz macro in two places
   where we deal with paths passed in by the client.

This fixes a linux interoperability bug.  Linux was sending oversized path
components which would cause us to ignore the request all together.  This
causes linux to hang indefinitly while it waits for a response.  This
could still happen in other cases where we error out with EBADRPC.

Sponsored by:	Isilon Systems, Inc.
Reviewed by:	alfred, fabbri@isilon.com, neal@isilon.com
2002-10-31 22:35:03 +00:00
Robert Watson
94998f80fe Set the NOMACCHECK flag for namei()'s generated by the NFS server code.
We currently don't enforce protections on NFS-originated VOP's.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 21:27:40 +00:00
Robert Watson
60cfb7c64a Correct a problem wherein NFS servers running NFSv2 would not return
certain classes of failure responses to the client during a failed
remove operation.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2002-10-03 21:50:37 +00:00
Jeff Roberson
d3b85e1c8b - Use incore() instead of gbincore() so we don't have to acquire the
vnode interlock.
2002-09-25 02:39:39 +00:00
Poul-Henning Kamp
7ed60de837 Use m_length() instead of home-rolled versions. 2002-09-18 19:44:14 +00:00
Poul-Henning Kamp
f5cd3d67fe Make the V2 errno translation more resistent to new errnos. 2002-08-21 19:28:44 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Peter Wemm
c86e55f6f0 Oops, another unused arg to nfssvc_nfsd(). *blush*
Submitted by:	jake
2002-07-24 23:10:34 +00:00
Peter Wemm
28cc58165d Fully exterminate nfsd_srvargs and nfsd_cargs. They were either unused
or giant NOP's.  There was a credential in srvargs that was giving
rwatson some heartburn. :-)
2002-07-24 22:27:35 +00:00
Robert Watson
30268a208b Stick a dark comment in about the fact that the NFS server code allocates
a ucred by itself as part of an nfs descriptor, then bzero's the ucred,
fails to initialize the mutex, etc.  This is very bad, but I don't have
time to fix it right now.  nfsd should instead hold a cred pointer,
and the credential should be properly initialized, probably from a
descendent of a kernel process credential.
2002-07-24 14:24:16 +00:00
Hajimu UMEMOTO
506c29ecf3 sync comment with reality. IN6P_BINDV6ONLY -> IN6P_IPV6_V6ONLY. 2002-07-22 15:55:50 +00:00
Matthew Dillon
a96f7d1a1b 'recm' was not being unconditionally cleared for each loop, leading to
system lockups (infinite loops) when a zero-length RPC is received.
Linux clients will sometimes send zero-length RPC requests.

Reorganize the use of recm in the loop.

Cc: security@freebsd.org
Submitted by:	Mike Junk <junk@isilon.com>
MFC after:	3 days
2002-07-17 01:07:08 +00:00
Alfred Perlstein
09ce4f7aaf Add IPv6 support.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
2002-07-15 19:40:23 +00:00
Matthew Dillon
3d8f797ac1 Convert old style (type foo *)0 casts to NULLs
PR:		kern/40360
Requested by:	Hiten PAndya via direct email
2002-07-11 17:54:58 +00:00
Matthew Dillon
d331c5d43f Replace the global buffer hash table with per-vnode splay trees using a
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on.  Extensive testing has appeared to have shown no
increase in overhead.

Disadvantages
    Dirties more cache lines during lookups.

    Not as fast as a hash table lookup (but still N log N and optimal
    when there is locality of reference).

Advantages
    vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
    syncer operate more efficiently.

    I get to rip out all the old hacks (some of which were mine) that tried
    to keep the v_dirtyblkhd tailq sorted.

    The per-vnode splay tree should be easier to lock / SMPng pushdown on
    vnodes will be easier.

    This commit along with another that Alan is working on for the VM page
    global hash table will allow me to implement ranged fsync(), optimize
    server-side nfs commit rpcs, and implement partial syncs by the
    filesystem syncer (aka filesystem syncer would detect that someone is
    trying to get the vnode lock, remembers its place, and skip to the
    next vnode).

Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.

Suggested by: alc
2002-07-10 17:02:32 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Ian Dowse
3aed248695 Limit to the maximum allowed reply size the amount of data that
nfsrv_readdir and nfsrv_readdirplus can return. A client request
containing an over-large `count' field could trigger the "Bad nfs
svc reply" panic in nfs_syscalls.c.

Spotted while trying to reproduce kern/37304, which turned out to
be fixed in FreeBSD a long time ago.

MFC after:	1 week
2002-04-21 16:14:54 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Jeff Roberson
ab426dc822 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 10:07:52 +00:00
Kirk McKusick
a0595d0249 Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
2002-03-17 01:25:47 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Matthew Dillon
9348f5e7a6 The vnode was not being vput()'d in the EEXIST mknod case on the nfs
server side.  This can lead to a system deadlock.

Reviewed by:    iedowse
Tested by:      Alexey G Misurenko <mag@caravan.ru>, iedowse
Bug found with help by: Alexey G Misurenko <mag@caravan.ru>
MFC at:         earliest convenience
2002-01-14 19:14:08 +00:00
Ian Dowse
8a919282a5 It is required by VOP_CREATE, VOP_MKNOD, VOP_SYMLINK and VOP_MKDIR
that va_mode of the supplied attributes is filled in with a valid
file mode (i.e not VNOVAL, and only ALLPERM bits set). However,
some NFS server op functions didn't guarantee this for all possible
request messages:

If a V3 client chose not include to a mode specification, we could
end up creating an ffs inode with mode 0177777, requiring a manual
fsck on the next reboot. Fix this by setting va_mode to 0 before
calling the VOP if a mode hasn't been supplied by the client.

In nfsrv_symlink(), S_IFMT bits supplied by a V2 client could end
up in the va_mode passed to VOP_SYMLINK with similar effects. We
now use the macro nfstov_mode() to correctly mask the bits.
2002-01-13 05:36:05 +00:00
Ian Dowse
5df3797ebf Fix a few NFSv2 issues that slipped in during the big cleanup. The
semantics of the nfsm_reply() macro were changed so that the caller
has to explicitly handle the V2 error case, whereas before,
nfsm_reply() did a `goto nfsmout' then. A few server ops (setattr,
readlink, create, mkdir) weren't updated to match, so errors in the
V2 case could cause protocol hangs and leaked mbufs.

Correct some comments that describe the old nfsm_reply behaviour.

[older, harmless nit] Remove the unnecessary `nfsmreply0' label in
nfsrv_create(), since for its users, the main `ereply' label does
the same thing.
2002-01-12 03:57:25 +00:00
Ian Dowse
66b462a989 The macro nfsm_reply() is supposed to allocate a reply in all cases,
but since the nfs cleanup, it hasn't done so in the case where
`error' is EBADRPC. Callers of this macro expect it to initialise
*mrq, and the `nfsmout' exit point expects a reply to be allocated
if error == 0. When nfsm_reply() was called with error = EBADRPC,
whatever junk was in *mrq (often a stale pointer to an old reply
mbuf) would be assumed to be a valid reply and passed to pru_sosend(),
causing a crash sooner or later.

Fix this by allocating a reply even in the EBADRPC case like we
used to do. This bug was specific to -current.
2002-01-11 22:22:39 +00:00
Mike Smith
b3a39c8ae2 Rename some variables that end up shadowing their namesakes in the NFS client
code.

Reviewed by:	peter
2002-01-08 19:41:06 +00:00
Ian Dowse
9669bb479a Avoid passing the variable `tl' to functions that just use it for
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.

The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.

Reviewed by:	peter
2001-12-18 01:22:09 +00:00
Ian Dowse
eec7ff8aa6 When VOP_SYMLINK fails, the value of *vpp is junk, so we must NULL
out nd.ni_vp to prevent the resource cleanup code at the end of
nfsrv_symlink from trying to vrele it. This fixes a "vrele: negative
ref cnt" panic that can occur when a symlink is attempted on an NFS
filesystem with no free space. Found locally, but the symptoms
correspond to those in the PR referenced below.

PR:		kern/26878
MFC after:	3 days
2001-12-04 16:53:42 +00:00
Matthew Dillon
b1e4abd246 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00
Peter Wemm
f281e9efbe Fix a leftover client comment, long line fix. 2001-11-15 23:49:02 +00:00
Ian Dowse
4f6434bdde Now that nfsm_reply() does not usually set 'error' to 0, we need
to do it explicitly in nfsrv_noop so that the reply gets sent back
to the client. This fixes the generation of a selection of RPC
error replies (RPC_PROGMISMATCH, RPC_PROGUNAVAIL, RPC_PROCUNAVAIL
etc.) that are used by some clients to detect support for optional
protocols and features.

Reviewed by:	peter
Reported by:	Thomas Quinot <quinot@inf.enst.fr>
PR:		kern/31479
2001-10-25 19:07:56 +00:00
Peter Wemm
b9b0e19206 Unwind some more macros. NFSMADV() was kinda silly since it was right
next to equivalent m_len adjustments.  Move the nfsm_subs.h macros
into groups depending on which phase they are used in, since that
affects the error recovery requirements.  Collect some of the common error
checking into a single macro as preparation for unwinding some more.
Have nfs_rephead return a value instead of secretly modifying args.
Remove some unused function arguments that were being passed around.
Clarify nfsm_reply()'s error handling (I hope).
2001-09-28 04:37:08 +00:00
Peter Wemm
ac25bcab72 Oops. I forgot to cvs rm this before. There is a common nfsproto.h.
This was a repo copy leftover.
2001-09-28 04:31:23 +00:00
Peter Wemm
1290984b33 Make nfsm_dissect() have an obvious return value. 2001-09-27 22:40:38 +00:00
Peter Wemm
ea7fe289fe Tidy up nfsm_build usage. This is only partially finished. 2001-09-27 02:33:36 +00:00
Peter Wemm
987f5fca86 Wrap a module around the init code so that we have somethign do do a
modfind(2) on, and declare a version so that loader/kldload etc
can locate it (using kldxref's linker.hints file if needed).
2001-09-20 05:13:43 +00:00
Peter Wemm
eb25edbda3 Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.
2001-09-18 23:32:09 +00:00
Peter Wemm
38f48395d6 Sync some differences that were different between the copies of the files
that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.
2001-09-15 04:41:56 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Kris Kennaway
bf61e26696 Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from:	NetBSD
Reviewed by:	peter
MFC after:	2 weeks
2001-09-10 11:28:07 +00:00
Matthew Dillon
4e174404a3 Pushdown Giant for nfs syscalls (nfssvc()) 2001-08-31 22:39:36 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
bc2327c310 - Protect the mnt_vnode list with the mntvnode lock.
- Use queue(9) macros.
2001-06-28 04:10:07 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Alfred Perlstein
603c86672c Implement client side NFS locks.
Obtained from: BSD/os
Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
2001-04-17 20:45:23 +00:00
Peter Wemm
6eb39ac8fc Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t.  This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this.  It is
around 45% faster with -march=pentiumpro on a p6 cpu.
2001-03-17 09:31:06 +00:00
Brian Feldman
c0511d3b58 Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00
Jeroen Ruigrok van der Werven
d7d97eb0aa Preceed/preceeding are not english words. Use precede and preceding. 2001-02-18 10:43:53 +00:00
Ian Dowse
27d9bb4e44 Fix some problems that were introduced in revision 1.97. Instead
of returning an error code to the caller, NFS server op routines
must themselves build an error reply and return 0 to the caller.

This is achieved by replacing the erroneous return statements with
code that jumps forward to the op function's reply code. We need
to be careful to ensure that the 'struct mount' pointer is NULL
though, so that the final vn_finished_write() call becomes a no-op.

Reviewed by:	mckusick, dillon
2001-02-09 13:24:06 +00:00
Bosko Milekic
2a0c503e7a * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Poul-Henning Kamp
a52585d77e Simplify the tprintf() API.
Loose the special <sys/tprintf.h> #include file.
2000-11-26 20:35:21 +00:00
Matthew Dillon
279d722604 This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
Kirk McKusick
d6514f21d7 In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
2000-11-14 08:00:39 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
David Malone
dc6dd1259f Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

	The problem seems to originate with NFS's postop_attr
	information that is returned with a read or write RPC.
	Within a vm_fault context, the code cannot deal with
	vnode_pager_setsize() shrinking a vnode.

	The workaround in the patch below stops the nfsm_postop_attr()
	macro from ever shrinking a vnode. If the new size in the
	postop_attr information is smaller, then it just sets the
	nfsnode n_attrstamp to 0 to stop the wrong size getting
	used in the future. This change only affects postop_attr
	attributes; the nfsm_loadattr() macro works as normal.

	The change is implemented by adding a new argument to
	nfs_loadattrcache() called 'dontshrink'. When this is
	non-zero, nfs_loadattrcache() will never reduce the
	vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by:	dillon
Tested by:	Vadim Belman <voland@lflat.org>
2000-10-24 10:13:36 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Kirk McKusick
9b97113391 This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.
2000-07-24 05:28:33 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Poul-Henning Kamp
3389ae9350 Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
2000-04-19 14:58:28 +00:00
Matthew Dillon
8d1b3828fa Add a sysctl to specify the amount of UDP receive space NFS should
reserve, in maximal NFS packets.  Originally only 2 packets worth of
    space was reserved.  The default is now 4, which appears to greatly
    improve performance for slow to mid-speed machines on gigabit networks.

    Add documentation and correct some prior documentation.

Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu>
Approved by: jkh
2000-03-27 21:38:35 +00:00
Poul-Henning Kamp
b99c307a21 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
Poul-Henning Kamp
21144e3bf1 Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd.  The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue.  It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users:  Greg has not had time to test this yet, be careful.
2000-03-20 10:44:49 +00:00
Peter Wemm
242c5536ea Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs.  Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by:	jkh
2000-02-13 03:32:07 +00:00
Matthew Dillon
34ddf54812 The alpha build cuases the 'nfsuid bloated' warning to occur. Well,
there is nothing we can do about it.  In fact, after further review
    there simply are not very many instances of the two structures NFS
    checks for 'bloat' so I've decided to simply rip the checks out entirely.

Submitted by:	 Andrew Gallatin <gallatin@cs.duke.edu>
2000-01-13 20:18:25 +00:00
Yoshinobu Inoue
fb59c426ff tcp updates to support IPv6.
also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
2000-01-09 19:17:30 +00:00
Matthew Dillon
c37c9620cd Enhance reassignbuf(). When a buffer cannot be time-optimally inserted
into vnode dirtyblkhd we append it to the list instead of prepend it to
    the list in order to maintain a 'forward' locality of reference, which
    is arguably better then 'reverse'.  The original algorithm did things this
    way to but at a huge time cost.

    Enhance the append interlock for NFS writes to handle intr/soft mounts
    better.

    Fix the hysteresis for NFS async daemon I/O requests to reduce the
    number of unnecessary context switches.

    Modify handling of NFS mount options.  Any given user option that is
    too high now defaults to the kernel maximum for that option rather then
    the kernel default for that option.

Reviewed by:	 Alfred Perlstein <bright@wintelcom.net>
2000-01-05 05:11:37 +00:00
Peter Wemm
c447342094 Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot).  This is consistant with the other
BSD's who made this change quite some time ago.  More commits to come.
1999-12-29 05:07:58 +00:00
Alfred Perlstein
20883b0f10 make getfh a standard syscall instead of dependant on having
NFSSERVER defined, useful for userland fileservers that want to
use a filehandle type interface to the filesystem.

Submitted by: Assar Westerlund assar@stacken.kth.se
PR: kern/15452
1999-12-21 20:21:12 +00:00
Brian Feldman
d25f3712b7 M_PREPEND-related cleanups (unregisterifying struct mbuf *s). 1999-12-19 01:55:37 +00:00
Matthew Dillon
60c959f40b Fix compilation warning on alpha when converting pointer to integer
to generate hash index.

Reviewed by:	 Andrew Gallatin <gallatin@cs.duke.edu>
1999-12-18 19:20:05 +00:00
Matthew Dillon
2cac06495e Have NFS use a snapshot of boottime instead of boottime itself to
generate the NFSv3 Version id.  boottime itself may change, sometimes
    once every tick if you are running xntpd, which really throws off
    clients.  Clients will tend to throw away what they believe to be
    stale data too often, and can get into long loops rewriting the same
    data over and over again because they believe the server has rebooted
    over and over again due to the changing version id.

Approved by:	jkh
1999-12-16 17:01:32 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Matthew Dillon
1e64c256dc Add a readahead heuristic to the NFS server side code. While the server
cannot unilaterally pass data to a client it can reduce the physical
    disk transaction overhead by reading larger blocks.  This results in
    better pipelining of requests/responses over the network and an almost
    100% increase in cpu efficiency on the server.  On a 100BaseTX network
    NFS read performance increases from 8.5 MBytes/sec to 10 MB/sec (maxed
    out), and cpu efficiency increases from 72% idle to 80% idle on the server.

Reviewed by:	Alfred Perlstein <bright@wintelcom.net>
1999-12-13 17:34:45 +00:00
Matthew Dillon
c9940d3b84 PR: kern/15222
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-12-13 17:07:03 +00:00
Matthew Dillon
4682c8eac9 Fix a timeout deadlock that can occur when the process holding the
receive lock hasn't yet managed to send its own request.

PR:		kern/15055
Submitted by:	Ian Dowse iedowse@maths.tcd.ie
1999-12-13 04:24:55 +00:00
Matthew Dillon
5f3bfd608d Fix a number of server-side issues related to aborting badly formed
NFS packets, mainly initializing structure pointers to NULL which
    are conditionally freed prior to return.

PR:		kern/15249
Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
1999-12-12 07:06:39 +00:00
Matthew Dillon
ea94c7b968 Synopsis of problem being fixed: Dan Nelson originally reported that
blocks of zeros could wind up in a file written to over NFS by a client.
    The problem only occurs a few times per several gigabytes of data.   This
    problem turned out to be bug #3 below.

    bug #1:

        B_CLUSTEROK must be cleared when an NFS buffer is reverted from
        stage 2 (ready for commit rpc) to stage 1 (ready for write).
        Reversions can occur when a dirty NFS buffer is redirtied with new
        data.

        Otherwise the VFS/BIO system may end up thinking that a stage 1
        NFS buffer is clusterable.  Stage 1 NFS buffers are not clusterable.

    bug #2:

        B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short
        buffers only occur near the EOF of the file).  Change to only set
        when the buffer is a full biosize (usually 8K).  This bug has no
        effect but should be fixed in -current anyway.  It need not be
        backported.

    bug #3:

        B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is
	typically only called by the update daemon).  nfs_flush()
        does a multi-pass loop but due to the lack of vnode locking it
        is possible for new buffers to be added to the dirtyblkhd list
        while a flush operation is going on.  This may result in nfs_flush()
        setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its
        stage 1 write, causing only the commit rpc to be made and thus
        causing the contents of the buffer to be thrown away (never sent to
        the server).

    The patch also contains some cleanup, which only applies to the commit
    into -current.

Reviewed by:	dg, julian
Originally Reported by: Dan Nelson <dnelson@emsphone.com>
1999-12-12 06:09:57 +00:00
Matthew Dillon
b314ed9662 nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization
of element [4] in both, which goes beyond the end of the array, leaving
    [0], [1], [2], and [3].  This bug did not cause any problems since
    the overrun fields are initialized after the bogus array init but
    needs to be fixed anyway.

Submitted by:	 Ian Dowse <iedowse@maths.tcd.ie>
1999-11-22 04:50:09 +00:00
Eivind Eklund
dd8c04f4c7 Remove WILLRELE from VOP_SYMLINK
Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.
1999-11-13 20:58:17 +00:00