Commit Graph

312 Commits

Author SHA1 Message Date
Ian Dowse
92daf89227 Fix a bug in nfsrv_read() that caused the replies to certain NFSv3
short read operations at the end of a file to not have the "eof"
flag set as they should. The problem is that the requested read
count was compared against the rounded-up reply data length instead
of the actual reply data length. This bug appears to have been
introduced in revision 1.78 (June 1999). It causes first-time reads
of certain file sizes (e.g 4094 bytes) to fail with EIO on a RedHat
9.0 NFSv3 client.

MFC after:	1 week
2003-06-24 19:04:26 +00:00
Kirk McKusick
98530110a2 Increase the size of the NFS server hash table to improve performance
when serving up more than about 32 active files. For details see
section 6.3 (pg 111) of Daniel Ellard and Margo Seltzer, ``NFS
Tricks and Benchmarking Traps'' in the Proceedings of the Usenix
2003 Freenix Track, June 9-14, 2003 pg 101-114.

Obtained from:	Daniel Ellard <ellard@eecs.harvard.edu>
Sponsored by:   DARPA & NAI Labs.
2003-06-21 21:01:44 +00:00
David E. O'Brien
ab0de15baf Use __FBSDID(). 2003-06-11 05:37:42 +00:00
Jeffrey Hsu
30ef7aacc2 Protect read-modify-write increment of f_count field with file lock. 2003-06-05 06:05:57 +00:00
Poul-Henning Kamp
7379c88f4f Add /* FALLTHROUGH */
Found by:       FlexeLint
2003-05-31 18:20:26 +00:00
Don Lewis
263c8abeb9 Beat vnode locking in the NFS server code into submission. This change
is not pretty, but it fixes the code so that it no longer violates the
vnode locking rules in the VFS API and doesn't trip any of the locking
assertions enabled by the DEBUG_VFS_LOCKS kernel configuration option.
There is one report that this patch fixed a "locking against myself"
panic on an NFS server that was tripped by a diskless client.

Approved by:	re (scottl)
2003-05-25 06:17:33 +00:00
Alan Cox
b6e48e0372 - Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
   whether its caller has locked the vm_object.  (This is a temporary
   measure to bootstrap vm_object locking.)
2003-04-24 04:31:25 +00:00
Jeff Roberson
c033bdc013 - Lock bufs before inspecting their flags. 2003-03-13 07:05:22 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Jeff Roberson
17661e5ac4 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
Poul-Henning Kamp
1c7d84ed3c Don't use mbuf allocator flags for malloc(9). 2003-02-22 10:35:37 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Matthew Dillon
45587e2514 Abstract-out the constants for the sequential heuristic.
No operational changes.

MFC after:	1 day
2002-12-28 20:28:10 +00:00
Ian Dowse
2f07688e82 In the NFSv3 `fsinfo' procedure reply, don't claim that we support
32k read and write operations on datagram sockets when in fact we
reject requests larger than 16k. It must be the case that virtually
all clients use data sizes of 16k or less for UDP transport (FreeBSD's
client defaults to 8k and never exceeds 16k), as this bug has been
present ever since NFSv3 support was added.

Reported by:	Senthil <lihtnes78@netscape.net>
Reviewed by:	dillon
Approved by:	re
MFC-after:	1 week
2002-12-05 16:58:11 +00:00
Robert Watson
e5e820fd1f Permit MAC policies to instrument the access control decisions for
system accounting configuration and for nfsd server thread attach.
Policies might use this to protect the integrity or confidentiality
of accounting data, limit the ability to turn on or off accounting,
as well as to prevent inappropriately labeled threads from becoming nfs
server threads.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-04 15:13:36 +00:00
Jeff Roberson
24b50116ed - Introduce a new macro, since that's what nfs loves, called
nfsm_srvpathsiz.  This macro plucks a length out of an rpc request and
   verifies that its size does not exceed NFS_MAXPATHLEN.  If it does
   it generates an ENAMETOOLONG response.
 - Use this macro, and the existing nfsm_srvnamsiz macro in two places
   where we deal with paths passed in by the client.

This fixes a linux interoperability bug.  Linux was sending oversized path
components which would cause us to ignore the request all together.  This
causes linux to hang indefinitly while it waits for a response.  This
could still happen in other cases where we error out with EBADRPC.

Sponsored by:	Isilon Systems, Inc.
Reviewed by:	alfred, fabbri@isilon.com, neal@isilon.com
2002-10-31 22:35:03 +00:00
Robert Watson
94998f80fe Set the NOMACCHECK flag for namei()'s generated by the NFS server code.
We currently don't enforce protections on NFS-originated VOP's.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-19 21:27:40 +00:00
Robert Watson
60cfb7c64a Correct a problem wherein NFS servers running NFSv2 would not return
certain classes of failure responses to the client during a failed
remove operation.

Submitted by:	Ian Dowse <iedowse@maths.tcd.ie>
2002-10-03 21:50:37 +00:00
Jeff Roberson
d3b85e1c8b - Use incore() instead of gbincore() so we don't have to acquire the
vnode interlock.
2002-09-25 02:39:39 +00:00
Poul-Henning Kamp
7ed60de837 Use m_length() instead of home-rolled versions. 2002-09-18 19:44:14 +00:00
Poul-Henning Kamp
f5cd3d67fe Make the V2 errno translation more resistent to new errnos. 2002-08-21 19:28:44 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Peter Wemm
c86e55f6f0 Oops, another unused arg to nfssvc_nfsd(). *blush*
Submitted by:	jake
2002-07-24 23:10:34 +00:00
Peter Wemm
28cc58165d Fully exterminate nfsd_srvargs and nfsd_cargs. They were either unused
or giant NOP's.  There was a credential in srvargs that was giving
rwatson some heartburn. :-)
2002-07-24 22:27:35 +00:00
Robert Watson
30268a208b Stick a dark comment in about the fact that the NFS server code allocates
a ucred by itself as part of an nfs descriptor, then bzero's the ucred,
fails to initialize the mutex, etc.  This is very bad, but I don't have
time to fix it right now.  nfsd should instead hold a cred pointer,
and the credential should be properly initialized, probably from a
descendent of a kernel process credential.
2002-07-24 14:24:16 +00:00
Hajimu UMEMOTO
506c29ecf3 sync comment with reality. IN6P_BINDV6ONLY -> IN6P_IPV6_V6ONLY. 2002-07-22 15:55:50 +00:00
Matthew Dillon
a96f7d1a1b 'recm' was not being unconditionally cleared for each loop, leading to
system lockups (infinite loops) when a zero-length RPC is received.
Linux clients will sometimes send zero-length RPC requests.

Reorganize the use of recm in the loop.

Cc: security@freebsd.org
Submitted by:	Mike Junk <junk@isilon.com>
MFC after:	3 days
2002-07-17 01:07:08 +00:00
Alfred Perlstein
09ce4f7aaf Add IPv6 support.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
2002-07-15 19:40:23 +00:00
Matthew Dillon
3d8f797ac1 Convert old style (type foo *)0 casts to NULLs
PR:		kern/40360
Requested by:	Hiten PAndya via direct email
2002-07-11 17:54:58 +00:00
Matthew Dillon
d331c5d43f Replace the global buffer hash table with per-vnode splay trees using a
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on.  Extensive testing has appeared to have shown no
increase in overhead.

Disadvantages
    Dirties more cache lines during lookups.

    Not as fast as a hash table lookup (but still N log N and optimal
    when there is locality of reference).

Advantages
    vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
    syncer operate more efficiently.

    I get to rip out all the old hacks (some of which were mine) that tried
    to keep the v_dirtyblkhd tailq sorted.

    The per-vnode splay tree should be easier to lock / SMPng pushdown on
    vnodes will be easier.

    This commit along with another that Alan is working on for the VM page
    global hash table will allow me to implement ranged fsync(), optimize
    server-side nfs commit rpcs, and implement partial syncs by the
    filesystem syncer (aka filesystem syncer would detect that someone is
    trying to get the vnode lock, remembers its place, and skip to the
    next vnode).

Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.

Suggested by: alc
2002-07-10 17:02:32 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Ian Dowse
3aed248695 Limit to the maximum allowed reply size the amount of data that
nfsrv_readdir and nfsrv_readdirplus can return. A client request
containing an over-large `count' field could trigger the "Bad nfs
svc reply" panic in nfs_syscalls.c.

Spotted while trying to reproduce kern/37304, which turned out to
be fixed in FreeBSD a long time ago.

MFC after:	1 week
2002-04-21 16:14:54 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Jeff Roberson
ab426dc822 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 10:07:52 +00:00
Kirk McKusick
a0595d0249 Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
2002-03-17 01:25:47 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Matthew Dillon
9348f5e7a6 The vnode was not being vput()'d in the EEXIST mknod case on the nfs
server side.  This can lead to a system deadlock.

Reviewed by:    iedowse
Tested by:      Alexey G Misurenko <mag@caravan.ru>, iedowse
Bug found with help by: Alexey G Misurenko <mag@caravan.ru>
MFC at:         earliest convenience
2002-01-14 19:14:08 +00:00
Ian Dowse
8a919282a5 It is required by VOP_CREATE, VOP_MKNOD, VOP_SYMLINK and VOP_MKDIR
that va_mode of the supplied attributes is filled in with a valid
file mode (i.e not VNOVAL, and only ALLPERM bits set). However,
some NFS server op functions didn't guarantee this for all possible
request messages:

If a V3 client chose not include to a mode specification, we could
end up creating an ffs inode with mode 0177777, requiring a manual
fsck on the next reboot. Fix this by setting va_mode to 0 before
calling the VOP if a mode hasn't been supplied by the client.

In nfsrv_symlink(), S_IFMT bits supplied by a V2 client could end
up in the va_mode passed to VOP_SYMLINK with similar effects. We
now use the macro nfstov_mode() to correctly mask the bits.
2002-01-13 05:36:05 +00:00
Ian Dowse
5df3797ebf Fix a few NFSv2 issues that slipped in during the big cleanup. The
semantics of the nfsm_reply() macro were changed so that the caller
has to explicitly handle the V2 error case, whereas before,
nfsm_reply() did a `goto nfsmout' then. A few server ops (setattr,
readlink, create, mkdir) weren't updated to match, so errors in the
V2 case could cause protocol hangs and leaked mbufs.

Correct some comments that describe the old nfsm_reply behaviour.

[older, harmless nit] Remove the unnecessary `nfsmreply0' label in
nfsrv_create(), since for its users, the main `ereply' label does
the same thing.
2002-01-12 03:57:25 +00:00
Ian Dowse
66b462a989 The macro nfsm_reply() is supposed to allocate a reply in all cases,
but since the nfs cleanup, it hasn't done so in the case where
`error' is EBADRPC. Callers of this macro expect it to initialise
*mrq, and the `nfsmout' exit point expects a reply to be allocated
if error == 0. When nfsm_reply() was called with error = EBADRPC,
whatever junk was in *mrq (often a stale pointer to an old reply
mbuf) would be assumed to be a valid reply and passed to pru_sosend(),
causing a crash sooner or later.

Fix this by allocating a reply even in the EBADRPC case like we
used to do. This bug was specific to -current.
2002-01-11 22:22:39 +00:00
Mike Smith
b3a39c8ae2 Rename some variables that end up shadowing their namesakes in the NFS client
code.

Reviewed by:	peter
2002-01-08 19:41:06 +00:00
Ian Dowse
9669bb479a Avoid passing the variable `tl' to functions that just use it for
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.

The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.

Reviewed by:	peter
2001-12-18 01:22:09 +00:00
Ian Dowse
eec7ff8aa6 When VOP_SYMLINK fails, the value of *vpp is junk, so we must NULL
out nd.ni_vp to prevent the resource cleanup code at the end of
nfsrv_symlink from trying to vrele it. This fixes a "vrele: negative
ref cnt" panic that can occur when a symlink is attempted on an NFS
filesystem with no free space. Found locally, but the symptoms
correspond to those in the PR referenced below.

PR:		kern/26878
MFC after:	3 days
2001-12-04 16:53:42 +00:00
Matthew Dillon
b1e4abd246 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00