792 Commits

Author SHA1 Message Date
dfr
0536363c85 MFC: kernel-mode NFS lock manager. 2008-04-24 10:46:25 +00:00
jhb
dc860f5a64 MFC: Consolidate the code to generate a new XID for a NFS request. 2008-03-05 20:04:16 +00:00
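A minimal sketch of what a single shared XID source might look like. It assumes a 32-bit counter that is seeded once from a random value and incremented for every request. Skipping 0 on wraparound is an assumption here, and `struct xid_gen`, `xid_seed()` and `xid_next()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generator state; one instance shared by all requests. */
struct xid_gen {
    uint32_t last;   /* most recently issued XID */
};

/* Seed once; the caller is expected to pass a random value. */
void xid_seed(struct xid_gen *g, uint32_t seed)
{
    assert(g != NULL);
    g->last = seed;
}

/* Hand out the next XID, skipping 0 when the counter wraps. */
uint32_t xid_next(struct xid_gen *g)
{
    assert(g != NULL);
    if (++g->last == 0)
        ++g->last;
    return g->last;
}
```

In the kernel the counter would also need a lock, because many threads issue requests concurrently.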
rwatson
bb416af772 Merge nfs_vnops.c:1.277 from HEAD to RELENG_6:
Remove hacks from the NFSv2/3 client intended to handle a lack of a
  server-side RPC retransmission cache for non-idempotent operations: these
  hacks substituted 0 (success) for the expected EEXIST in the event that
  a target name already existed for LINK, SYMLINK, and MKDIR operations,
  under the assumption that EEXIST represented a second application of the
  original RPC rather than a true failure.

  Background: certain NFS operations (in this case, LINK, SYMLINK, and
  MKDIR) are not idempotent, as they leave behind persisting state on the
  server that prevents them from being replayed without an error; if a UDP
  RPC reply is lost, leading to a retransmission by the client, the second
  reply will return EEXIST rather than success, as the new object has
  already been created.  The NFS client previously silently mapped the
  EEXIST return into success to paper over this problem.

  However, in all modern NFS server implementations, a reply cache is kept
  in order to retransmit the original reply to a retransmitted request,
  rather than performing the operation a second time, allowing this hack
  to be avoided.  This allows link()-based file locking over NFS to operate
  correctly, as an application can request the creation of a new link for
  a file and tell atomically whether it succeeded.

  Other NFS clients, including Solaris and Linux, generally follow this
  behavior for the same reasons.  Most clients also now default to TCP,
  which also helps avoid the issue of retransmitted but non-idempotent
  requests in most cases.

  Reported by:    Adam McDougall <mcdouga9 at egr dot msu dot edu>,
                  Timo Sirainen <tss at iki dot fi>
  Reviewed by:    mohans
2008-03-01 11:33:22 +00:00
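The link()-based locking that this change fixes can be sketched in user space. The sketch follows the common NFS-safe recipe: hard-link a private, uniquely named file to the lock name, and if link(2) reports an error, check whether the private file's link count reached 2 anyway (the lost-reply case). `link_lock()` is a hypothetical helper and the paths are placeholders.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper. Try to take the lock at lockpath by hard-linking a
 * private file (created from the mkstemp(3) template tmpl, which must be in
 * the same directory) to it.
 * Returns 1 if the lock was taken, 0 if another holder has it, -1 on error.
 */
int link_lock(const char *lockpath, char *tmpl)
{
    assert(lockpath != NULL && tmpl != NULL);

    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    close(fd);

    int result;
    if (link(tmpl, lockpath) == 0) {
        result = 1;                     /* link created: the lock is ours */
    } else {
        int saved = errno;              /* stat() below may overwrite errno */
        struct stat st;
        /* If the reply was lost, the link may exist anyway: our private
           file then has two names, so its link count is 2. */
        if (stat(tmpl, &st) == 0 && st.st_nlink == 2)
            result = 1;
        else
            result = (saved == EEXIST) ? 0 : -1;
    }
    unlink(tmpl);                       /* drop the private name only */
    return result;
}
```

The lock is released with `unlink(lockpath)`. The recipe only works if EEXIST is reported truthfully, which is what removing the client-side hack restores.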
jhb
c3228a22c4 MFC: Force consistent use of the mountpoint's credentials when connecting
to the NFS server by temporarily changing the current thread's credentials
to that of the mountpoint while establishing the connection.
2008-01-17 21:04:51 +00:00
ups
160df1e536 MFC: nfs_socket.c 1.148
Log: 	NetApp filers return corrupt post op attrs in the wcc on NFS error responses.
	This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt
	for other NFS error responses. For now, disabling wcc pre-op attr checks and
	post-op attr loads on NFS errors (sysctl'ed).
2007-11-16 21:24:54 +00:00
sam
33d1852e9a MFC: consolidate parsing of nfs root mount options in one place
and handle all options; this enables things like TCP mounts
2007-10-30 00:49:41 +00:00
jhb
b3027a4df2 MFC: Add a -z flag to nfsstat which zeros the NFS statistics after
displaying them.
2007-10-26 22:06:55 +00:00
jhb
15a7b85980 MFC: Fix for a race where out of order loading of NFS attrs into the
nfsnode could lead to attrs being stale.
2007-10-17 16:07:10 +00:00
jhb
5207be8308 MFC 1.164: Fix up NFS client write error handling.
Submitted by:	mohans
2007-07-17 21:02:08 +00:00
jhb
1f96e040a0 MFC 1.131: Fix for a race between the thread transmitting the request and
the thread processing the reply.
2007-06-28 03:28:28 +00:00
kib
b75617bf6d MFC:
rev. 1.11 of src/sys/geom/geom_vfs.c
rev. 1.516 of src/sys/kern/vfs_bio.c
rev. 1.35 of src/sys/nfs4client/nfs4_vnops.c
rev. 1.272 of src/sys/nfsclient/nfs_vnops.c
rev. 1.195 of src/sys/sys/buf.h
rev. 1.18 of src/sys/sys/bufobj.h
rev. 1.73 of src/sys/ufs/ffs/ffs_extern.h
rev. 1.133 of src/sys/ufs/ffs/ffs_snapshot.c
rev. 1.324 of src/sys/ufs/ffs/ffs_vfsops.c

Avoid dealing with buffers in bdwrite() that are from the other side of
the snaplock divisor in the lock order than the buffer being written. Add
a new BOP, bop_bdwrite(), to do dirty buffer flushing for the same vnode in
bdwrite(). The default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, a specialized implementation is
used.

This commit changes the KPI/KBI, so out-of-tree kernel modules must be
recompiled.

Approved by:	re (kensmith)
2007-06-11 10:53:48 +00:00
jhb
9d24797190 MFC 1.139: Fix a snafu in the changes in 1.138.
PR:		kern/113387
Submitted by:	Andre Albsmeier
2007-06-08 16:51:20 +00:00
jhb
bb88c2f7b4 MFC: Various fixes to NFS DirectIO support. 2007-05-02 15:15:51 +00:00
jhb
ce0887f2c5 MFC: Don't hold the vnode interlock across a tsleep() in nfs_flush().
This was actually contained in the MPSAFE NFS client changes which is not
all being MFC'd, however, this fixes a bug in the previous fix to
nfs_flush().

Guidance from:	mohans
2007-04-25 20:53:50 +00:00
jhb
1e4c7522ee MFC:
- In nfs_flush(), clear the NMODIFIED bit only if there are no dirty
  buffers *and* there are no buffers queued up for writing.
- Keep track of the number of in-progress async direct IO writes in the
  nfsnode.  Make fsync/close wait until all of these drain.  Add a check to
  nfs_getpage() and nfs_putpage().
2007-04-25 20:47:01 +00:00
kris
857d13a365 MFC: Don't hard-code the nfs root socket as SOCK_DGRAM. This is currently
a NOP in 6.x but this may change if further code is merged from 7.0.
2007-03-11 19:44:52 +00:00
mohans
872eb2c14b MFC:
Backing out an earlier change. It seems harmless for NFS to miss the "force
unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and
nfs_sigintr() unnecessary. Pointed out by tegge@.
2007-02-16 03:53:33 +00:00
jhb
cb722c46dc MFC: Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flushing dirty pages in the data invalidation functions
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.
2007-02-12 19:08:29 +00:00
mohans
efe37ef4d8 Add missing MNT_ILOCK around some mnt_kern_flag accesses. 2007-02-11 03:43:34 +00:00
mohans
f035099ad1 MFC:
Fixes up the handling of shared vnode lock lookups in the NFS client,
adds a FS type specific flag indicating that the FS supports shared
vnode lock lookups, adds some logic in vfs_lookup.c to test this flag
and set lock flags appropriately.

This change fixes the general problem of cascading vnode locks when an
NFS server goes down.

Ideally, we wouldn't need these changes, as enabling shared vnode lock
lookups globally would work. Unfortunately, UFS, for example, isn't
ready for shared vnode lock lookups and crashes pretty quickly.

This change is the result of discussions with Stephan Uphoff (ups@).
Thanks to Kris for shaking out several bugs in NFS with shared vnode
lock lookups in current. MFC'ed per Kris' request.

Reviewed by:	ups@
2007-02-11 03:07:46 +00:00
mohans
77c4a0dce1 MFC: Fix for a vnode lock leak in nfs_create() in the event of an error.
Spotted by ups@.
2007-01-31 23:11:15 +00:00
mohans
7f2590b5d0 MFC 3 fixes from -current. All having to do with the case where the same
filehandle is looked up by 2 or more processes.
- Don't vrele() the losing vnode, as vfs_hash_insert() vput()'s it.
- Initialize mutexes on the losing nfsnode (as these get destroyed in the
  nfsnode reclaim path).
- Move the initialization of the filehandle to before the vfs_insert, to
  close some races which could result in multiple vnodes for the same
  filehandle being inserted into the hash.
2007-01-03 20:19:02 +00:00
sam
3a586a3803 MFC 1.67: honor nolockd flag in root mount options 2006-12-23 22:40:56 +00:00
mohans
7aaff56599 MFC :
Fix to readdir+ reply handling. When inserting an entry into the namecache,
initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the
just entered namecache entry.
Approved by: re
2006-12-05 18:41:35 +00:00
bde
628d29a473 MFC (1.270: don't do null Setattr RPCs for VA_MARK_ATIME). 2006-11-23 09:50:18 +00:00
mohans
b4043fdc19 MFC: Make EWOULDBLOCK a recoverable error so that the request is
retransmitted. Without this fix, writes are silently dropped on EWOULDBLOCK
(which occurs with NFS/TCP when the socket send buffer is full and the
sockbuf timer fires), resulting in data corruption.
Reviewed by: ups@
Approved by: re
2006-11-02 19:48:17 +00:00
tegge
690853d66d MFC: Use mount interlock to protect all changes to mnt_flag and
mnt_kern_flag. This eliminates a race where the MNT_UPDATE flag could be
lost when nmount() raced against sync(), sync_fsync() or quotactl().

Approved by:	re (kensmith)
2006-10-09 19:47:17 +00:00
mohans
64c133521a MFC change 1.138.
Fix for an NFS/TCP client bug which would cause the NFS/TCP stream to get
out of sync under heavy loads, forcing frequent reconnects and causing
EBADRPC errors, etc.

Approved by: re
2006-10-01 05:03:18 +00:00
mohans
2f6828bb9f MFC:
Vnode locks are recursive and the NFS client supports shared vnode locks.

Approved by: re
2006-09-13 19:25:44 +00:00
brooks
039ba93037 MFC: rev 1.185
Add a new kernel environment variable "boot.netif.mtu" which is used to
set the MTU prior to mounting root via NFS.  This is required if the
server supports a higher than default MTU because the client will not
see the responses otherwise.
2006-09-07 17:38:47 +00:00
kib
2f7d13c770 MFC rev. 1.267:
Always supply curthread as the thread argument to nfs_asyncio and
nfs_doio in nfs_strategy. Otherwise, for some buffers, signals would be
ignored on intr mounts.

Reviewed by:	mohan
Approved by:	pjd (mentor)
2006-08-07 12:33:25 +00:00
kib
883b286035 MFC rev. 1.142:
Signals may be delivered to the process as well as to the thread. Check
thread-delivered signals in addition to process-delivered ones.

Reviewed by:	mohan
Approved by:	pjd (mentor)
2006-08-07 12:32:10 +00:00
rwatson
9a0e4c7010 Merge nfs_nfsiod.c:1.89 from HEAD to RELENG_6:
Adjust minimum iod threads from 4 to 0 -- since we compile the NFS
  client into the kernel by default, and many users won't use NFS,
  don't start an extra 4 kernel threads that are unused.  Once NFS
  becomes active, it will start nfsiod's as it needs them.

  We might consider mandating a minimum number of iods equal to the number
  of active NFS mounts (truncated to some value), which would force some
  to remain available without having to create a new one if the file
  system is mostly inactive.

  PR:             70880
  Prodded by:     cel
  Head nod:       peter
  Pointed out by: Joe <fbsd_user at a1poweruser dot com>
2006-06-08 22:57:07 +00:00
cel
f84994a70d NFS over TCP retransmit behavior should default to a 60 second time out,
mimicking the NFS reference implementation.

NFS over TCP does not need fast retransmit timeouts, since network loss
and congestion are managed by the transport (TCP), unlike with NFS over
UDP.  A long timeout prevents the unnecessary retransmission of non-
idempotent NFS requests.

Reviewed by:	mohans, silby, rees?
Sponsored by:	Network Appliance, Incorporated
2006-05-30 01:52:59 +00:00
cel
4ec879514b Refactor the NFS over UDP retransmit timeout estimation logic to allow
the estimator to be more easily tuned and maintained.

There should be no functional change except there is now a lower limit
on the retransmit timeout to prevent the client from retransmitting
faster than the server's disks can fill requests, and an upper limit
to prevent the estimator from taking too long to retransmit during a
server outage.

Reviewed by:	mohan, kris, silby
Sponsored by:	Network Appliance, Incorporated
2006-05-30 00:43:07 +00:00
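The two retransmit policies in the commits above can be illustrated together: a fixed 60-second timeout for TCP, and a bounded adaptive estimate for UDP. The sketch swaps in the standard Jacobson/Karels smoothed-RTT estimator from RFC 6298. The commits do not say which estimator FreeBSD uses, and the bound values and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values chosen for illustration, not FreeBSD's. */
#define RTO_MIN_MS   500L    /* floor: don't outrun the server's disks */
#define RTO_MAX_MS   60000L  /* ceiling: keep retrying during an outage */
#define RTO_INIT_MS  1000L   /* before any round trip has been measured */
#define TCP_TIMEO_MS 60000L  /* fixed timeout for NFS over TCP */

struct rto_est {
    double srtt;    /* smoothed round-trip time, ms */
    double rttvar;  /* smoothed mean deviation, ms */
    bool primed;    /* true once the first sample has been taken */
};

/* Fold one measured round-trip time into the estimate (RFC 6298 gains). */
void rto_sample(struct rto_est *e, double rtt_ms)
{
    assert(e != NULL && rtt_ms >= 0.0);
    if (!e->primed) {
        e->srtt = rtt_ms;
        e->rttvar = rtt_ms / 2.0;
        e->primed = true;
        return;
    }
    double err = rtt_ms - e->srtt;            /* uses the old srtt */
    double abserr = err < 0.0 ? -err : err;
    e->rttvar += (abserr - e->rttvar) / 4.0;  /* beta = 1/4 */
    e->srtt += err / 8.0;                     /* alpha = 1/8 */
}

/* Current retransmit timeout in ms, clamped to [RTO_MIN_MS, RTO_MAX_MS]. */
long rto_ms(const struct rto_est *e, bool is_tcp)
{
    assert(e != NULL);
    if (is_tcp)
        return TCP_TIMEO_MS;   /* TCP recovers lost segments itself */
    if (!e->primed)
        return RTO_INIT_MS;
    double rto = e->srtt + 4.0 * e->rttvar;
    if (rto < RTO_MIN_MS)
        return RTO_MIN_MS;
    if (rto > RTO_MAX_MS)
        return RTO_MAX_MS;
    return (long)rto;
}
```

The clamp is what the refactoring commit adds conceptually: a very fast server cannot drive the timeout below the floor, and a long outage cannot push it past the ceiling.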
delphij
dfb738e5a6 MFC src/sys/nfsclient/nfs_bio.c,v 1.154
and src/sys/nfsclient/nfs_vnops.c,v 1.262 (by ps@):

 - Always return success from NFS strategy. nfs_doio(), in the
   event of an error, does the right thing, in terms of setting
   the error flags in the buf header. That fixes a crash from
   bstrategy().
 - Treat ETIMEDOUT as a "recoverable" error, causing the buffer
   to be re-dirtied. ETIMEDOUT can occur on soft mounts when
   the number of retries is exceeded, and we don't want data loss
   in that case.

Submitted by:   Mohan Srinivasan
Approved by:	re (scottl)
2006-04-18 05:31:58 +00:00
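The ETIMEDOUT change can be sketched with a simplified buffer and hypothetical flag names (`BUF_DELWRI`, `BUF_ERROR`) standing in for the kernel's B_* bits. ETIMEDOUT from a soft mount leaves the data dirty for a later retry, while other errors are recorded on the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical buffer flags standing in for the kernel's B_* bits. */
#define BUF_DELWRI 0x1u  /* dirty: still needs to be written */
#define BUF_ERROR  0x2u  /* write failed; error reported to the caller */

struct fake_buf {
    unsigned flags;
    int error;
};

/* Apply the outcome of a write RPC to the buffer. */
void finish_write(struct fake_buf *bp, int error)
{
    assert(bp != NULL);
    if (error == 0) {
        bp->flags &= ~(BUF_DELWRI | BUF_ERROR);   /* written: now clean */
        bp->error = 0;
    } else if (error == ETIMEDOUT) {
        /* Recoverable: re-dirty so a later flush retries the write. */
        bp->flags |= BUF_DELWRI;
        bp->flags &= ~BUF_ERROR;
        bp->error = 0;
    } else {
        bp->flags |= BUF_ERROR;                   /* hard failure */
        bp->error = error;
    }
}
```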
jon
73f7f6d707 MFC 1.261 - fix a crash when an nfsv2 mount fails
Approved by:	re
2006-04-18 05:18:47 +00:00
cel
8c36fd3864 If an NFS server returns more than a few EJUKEBOX errors for a given RPC
request, the FreeBSD NFS client will quickly back off to an excessively
long wait (days, then weeks) before retrying the request.

Change the behavior of the FreeBSD NFS client to match the behavior of
the reference NFS client implementation (Solaris).  This provides a fixed
delay of 10 seconds between each retry by default.  A sysctl, called
nfs3_jukebox_delay, is now available to tune the delay.  Unlike Solaris,
the sysctl value on FreeBSD is in seconds, rather than in HZ.

MFC revision 1.136 to RELENG_6

Sponsored by:   Network Appliance, Incorporated
Reviewed by:    rick
Approved by:    re (kensmith), silby
2006-04-02 04:11:23 +00:00
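The difference between the two policies can be sketched as follows. The sketch assumes the delay is converted to clock ticks by multiplying by hz; the commit says only that the sysctl is in seconds rather than in HZ. `doubling_delay_sec()` is a hypothetical stand-in for the old escalating back-off, whose exact formula the commit does not give.

```c
#include <assert.h>

/*
 * Fixed retry delay: the nfs3_jukebox_delay value (seconds) converted to
 * clock ticks. The retry count plays no part, so the wait never grows.
 */
long jukebox_delay_ticks(int delay_sec, int hz)
{
    assert(delay_sec > 0 && hz > 0);
    return (long)delay_sec * hz;
}

/*
 * Hypothetical doubling back-off, for contrast only: the wait doubles on
 * every retry, starting from base_sec.
 */
long long doubling_delay_sec(int retry, int base_sec)
{
    assert(retry >= 0 && retry < 40 && base_sec > 0);
    return (long long)base_sec << retry;
}
```

With a 10-second base, the doubling schedule already waits 327,680 s (about 3.8 days) on the 15th retry, while the fixed schedule stays at 10 s per retry.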
kris
c506d5366a MFC r1.137:
Fix a bug in the NFS/TCP retransmission path.

The bug was that earlier, if a request was retransmitted,
we would do subsequent retransmits every 10 msecs.

This can cause data corruption under moderate loads by reordering
operations as seen by the client NFS attribute cache, and on the
server side when the retransmission occurs after the original request
has left the duplicate cache, since the operation will be committed
for a second time.

Further work on retransmission handling is needed (e.g. retransmits are
still sent too often since they are scaled by HZ, and the size of the
dup cache is too small and easily overwhelmed on busy servers).

Submitted by:   mohans
Approved by:	re (mux)
2006-03-31 07:13:09 +00:00
cel
cfec640a25 Fix a bug in NFSv3 READDIRPLUS reply processing

The client's READDIRPLUS logic skips the attributes and
filehandle of the ".." entry.  If the server doesn't send
attributes but does send a filehandle for "..", the
client's logic doesn't account for the extra "value
follows" field that indicates whether the filehandle is
present, causing the remaining entries in the reply
to be ignored.

This is an MFC of 1.264 in the CURRENT branch.

Sponsored by:   Network Appliance, Inc.
Reviewed by:    rick, mohans
Approved by:    re, silby
2006-03-29 18:11:32 +00:00
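The corrected parse step can be sketched assuming the reply is an array of big-endian 32-bit XDR words. In an NFSv3 entryplus3, name_attributes (post_op_attr) and name_handle (post_op_fh3) each carry their own value-follows word, so the handle's flag must be read whether or not attributes were present. The 84-byte fattr3 size and 64-byte maximum handle come from RFC 1813; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

#define XDR_FATTR3_WORDS 21          /* fattr3 is 84 bytes */
#define NFS3_FHSIZE      64          /* maximum NFSv3 file handle size */
#define XDR_BAD          ((size_t)-1)

/* Number of 4-byte XDR words occupied by nbytes of opaque data. */
static size_t xdr_words(uint32_t nbytes)
{
    return ((size_t)nbytes + 3) / 4;
}

/*
 * Skip an entry's name_attributes and name_handle starting at word pos.
 * Returns the position of the next field, or XDR_BAD if the reply is short.
 */
size_t skip_name_attrs_and_handle(const uint32_t *w, size_t nwords, size_t pos)
{
    assert(w != NULL || nwords == 0);

    /* post_op_attr: value-follows word, then an optional fattr3. */
    if (pos >= nwords)
        return XDR_BAD;
    if (ntohl(w[pos++]) != 0) {
        if (nwords - pos < XDR_FATTR3_WORDS)
            return XDR_BAD;
        pos += XDR_FATTR3_WORDS;
    }

    /* post_op_fh3: its own value-follows word, read even if no attrs. */
    if (pos >= nwords)
        return XDR_BAD;
    if (ntohl(w[pos++]) != 0) {
        if (pos >= nwords)
            return XDR_BAD;
        uint32_t len = ntohl(w[pos++]);
        if (len > NFS3_FHSIZE || nwords - pos < xdr_words(len))
            return XDR_BAD;
        pos += xdr_words(len);
    }
    return pos;
}
```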
delphij
6b5f6d40b5 MFC 1.263: a typo fix (diff reduction against -HEAD)
Approved by:	re (hrs)
2006-03-24 04:48:42 +00:00
pjd
84853dde8e MFC: sys/nfsclient/nfs_diskless.c 1.15
I wanted 'nolockd' here instead of 'lockd'.

Approved by:	re (mux)
2006-03-20 15:45:14 +00:00
scottl
c5719df4a5 MFC: Call vfs_destroy_object() before v_data gets set to NULL.
Approved by: re
2006-03-12 21:50:02 +00:00
pjd
8d7bed0cec MFC: sys/nfsclient/nfs_diskless.c 1.12,1.13
Add boot.nfsroot.options loader tunable.
It allows options to be specified for the NFS root file system.
Currently supported options are: soft, intr, conn, lockd.

I'm adding this functionality mostly for the 'lockd' option, which is only
honored when performing the initial mount and will be silently ignored
if used while updating the mount options.

This will allow flock(2) to be used without needing varmfs or
rpc.lockd and friends.

Example of use:
boot.nfsroot.options="intr,lockd"

Approved by:	re (scottl)
2006-03-01 18:01:28 +00:00
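A sketch of how a comma-separated value such as "intr,lockd" might be turned into mount flags. The flag bits and the function name are hypothetical (the kernel uses its own flags), and unknown names are simply skipped here. A later commit (84853dde8e) replaced the 'lockd' option with 'nolockd'; this sketch keeps the names listed in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; the kernel uses its own mount flags. */
#define OPT_SOFT  0x1u
#define OPT_INTR  0x2u
#define OPT_CONN  0x4u
#define OPT_LOCKD 0x8u

struct opt {
    const char *name;
    unsigned bit;
};

static const struct opt opts[] = {
    { "soft", OPT_SOFT }, { "intr", OPT_INTR },
    { "conn", OPT_CONN }, { "lockd", OPT_LOCKD },
};

/* Return the OR of all recognized options in a comma-separated list. */
unsigned parse_nfsroot_options(const char *s)
{
    assert(s != NULL);
    unsigned flags = 0;
    while (*s != '\0') {
        const char *end = strchr(s, ',');
        size_t len = end ? (size_t)(end - s) : strlen(s);
        for (size_t i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
            if (strlen(opts[i].name) == len &&
                strncmp(s, opts[i].name, len) == 0)
                flags |= opts[i].bit;
        }
        s += len;          /* move past the token */
        if (*s == ',')
            s++;           /* and past its separator */
    }
    return flags;
}
```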
yar
dbcb706f58 Work around the shortness of the size argument to
vnode_create_vobject() while preserving the binary ABI
to filesystem modules in RELENG_6: introduce a new function
vnode_create_vobject_off() that takes the size argument
as off_t; move all stock file systems to it; re-implement
the old vnode_create_vobject() using vnode_create_vobject_off()
so that old or binary-only FS modules can work w/o hitting the
bug.  The trick is to pass a size of 0 to vnode_create_vobject_off()
so that it will call VOP_GETATTR() and thus get the actual,
untruncated file size even if the calling module still uses
the old vnode_create_vobject().

PR:		kern/92243
Approved by:	re (scottl)
2006-02-20 00:53:15 +00:00
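A user-space sketch of the compatibility trick. Hypothetical `fake_*` names stand in for the vnode, VOP_GETATTR() and the two vnode_create_vobject variants, and a 32-bit size parameter is assumed for the old entry point to show the truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vnode and its VM object size. */
struct fake_vnode {
    int64_t va_size;   /* true file size, as VOP_GETATTR() would report */
    int64_t obj_size;  /* size recorded in the VM object */
};

/* Stand-in for VOP_GETATTR(): returns the untruncated file size. */
static int64_t fake_getattr_size(const struct fake_vnode *vp)
{
    return vp->va_size;
}

/* New-style entry point: 64-bit size; 0 means "ask the file system". */
int fake_create_vobject_off(struct fake_vnode *vp, int64_t size)
{
    assert(vp != NULL && size >= 0);
    if (size == 0)
        size = fake_getattr_size(vp);
    vp->obj_size = size;
    return 0;
}

/*
 * Old-style entry point kept for binary compatibility. Its narrow size
 * argument may already have been truncated by the caller, so it is
 * ignored and 0 is passed on to force a fresh getattr.
 */
int fake_create_vobject(struct fake_vnode *vp, uint32_t size)
{
    (void)size;
    return fake_create_vobject_off(vp, 0);
}
```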
rees
24c9cc5118 MFC rev 1.135:
Don't log an error on tcp connection reset, even if we don't get ECONNRESET.

Submitted by:	cel@citi.umich.edu
Approved by:	re (scottl)
2006-02-16 02:39:52 +00:00
rwatson
f017c618f0 Merge nfs_lock.c:1.43 from HEAD to RELENG_6:
In nfs_dolock(), GC the now-unused ioflg, rendered obsolete when we moved
  from using a fifo to talk to rpc.lockd to using a special device node.

Approved by:	re (scottl)
2006-02-14 00:06:32 +00:00
tegge
81ceadf72a MFC: Add marker vnodes to ensure that all vnodes associated with the mount
point are iterated over when using MNT_VNODE_FOREACH.
2006-01-14 01:18:03 +00:00
maxim
b08fccbd06 MFC rev. 1.134: fix for a bug where NFS/TCP would
not reconnect (in the case where the server FIN'ed).

PR:		kern/88833
Requested by:	Roman V. Palagin
Approved by:	Mohan Srinivasan
2005-12-15 18:10:37 +00:00
rees
94b8aef59a MFC: nfs_socket.c 1.132, nfs_subs.c 1.142, nfsm_subs.h 1.37
fix a problem with XID re-use when a server returns NFSERR_JUKEBOX.
2005-12-13 21:29:26 +00:00