Adjust minimum iod threads from 4 to 0 -- since we compile the NFS
client into the kernel by default, and many users won't use NFS,
don't start an extra 4 kernel threads that are unused. Once NFS
becomes active, it will start nfsiod's as it needs them.
We might consider mandating a minimum iod's equal to the number of
active NFS mounts (truncated to some value), which would force some
to remain available without having to create a new one if the file
system is mostly inactive.
PR: 70880
Prodded by: cel
Head nod: peter
Pointed out by: Joe <fbsd_user at a1poweruser dot com>
mimicing the NFS reference implementation.
NFS over TCP does not need fast retransmit timeouts, since network loss
and congestion are managed by the transport (TCP), unlike with NFS over
UDP. A long timeout prevents the unnecessary retransmission of non-
idempotent NFS requests.
Reviewed by: mohans, silby, rees?
Sponsored by: Network Appliance, Incorporated
the estimator to be more easily tuned and maintained.
There should be no functional change except there is now a lower limit
on the retransmit timeout to prevent the client from retransmitting
faster than the server's disks can fill requests, and an upper limit
to prevent the estimator from taking too long to retransmit during a
server outage.
Reviewed by: mohan, kris, silby
Sponsored by: Network Appliance, Incorporated
and src/sys/nfsclient/nfs_vnops.c,v 1.262 (by ps@):
- Always return success from NFS strategy. nfs_doio(), in the
event of an error, does the right thing, in terms of setting
the error flags in the buf header. That fixes a crash from
bstrategy().
- Treat ETIMEDOUT as a "recoverable" error, causing the buffer
to be re-dirtied. ETIMEDOUT can occur on soft mounts, when
the number of retries are exceeded, and we don't want data loss
in that case.
Submitted by: Mohan Srinivasan
Approved by: re (scottl)
request, the FreeBSD NFS client will quickly back off to a excessively
long wait (days, then weeks) before retrying the request.
Change the behavior of the FreeBSD NFS client to match the behavior of
the reference NFS client implementation (Solaris). This provides a fixed
delay of 10 seconds between each retry by default. A sysctl, called
nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris,
the sysctl value on FreeBSD is in seconds, rather than in HZ.
MFC revision 1.136 to RELENG_6
Sponsored by: Network Appliance, Incorporated
Reviewed by: rick
Approved by: re (kensmith), silby
Fix a bug in the NFS/TCP retransmission path.
The bug was that earlier, if a request was retransmitted,
we would do subsequent retransmits every 10 msecs.
This can cause data corruption under moderate loads by reordering
operations as seen by the client NFS attribute cache, and on the
server side when the retransmission occurs after the original request
has left the duplicate cache, since the operation will be committed
for a second time.
Further work on retransmission handling is needed (e.g. they are still
being done sent too often since they are scaled by HZ, and the size of
the dup cache is too small and easily overwhelmed on busy servers).
Submitted by: mohans
Approved by: re (mux)
The client's READDIRPLUS logic skips the attributes and
filehandle of the ".." entry. If the server doesn't send
attributes but does send a filehandle for "..", the
client's logic doesn't account for the extra "value
Fix a bug in NFSv3 READDIRPLUS reply processing
The client's READDIRPLUS logic skips the attributes and
filehandle of the ".." entry. If the server doesn't send
attributes but does send a filehandle for "..", the
client's logic doesn't account for the extra "value
follows" field that indicates whether the filehandle is
present, causing the remaining entries in the reply
to be ignored.
This is an MFC of 1.264 in the CURRENT branch.
Sponsored by: Network Appliance, Inc.
Reviewed by: rick, mohans
Approved by: re, silby
Add boot.nfsroot.options loader tunable.
It allows to specify options for NFS root file system.
Currently supported options are: soft, intr, conn, lockd.
I'm adding this functionality mostly for 'lockd' option, which is only
honored when performing the initial mount and will be silently ignored
if used while updating the mount options.
This will allow to use flock(2) without the need of using varmfs or
rpc.lockd and friends.
Example of use:
boot.nfsroot.options="intr,lockd"
Approved by: re (scottl)
vnode_create_vobject() while preserving the binary ABI
to filesystem modules in RELENG_6: introduce a new function
vnode_create_vobject_off() that takes the size argument
as off_t; move all stock file systems to it; re-implement
the old vnode_create_vobject() using vnode_create_vobject_off()
so that old or binary-only FS modules can work w/o hitting the
bug. The trick is to pass a size of 0 to vnode_create_vobject_off()
so that it will call VOP_GETATTR() and thus get the actual,
untruncated file size even if the calling module still uses
the old vnode_create_vobject().
PR: kern/92243
Approved by: re (scottl)
In nfs_dolock(), GC now under-used ioflg, rendered obsolete when we moved
from using a fifo to talk to rpc.lockd to using a special device node.
Approved by: re (scottl)
- Fix leak of struct nlminfo on process exit.
- Fix malloc type collision, that made the above problem
difficult to understand.
Reported by: Vladimir Sharun <sharun ukr.net>
Approved by: re (scottl)
| Fixes for NFS crashes on architectures that require strict alignment.
| - Fix nfsm_disct() so that after pulling up data, the remaining data
| is aligned if necessary.
| - Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the
| mbuf (instead of casting m_data to a uint32).
|
| Submitted by: Pyun YongHyeon
| Reviewed by: Mohan Srinivasan
|
| Revision Changes Path
| 1.118 +12 -3 src/sys/nfs/nfs_common.c
| 1.38 +6 -0 src/sys/nfs/nfs_common.h
| 1.126 +2 -1 src/sys/nfsclient/nfs_socket.c
Approved by: re (scottl)
| In nfs_nget() if two threads race on the same filehandle, the loser
| should cause the nfsnode to get freed. This fixes a potential vnode
| (and nfsnode) leak in that path.
|
| Submitted by: Mohan Srinivasan
| Reviewed by: phk
|
| Revision Changes Path
| 1.78 +2 -1 src/sys/nfsclient/nfs_node.c
Approved by: re (scottl)
rpcclnt.c:1.14 from HEAD to RELENG_6:
Acquire Giant in uprintf() and tprintf() due to the non-MPSAFEty of
the tty code invoked from these functions. In two cases, during
timeout handling in NFS-related RPC client code, acquire Giant in
the caller before other mutexes the caller might hold, in order to
avoid lock order reversals with Giant (a recursive acquire is not
a reversal as it won't ever wait).
Correct age-old comments about uprintf()/tprintf() sleeping: they
will never sleep.
Much useful feedback from: bde
Approved by: re (scottl)
Fix for a NFS soft mounts bug where if the number of retries exceeds
the max rexmits, the request was not being bounced back with a
ETIMEDOUT error.
Approved by: re
pending discussion of how implementation would proceed. Applications
like -lc_r expect select(3) to match the EAGAIN-status of IO
functions.
Approved by: re
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries. However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.
To fix this, hibufspace is exported and scaled to a reasonable
fraction. This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously. If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system. It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.
The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.
General nod by: gad
MFC after: 2 weeks
More testing: wes
PR: kern/79208
re-sent instead of timing out.
don't log an error message on reconnection, which is not an error.
remove unused nfs_mrep_before_tsleep.
Reviewed by: Mohan Srinivasan
Approved by: alfred
as they have no connection with the expected MNT_* flags. This bug
was exposed 18 months ago when the assignments to f_flags in
vfs_syscalls.c were moved to before the VFS_STATFS() call. It was
fixed in the CSRG source 10 years ago, but we never picked up that
change.
PR: kern/80390
MFC after: 1 week
the MNT_RDONLY flag if the "ro" option was passed in from userland, and
clears it otherwise. In the diskless case, the MNT_RDONLY flag is already
set when this code is reached, but there are no mount options, so it was
incorrectly cleared. Change the logic so the MNT_RDONLY flag is set if the
"ro" option was specified, and left alone otherwise.
Note that the NFS code will still happily let you mount a filesystem RW
even if the server exports it RO. I'm not sure how to fix that.
- Network filesystems are written with a special idiom that checks the
cache first, and may even unlock dvp before discovering that a network
round-trip is required to resolve the name. I believe dvp is prevented
from being recycled even in the forced unmount case by the shared lock
on the mount point. If not, this code should grow checks for VI_DOOMED
after it relocks dvp or it will access NULL v_data fields.
Sponsored by: Isilon Systems, Inc.
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.
Sponsored by: Isilon Systems, Inc.
non-maskable).
- The NFS client needs to guard against spurious wakeups
while waiting for the response. ltrace causes the process
under question to wakeup (possibly from ptrace()), which
causes NFS to wakeup from tsleep without the response being
delivered.
Submitted by: Mohan Srinivasan