previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.
Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)
lock and unlock conditionally, not just set the flag on it conditionally.
In practice, this bug couldn't manifest, as in the current revision of
the code, no callers pass a NULL rep.
CID: 1416
Found with: Coverity Prevent(tm)
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt
for other NFS error responses. For now, disabling wcc pre-op attr checks and
post-op attr loads on NFS errors (sysctl'ed).
Reported by: Kris Kennaway
have to explicitly acquire Giant (although they need to be aware of this and
not hold any locks at that point). Remove the acquisitions of Giant in the
NFS client wrapping tprintf().
2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly.
- We don't need to acquire Giant before tsleeping on lbolt anymore,
since jhb specialcased lbolt handling in msleep.
- nfs_up() needs to acquire Giant only if printing the "server up"
message.
- nfs_timer() held Giant for the duration of the NFS timer processing,
just because the printing of the message in nfs_down() needed it
(and we acquire other locks in nfs_timer()). The acquisition of
Giant is moved down into nfs_down() now, reducing the time Giant is
held in that path.
Reported by: Kris Kennaway
This bug results in data corruption with NFS/TCP. Writes are silently dropped
on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires).
Reviewed by: ups@
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used univerally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).
This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.
Architectural head nod: sam, gnn, wollman
the estimator to be more easily tuned and maintained.
There should be no functional change except there is now a lower limit
on the retransmit timeout to prevent the client from retransmitting
faster than the server's disks can fill requests, and an upper limit
to prevent the estimator from taking to long to retransmit during a
server outage.
Reviewed by: mohan, kris, silby
Sponsored by: Network Appliance, Incorporated
The bug was that earlier, if a request was retransmitted,
we would do subsequent retransmits every 10 msecs.
This can cause data corruption under moderate loads by reordering
operations as seen by the client NFS attribute cache, and on the
server side when the retransmission occurs after the original request
has left the duplicate cache, since the operation will be committed
for a second time.
Further work on retransmission handling is needed (e.g. they are still
being done sent too often since they are scaled by HZ, and the size of
the dup cache is too small and easily overwhelmed on busy servers).
Submitted by: mohans
request, the FreeBSD NFS client will quickly back off to a excessively
long wait (days, then weeks) before retrying the request.
Change the behavior of the FreeBSD NFS client to match the behavior of
the reference NFS client implementation (Solaris). This provides a fixed
delay of 10 seconds between each retry by default. A sysctl, called
nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris,
the sysctl value on FreeBSD is in seconds, rather than in HZ.
Sponsored by: Network Appliance, Incorporated
Reviewed by: rick
Approved by: silby
MFC after: 3 days
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).
Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.
With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.
NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.
NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.
NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.
MFC after: 1 week
- Fix nfsm_disct() so that after pulling up data, the remaining data
is aligned if necessary.
- Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the
mbuf (instead of casting m_data to a uint32).
Submitted by: Pyun YongHyeon
Reviewed by: Mohan Srinivasan
re-sent instead of timing out.
don't log an error message on reconnection, which is not an error.
remove unused nfs_mrep_before_tsleep.
Reviewed by: Mohan Srinivasan
Approved by: alfred
non-maskable).
- The NFS client needs to guard against spurious wakeups
while waiting for the response. ltrace causes the process
under question to wakeup (possibly from ptrace()), which
causes NFS to wakeup from tsleep without the response being
delivered.
Submitted by: Mohan Srinivasan
and if the client (erroneously) reads the RPC length as 0 bytes, the
client can loop around in the socket callback. Explicitly check for
the length being 0 case and teardown/re-connect.
Submitted by: Mohan Srinivasan
upcalls which do RPC header parsing and match up the reply with the
request. NFS calls now sleep on the nfsreq structure. This enables
us to eliminate the NFS recvlock.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
send routine. In IPv6 UDP, the thread will be passed to suser(), which
asserts that if a thread is used for a super user check, it be
curthread. Many of these protocol entry points probably need to
accept credentials instead of threads.
MT5 candidate.
Noticed/tested by: kuriyama
a better name. I have a kern_[sg]etsockopt which I plan to commit
shortly, but the arguments to these function will be quite different
from so_setsockopt.
Approved by: alfred
Rebind the client socket when we experience a timeout. This fixes
the case where our IP changes for some reason.
Signal a VFS event when NFS transitions from up to down and vice
versa.
Add a placeholder vfs_sysctl where we will put status reporting
shortly.
Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
are supposed to continue firing as long as there is work to do, not
stop after the first invocation.
This is damage control after a patch that has been committed prematurely.
Tested by: kris