Commit Graph

795 Commits

Author SHA1 Message Date
Mohan Srinivasan
88b94fba38 Make EWOULDBLOCK a recoverable error so that the request is retransmitted.
This bug results in data corruption with NFS/TCP. Writes are silently dropped
on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires).

Reviewed by: ups@
2006-10-31 20:25:37 +00:00
Bruce Evans
35259c2c89 Fixed some style bugs (especially ones involving long lines and use
of __P(())).  There are many more.
2006-10-17 22:07:07 +00:00
Bruce Evans
6a72ff6b09 Don't do null Setattr RPCs for VA_MARK_ATIME. When we added the
VA_MARK_ATIME feature to fix POSIX conformance fore execve() and mmap(),
we thought that it was optimized well enough for the one file system
that supports it (ffs) and harmless for other file systems (except
layered ones which already get the layering for VOP_SETATTR() wrong).
However, nfs_setattr() doesn't do much parameter checking, so when
it gets a combination of parameters that it doesn't understand, it
always does a Setattr RPC.  This RPC can't do anything good, and for
VA_MARK_ATIME it is null except for wasting a lot of time.

This is the smallest and easiest to fix of several bugs that have
increased the number of RPCs for kernel builds on nfs by more than
100% since 2004-11-05.  The real-time increase depends on network
latency and parallelization and can also be very large (approaching
the same percentage for unparallelized operations like "make depend"
on systems with fast CPUs and high-latency networks).
2006-10-14 07:25:11 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Tor Egge
a1e363f256 Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC.  Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
2006-09-26 04:15:59 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Mohan Srinivasan
7d7d9e2242 Fixes up the handling of shared vnode lock lookups in the NFS client,
adds a FS type specific flag indicating that the FS supports shared
vnode lock lookups, adds some logic in vfs_lookup.c to test this flag
and set lock flags appropriately.

- amd on 6.x is a non-starter (without this change). Using amd under
  heavy load results in a deadlock (with cascading vnode locks all the
  way to the root) very quickly.
- This change should also fix the more general problem of cascading
  vnode deadlocks when an NFS server goes down.

Ideally, we wouldn't need these changes, as enabling shared vnode lock
lookups globally would work. Unfortunately, UFS, for example isn't
ready for shared vnode lock lookups, crashing pretty quickly.

This change is the result of discussions with Stephan Uphoff (ups@).

Reviewed by:	ups@
2006-09-13 18:39:09 +00:00
Mohan Srinivasan
6cd7078919 Fix for a deadlock triggered by a 'umount -f' causing a NFS request to never
retransmit (or return). Thanks to John Baldwin for helping nail this one.

Found by : Kris Kennaway
2006-08-29 22:00:12 +00:00
Thomas Quinot
3401780fa0 Fix typos in comment. 2006-08-16 23:53:05 +00:00
Alan Cox
5786be7cc7 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
Brooks Davis
a36aa44a85 Add a new kernel environment variable "boot.netif.mtu" which is used to
set the MTU prior to mounting root via NFS.  This is required if the
server supports a higher than default MTU because the client will not
see the responses otherwise.

MFC after:	3 weeks
2006-08-09 01:56:17 +00:00
Robert Watson
b0668f7151 soreceive_generic(), and sopoll_generic(). Add new functions sosend(),
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used univerally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).

This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.

Architectural head nod:	sam, gnn, wollman
2006-07-24 15:20:08 +00:00
Konstantin Belousov
c915bcbad2 Signals may be delivered to process as well as to the thread. Check the
thread-delivered signals in addition to the process one.

Reviewed by:	mohan
MFC after:	1 month
Approved by:	kan (mentor)
2006-07-08 15:39:11 +00:00
Konstantin Belousov
201599c3af Always supply curthread as argument to nfs_asyncio and nfs_doio
in nfs_strategy. Otherwise, for some buffers, signals would be ignored
at the intr mounts.

Reviewed by:	mohan
MFC after:	1 month
Approved by:	kan (mentor)
2006-07-08 15:36:51 +00:00
Yaroslav Tykhiy
4b97d7affd There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures.  Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by:	glebius, jhb, ru
2006-06-29 19:22:05 +00:00
Yaroslav Tykhiy
576cdf4352 Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop. 2006-06-29 15:37:39 +00:00
Mohan Srinivasan
64c3892747 Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag was
not being set, which means Giant would be acquired for these mounts.
2006-05-30 20:32:44 +00:00
Mohan Srinivasan
1af6f471ca Fix for a potential attempt to sleep while holding nm_mtx. Caught and reported
by Witness (which forces the mbuf allocation flag to M_NOWAIT).

Reported by: "sekes".
2006-05-26 18:45:55 +00:00
Stephan Uphoff
6c1b7d16c2 Call vm_object_page_clean() with the object lock held.
Submitted by:	kensmith@
Reviewed by:	mohans@
MFC after:	6 days
2006-05-25 17:16:11 +00:00
Stephan Uphoff
dcf67e65d2 Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flush dirty page in the data invalidation function
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by:	tegge@
MFC after:	7 days
2006-05-25 01:00:35 +00:00
Mohan Srinivasan
5bbfbd1422 Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mounts
before doing the read/write.

Reported by:	Chuck Lever.
2006-05-24 23:06:50 +00:00
Robert Watson
33c6a485bd Adjust minimum iod threads from 4 to 0 -- since we compile the NFS
client into the kernel by default, and many users won't use NFS,
don't start an extra 4 kernel threads that are unused.  Once NFS
becomes active, it will start nfsiod's as it needs them.

We might consider mandating a minimum iod's equal to the number of
active NFS mounts (truncated to some value), which would force some
to remain available without having to create a new one if the file
system is mostly inactive.

PR:		70880
MFC after:	2 weeks
Prodded by:	cel
Head nod:	peter
Pointed out by:	Joe <fbsd_user at a1poweruser dot com>
2006-05-24 21:04:46 +00:00
Chuck Lever
6d0699a5ba NFS over TCP retransmit behavior should default to a 60 second time out,
mimicing the NFS reference implementation.

NFS over TCP does not need fast retransmit timeouts, since network loss
and congestion are managed by the transport (TCP), unlike with NFS over
UDP.  A long timeout prevents the unnecessary retransmission of non-
idempotent NFS requests.

Reviewed by:	mohans, silby, rees?
Sponsored by:	Network Appliance, Incorporated
2006-05-23 18:48:07 +00:00
Chuck Lever
94163ea283 Refactor the NFS over UDP retransmit timeout estimation logic to allow
the estimator to be more easily tuned and maintained.

There should be no functional change except there is now a lower limit
on the retransmit timeout to prevent the client from retransmitting
faster than the server's disks can fill requests, and an upper limit
to prevent the estimator from taking to long to retransmit during a
server outage.

Reviewed by:	mohan, kris, silby
Sponsored by:	Network Appliance, Incorporated
2006-05-23 18:33:58 +00:00
Mohan Srinivasan
f2c48228fe Vnode locks are recursive and the NFS client support shared vnode locks.
Found by: Kris Kennaway.
2006-05-23 16:07:23 +00:00
Mohan Srinivasan
f1cdf89911 Changes to make the NFS client MP safe.
Thanks to Kris Kennaway for testing and sending lots of bugs my way.
2006-05-19 00:04:24 +00:00
Mohan Srinivasan
671d06fb2e Fix a snafu caused while patching the previous fix from another branch. 2006-05-05 18:12:13 +00:00
Mohan Srinivasan
9f5b7dea42 Fix for a NFS/TCP client bug which would cause the NFS/TCP stream to get
out of sync under heavy loads, forcing frequent reconnets, causing EBADRPC
errors etc.
2006-05-05 18:04:53 +00:00
Mohan Srinivasan
5ef7d50da5 Keep track of the number of in-progress async direct IO writes in the nfsnode.
Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and
nfs_putpage().
2006-04-06 01:20:30 +00:00
Jeff Roberson
b2282f9a3f - Busy the filesystem in nfs_statfs to prevent us from creating a new
vnode after vflush() has succeeded.  This would cause a dangling vnode
   panic at unmount time otherwise.  Other filesystems may have this problem
   via their VFS_VGET() routines.

Found by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-04-01 01:15:23 +00:00
Kris Kennaway
78e31796c9 Fix a bug in the NFS/TCP retransmission path.
The bug was that earlier, if a request was retransmitted,
we would do subsequent retransmits every 10 msecs.

This can cause data corruption under moderate loads by reordering
operations as seen by the client NFS attribute cache, and on the
server side when the retransmission occurs after the original request
has left the duplicate cache, since the operation will be committed
for a second time.

Further work on retransmission handling is needed (e.g. they are still
being done sent too often since they are scaled by HZ, and the size of
the dup cache is too small and easily overwhelmed on busy servers).

Submitted by:	mohans
2006-03-23 22:58:42 +00:00
Pawel Jakub Dawidek
9972deb772 Actually I wanted 'nolockd' here instead of 'lockd'.
MFC after:	2 days
2006-03-19 13:27:37 +00:00
Chuck Lever
a59b03bf0e If an NFS server returns more than a few EJUKEBOX errors for a given RPC
request, the FreeBSD NFS client will quickly back off to a excessively
long wait (days, then weeks) before retrying the request.

Change the behavior of the FreeBSD NFS client to match the behavior of
the reference NFS client implementation (Solaris).  This provides a fixed
delay of 10 seconds between each retry by default.  A sysctl, called
nfs3_jukebox_delay, is now available to tune the delay.  Unlike Solaris,
the sysctl value on FreeBSD is in seconds, rather than in HZ.

Sponsored by:	Network Appliance, Incorporated
Reviewed by:	rick
Approved by:	silby
MFC after:	3 days
2006-03-17 22:14:23 +00:00
Chuck Lever
9f5349f23d Fix a bug in NFSv3 READDIRPLUS reply processing
The client's READDIRPLUS logic skips the attributes and
filehandle of the ".." entry.  If the server doesn't send
attributes but does send a filehandle for "..", the
client's logic doesn't account for the extra "value
follows" field that indicates whether the filehandle is
present, causing the remaining entries in the reply
to be ignored.

Sponsored by:	Network Appliance, Inc.
Reviewed by:	rick, mohans
Approved by:	silby
MFC after:	2 weeks
2006-03-08 01:43:01 +00:00
Jim Rees
4b81d0eb0f Don't log an error on tcp connection reset, even if we don't get ECONNRESET.
Submitted by:	cel@citi.umich.edu
2006-01-20 15:07:18 +00:00
Alfred Perlstein
92e73f5711 I ran into an nfs client panic a couple of times in a row over the
last few days.  I tracked it down to the fact that nfs_reclaim()
is setting vp->v_data to NULL _before_ calling vnode_destroy_object().
After silence from the mailing list I checked further and discovered
that ufs_reclaim() is unique among FreeBSD filesystems for calling
vnode_destroy_object() early, long before tossing v_data or much
of anything else, for that matter.  The rest, including NFS, appear
to be identical, as if they were just clones of one original routine.

The enclosed patch fixes all file systems in essentially the same
way, by moving the call to vnode_destroy_object() to early in the
routine (before the call to vfs_hash_remove(), if any).  I have
only tested NFS, but I've now run for over eighteen hours with the
patch where I wouldn't get past four or five without it.

Submitted by: Frank Mayhar
Requested by: Mohan Srinivasan
MFC After: 1 week
2006-01-17 17:29:03 +00:00
Robert Watson
63074a901a In nfs_dolock(), GC now under-used ioflg, rendered obsolete when we moved
from using a fifo to talk to rpc.lockd to using a special device node.

Noticed by:	Coverity Prevent analysis tool
MFC after:	3 days
2006-01-13 23:16:29 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Xin LI
fc9fac4c78 Correct a typo 2005-12-28 10:03:48 +00:00
Paul Saab
fc6ff223c4 Improve upon rev 1.133 where NFS/TCP would not reconnect.
Submitted by:	Mohan Srinivasan
2005-12-12 23:18:05 +00:00
Ruslan Ermilov
2f1b461447 Unexpand LLADDR(). 2005-11-29 09:51:47 +00:00
Paul Saab
38b29f71ef Fix for a bug where NFS/TCP would not reconnect (in the case where
the server FIN'ed). Seen with Solaris NFS servers.

Reported by:	TOMITA Yoshinori <yoshint@flab.fujitsu.co.jp>
Submitted by:	Mohan Strinivasan
2005-11-21 19:25:24 +00:00
Paul Saab
3834aac17e - Always return success from NFS strategy. nfs_doio(), in the
event of an error, does the right thing, in terms of setting
  the error flags in the buf header. That fixes a crash from
  bstrategy().
- Treat ETIMEDOUT as a "recoverable" error, causing the buffer
  to be re-dirtied. ETIMEDOUT can occur on soft mounts, when
  the number of retries are exceeded, and we don't want data loss
  in that case.

Submitted by:	Mohan Srinivasan
2005-11-21 19:23:46 +00:00
Jim Rees
cb156cc603 fix a problem with XID re-use when a server returns NFSERR_JUKEBOX.
Submitted by:	cel@citi.umich.edu
Fixed by:	rick@snowhite.cis.uoguelph.ca
Approved by:	alfred
MFC after:	3 weeks
2005-11-21 18:39:18 +00:00
Jonathan Chen
0b3e7451da fix a crash when an nfsv2 mount fails
MFC after:	1 week
2005-11-10 23:25:16 +00:00
Paul Saab
9c31df40bb Fix for a crash (from nfs_lookup() in an error case).
Submitted by:	Mohan Srinivasan
2005-11-03 19:24:54 +00:00
Paul Saab
41ce2892bb In nfs_flush(), clear the NMODIFIED bit only if there are no dirty
buffers *and* there are no buffers queued up for writing.  The bug
was that NMODIFIED was being cleared even while there were buffers
scheduled to be written out, which leads to all sorts of interesting
bugs - one where the file could shrink (because of a post-op getattr
load, say) causing data in buffer(s) queued for write to be tossed,
resulting in data corruption.

Submitted by:	Mohan Srinivasan
2005-11-03 07:42:15 +00:00
Paul Saab
120c58288c Fix for a race between the thread transmitting the request and the
thread processing the reply.

Submitted by:	Mohan Srinivasan
2005-11-03 07:31:06 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Gleb Smirnoff
c0bc2867c1 - Fix leak of struct nlminfo on process exit.
- Fix malloc type collision, that made the above problem
  difficult to understand.

Reported by:	Vladimir Sharun <sharun ukr.net>
2005-10-26 07:18:37 +00:00