freebsd-dev

Author	SHA1	Message	Date
Alfred Perlstein	77465d9390	Get rid of qaddr_t. Requested by: bde	2007-10-16 10:54:55 +00:00
Mohan Srinivasan	faf529dce5	NFS MP scaling changes. - Eliminate the hideous nfs_sndlock that serialized NFS/TCP request senders thru the sndlock. - Institute a new nfs_connectlock that serializes NFS/TCP reconnects. Add logic to wait for pending request senders to finish sending before reconnecting. Dial down the sb_timeo for NFS/TCP sockets to 1 sec. - Break out the nfs xid manipulation under a new nfs xid lock, rather than overloading the nfs request lock for this purpose. - Fix some of the locking in nfs_request. Many thanks to Kris Kennaway for his help with this and for initiating the MP scaling analysis and work. Kris also tested this patch thoroughly. Approved by: re@ (Ken Smith)	2007-10-12 19:12:21 +00:00
Mohan Srinivasan	17c53e4a28	Fix for a very rare race, caused by the nfsiod wakeup and nfsiod idle timeout occurring at exactly the same time. If this happens, the nfsiod exits although there may be a queued async IO request for it. Found by : Kris Kennaway Approved by: re	2007-09-25 21:08:49 +00:00
Robert Watson	0bf686c125	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminating quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
John Baldwin	f4b65ca5d0	Fix for a race where out of order loading of NFS attrs into the nfsnode could lead to attrs being stale. One example (that we ran into) was a READDIR+, WRITE. The responses came back in order, but the attrs from the WRITE were loaded before the attrs from the READDIR+, leading to the wrong size being read on the next stat() call. MFC after: 1 week Submitted by: mohans Approved by: re (kensmith)	2007-07-03 18:31:47 +00:00
John Baldwin	03e557fd5a	Fix up NFS client write error handling. Errors are split into recoverable and unrecoverable. For the former, we redirty the buffer and hang onto it for future retries. For the latter (eg. ESTALE), we discard the buffer and return the error back to the user on the next syscall. This fixes a number of vfs panics and fixes having a large number of dirty buffers (that cannot be written out and reclaimed) from hanging around. Thanks to ups@ for discussions on this issue. Reported by: kris, Kai, others Approved by: re (kensmith)	2007-07-03 18:30:55 +00:00
Attilio Rao	b4b7081961	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distributed loads method for vmmeter (distributed through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probably, a general approach is not the best solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Robert Watson	5d0dd109f4	In nfs_down(), if rep can be NULL, which we test for, then we should lock and unlock conditionally, not just set the flag on it conditionally. In practice, this bug couldn't manifest, as in the current revision of the code, no callers pass a NULL rep. CID: 1416 Found with: Coverity Prevent(tm)	2007-05-18 19:34:54 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
John Baldwin	a1054d5776	Various fixes to the NFS Directio support. - Fix for a bug where a close would not wait for all (directio) dirty buffers to drain. The nfsnode was not marked NMODIFIED when there were directio dirtied buffers pending, causing this. - No reason to vhold/vrele the vp when enqueueing DirectIO requests for the nfsiods. The vnode can't really go away since the close has to wait for these requests to drain. MFC after: 1 week Submitted by: mohans	2007-04-25 20:34:55 +00:00
Robert Watson	dc4725135d	Attempt to rationalize NFS privileges: - Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD. - Use PRIV_NFS_DAEMON in the NFS server. - In the NFS client, move the privilege check from nfslockdans(), which occurs every time a write is performed on /dev/nfslock, and instead do it in nfslock_open() just once. This allows us to avoid checking the saved uid for root, and just use the effective on open. Use PRIV_NFS_LOCKD.	2007-04-21 18:11:19 +00:00
Xin LI	1247688a3e	Don't destroy a mutex just before we use it, instead, destroy it after we have used it.	2007-03-23 08:52:36 +00:00
Tor Egge	61b9d89ff0	Make insmntque() externally visible and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnodes belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib	2007-03-13 01:50:27 +00:00
Mohan Srinivasan	d9915117c9	Back out a change to nfs_timer() that inadvertently crept in the last checkin :(	2007-03-09 04:07:54 +00:00
Mohan Srinivasan	f9bb753844	Over NFS, an open() call could result in multiple over-the-wire GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.	2007-03-09 04:02:38 +00:00
John Baldwin	4d70511ac3	Use pause() rather than tsleep() on stack variables and function pointers.	2007-02-27 17:23:29 +00:00
Mohan Srinivasan	0973754e14	Backing out an earlier change. It seems harmless for NFS to miss the "force unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and nfs_sigintr() unnecessary. Pointed out by tegge@.	2007-02-16 03:46:55 +00:00
Mohan Srinivasan	024465d002	Add missing MNT_ILOCK around some mnt_kern_flag accesses.	2007-02-11 04:01:10 +00:00
Mohan Srinivasan	4e99994cc9	Fix for a vnode lock leak in nfs_create() in the event of an error. Spotted by ups@.	2007-01-31 23:10:27 +00:00
Kris Kennaway	410355bf69	Instead of always hard-coding the socket type for the nfs root mount as SOCK_DGRAM (i.e. UDP), respect the value configured earlier. This allows TCP NFS root mounts using e.g. the boot.nfsroot.options="tcp" tunable. In this case some of the connection parameters like the retry timer were previously set appropriately for TCP but inappropriately for the UDP socket that was actually used, leading to e.g. extremely long recovery times (O(hours)) after a nfs server reboot. Reviewed by: mohans MFC After: 2 weeks	2007-01-30 00:26:04 +00:00
Bruce Evans	e43982a801	Unstaticize nfs_iosize() in nfsclient and use it in nfs4client instead of duplicating it except for larger style bugs in the copy. Fix some nearby style bugs (including a harmless type mismatch) in and near the remaining copy. This is part of fixing collisions of the 2 nfs*client's names. Even static names should have a unique prefixes so that they can be debugged easily.	2007-01-25 13:07:25 +00:00
Konstantin Belousov	2cc7d26f7f	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquisition. Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order than the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)	2007-01-23 10:01:19 +00:00
Mohan Srinivasan	7f3a6e42c9	NetApp filers return corrupt post op attrs in the wcc on NFS error responses. This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt for other NFS error responses. For now, disabling wcc pre-op attr checks and post-op attr loads on NFS errors (sysctl'ed). Reported by: Kris Kennaway	2006-12-11 19:54:25 +00:00
Sam Leffler	49d5157434	consolidate parsing of nfs root mount options in one place and handle all options (some may require fixes elsewhere) Reviewed by: jhb, mohans MFC after: 1 month	2006-12-06 02:15:25 +00:00
Mohan Srinivasan	594ece53bc	In nfs_nget(), we must initialize the fh in the nfsnode before inserting the vnode into the vfs hash. Otherwise, another thread walking the hash can trip on an nfsnode with an uninitialized or partially initialized fh. Thanks to ups@ for spotting this race.	2006-11-29 02:21:40 +00:00
Mohan Srinivasan	d4875805d7	bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don't have to explicitly acquire Giant (although they need to be aware of this and not hold any locks at that point). Remove the acquisitions of Giant in the NFS client wrapping tprintf().	2006-11-27 23:26:06 +00:00
Mohan Srinivasan	88d5725c38	Fix for a bug caused by a race when 2 threads lookup the same file. Leave the loser's lock(s) initialized, so the reclaim logic can unconditionally destroy them when that race occurs (or if the vfs hash insert happened to fail for some other reason). Thanks to ups@ for a careful review of the code. Reported by : Kris Kennaway	2006-11-27 19:06:43 +00:00
Mohan Srinivasan	a18c4dc336	1) Fix up locking in nfs_up() and nfs_down. 2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly. - We don't need to acquire Giant before tsleeping on lbolt anymore, since jhb specialcased lbolt handling in msleep. - nfs_up() needs to acquire Giant only if printing the "server up" message. - nfs_timer() held Giant for the duration of the NFS timer processing, just because the printing of the message in nfs_down() needed it (and we acquire other locks in nfs_timer()). The acquisition of Giant is moved down into nfs_down() now, reducing the time Giant is held in that path. Reported by: Kris Kennaway	2006-11-20 04:14:23 +00:00
Mohan Srinivasan	3c2fcc3c92	vfs_hash_insert() vputs() the losing vnode before returning, in the event of a race where a duplicate vnode is entered into the vfs hash. nfs_nget() shouldn't be releasing the vnode in that case.	2006-11-16 23:03:46 +00:00
Mohan Srinivasan	87c125cecc	Fix to readdir+ reply handling. When inserting an entry into the namecache, initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the just entered namecache entry.	2006-11-16 23:02:37 +00:00
Sam Leffler	83cc6b9ad2	honor nolockd flag in root mount options MFC after: 2 weeks	2006-11-07 18:02:45 +00:00
Mohan Srinivasan	88b94fba38	Make EWOULDBLOCK a recoverable error so that the request is retransmitted. This bug results in data corruption with NFS/TCP. Writes are silently dropped on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires). Reviewed by: ups@	2006-10-31 20:25:37 +00:00
Bruce Evans	35259c2c89	Fixed some style bugs (especially ones involving long lines and use of __P(())). There are many more.	2006-10-17 22:07:07 +00:00
Bruce Evans	6a72ff6b09	Don't do null Setattr RPCs for VA_MARK_ATIME. When we added the VA_MARK_ATIME feature to fix POSIX conformance for execve() and mmap(), we thought that it was optimized well enough for the one file system that supports it (ffs) and harmless for other file systems (except layered ones which already get the layering for VOP_SETATTR() wrong). However, nfs_setattr() doesn't do much parameter checking, so when it gets a combination of parameters that it doesn't understand, it always does a Setattr RPC. This RPC can't do anything good, and for VA_MARK_ATIME it is null except for wasting a lot of time. This is the smallest and easiest to fix of several bugs that have increased the number of RPCs for kernel builds on nfs by more than 100% since 2004-11-05. The real-time increase depends on network latency and parallelization and can also be very large (approaching the same percentage for unparallelized operations like "make depend" on systems with fast CPUs and high-latency networks).	2006-10-14 07:25:11 +00:00
Poul-Henning Kamp	f645b0b51c	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & space-efficient BCD routines.	2006-10-02 12:59:59 +00:00
Tor Egge	a1e363f256	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.	2006-09-26 04:15:59 +00:00
Tor Egge	5da56ddb21	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
Mohan Srinivasan	7d7d9e2242	Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@	2006-09-13 18:39:09 +00:00
Mohan Srinivasan	6cd7078919	Fix for a deadlock triggered by a 'umount -f' causing a NFS request to never retransmit (or return). Thanks to John Baldwin for helping nail this one. Found by : Kris Kennaway	2006-08-29 22:00:12 +00:00
Thomas Quinot	3401780fa0	Fix typos in comment.	2006-08-16 23:53:05 +00:00
Alan Cox	5786be7cc7	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
Brooks Davis	a36aa44a85	Add a new kernel environment variable "boot.netif.mtu" which is used to set the MTU prior to mounting root via NFS. This is required if the server supports a higher than default MTU because the client will not see the responses otherwise. MFC after: 3 weeks	2006-08-09 01:56:17 +00:00
Robert Watson	b0668f7151	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00
Konstantin Belousov	c915bcbad2	Signals may be delivered to process as well as to the thread. Check the thread-delivered signals in addition to the process one. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)	2006-07-08 15:39:11 +00:00
Konstantin Belousov	201599c3af	Always supply curthread as argument to nfs_asyncio and nfs_doio in nfs_strategy. Otherwise, for some buffers, signals would be ignored at the intr mounts. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)	2006-07-08 15:36:51 +00:00
Yaroslav Tykhiy	4b97d7affd	There is a consensus that ifaddr.ifa_addr should never be NULL, except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready for the system crashing honestly instead of masking possible bugs. Suggested by: glebius, jhb, ru	2006-06-29 19:22:05 +00:00
Yaroslav Tykhiy	576cdf4352	Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop.	2006-06-29 15:37:39 +00:00
Mohan Srinivasan	64c3892747	Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag was not being set, which means Giant would be acquired for these mounts.	2006-05-30 20:32:44 +00:00
Mohan Srinivasan	1af6f471ca	Fix for a potential attempt to sleep while holding nm_mtx. Caught and reported by Witness (which forces the mbuf allocation flag to M_NOWAIT). Reported by: "sekes".	2006-05-26 18:45:55 +00:00
Stephan Uphoff	6c1b7d16c2	Call vm_object_page_clean() with the object lock held. Submitted by: kensmith@ Reviewed by: mohans@ MFC after: 6 days	2006-05-25 17:16:11 +00:00
Stephan Uphoff	dcf67e65d2	Do not set B_NOCACHE on buffers when releasing them in flushbuflist(). If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flush dirty page in the data invalidation function of some network file systems. This fixes data losses during vnode recycling (and other code paths using invalbuf(,V_SAVE,,*)) for data written using an mmaped file. Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@ Reviewed by: tegge@ MFC after: 7 days	2006-05-25 01:00:35 +00:00
Mohan Srinivasan	5bbfbd1422	Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mounts before doing the read/write. Reported by: Chuck Lever.	2006-05-24 23:06:50 +00:00
Robert Watson	33c6a485bd	Adjust minimum iod threads from 4 to 0 -- since we compile the NFS client into the kernel by default, and many users won't use NFS, don't start an extra 4 kernel threads that are unused. Once NFS becomes active, it will start nfsiod's as it needs them. We might consider mandating a minimum iod's equal to the number of active NFS mounts (truncated to some value), which would force some to remain available without having to create a new one if the file system is mostly inactive. PR: 70880 MFC after: 2 weeks Prodded by: cel Head nod: peter Pointed out by: Joe <fbsd_user at a1poweruser dot com>	2006-05-24 21:04:46 +00:00
Chuck Lever	6d0699a5ba	NFS over TCP retransmit behavior should default to a 60 second time out, mimicking the NFS reference implementation. NFS over TCP does not need fast retransmit timeouts, since network loss and congestion are managed by the transport (TCP), unlike with NFS over UDP. A long timeout prevents the unnecessary retransmission of non-idempotent NFS requests. Reviewed by: mohans, silby, rees? Sponsored by: Network Appliance, Incorporated	2006-05-23 18:48:07 +00:00
Chuck Lever	94163ea283	Refactor the NFS over UDP retransmit timeout estimation logic to allow the estimator to be more easily tuned and maintained. There should be no functional change except there is now a lower limit on the retransmit timeout to prevent the client from retransmitting faster than the server's disks can fill requests, and an upper limit to prevent the estimator from taking too long to retransmit during a server outage. Reviewed by: mohan, kris, silby Sponsored by: Network Appliance, Incorporated	2006-05-23 18:33:58 +00:00
Mohan Srinivasan	f2c48228fe	Vnode locks are recursive and the NFS client supports shared vnode locks. Found by: Kris Kennaway.	2006-05-23 16:07:23 +00:00
Mohan Srinivasan	f1cdf89911	Changes to make the NFS client MP safe. Thanks to Kris Kennaway for testing and sending lots of bugs my way.	2006-05-19 00:04:24 +00:00
Mohan Srinivasan	671d06fb2e	Fix a snafu caused while patching the previous fix from another branch.	2006-05-05 18:12:13 +00:00
Mohan Srinivasan	9f5b7dea42	Fix for a NFS/TCP client bug which would cause the NFS/TCP stream to get out of sync under heavy loads, forcing frequent reconnects, causing EBADRPC errors etc.	2006-05-05 18:04:53 +00:00
Mohan Srinivasan	5ef7d50da5	Keep track of the number of in-progress async direct IO writes in the nfsnode. Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and nfs_putpage().	2006-04-06 01:20:30 +00:00
Jeff Roberson	b2282f9a3f	- Busy the filesystem in nfs_statfs to prevent us from creating a new vnode after vflush() has succeeded. This would cause a dangling vnode panic at unmount time otherwise. Other filesystems may have this problem via their VFS_VGET() routines. Found by: kris Sponsored by: Isilon Systems, Inc.	2006-04-01 01:15:23 +00:00
Kris Kennaway	78e31796c9	Fix a bug in the NFS/TCP retransmission path. The bug was that earlier, if a request was retransmitted, we would do subsequent retransmits every 10 msecs. This can cause data corruption under moderate loads by reordering operations as seen by the client NFS attribute cache, and on the server side when the retransmission occurs after the original request has left the duplicate cache, since the operation will be committed for a second time. Further work on retransmission handling is needed (e.g. they are still being sent too often since they are scaled by HZ, and the size of the dup cache is too small and easily overwhelmed on busy servers). Submitted by: mohans	2006-03-23 22:58:42 +00:00
Pawel Jakub Dawidek	9972deb772	Actually I wanted 'nolockd' here instead of 'lockd'. MFC after: 2 days	2006-03-19 13:27:37 +00:00
Chuck Lever	a59b03bf0e	If an NFS server returns more than a few EJUKEBOX errors for a given RPC request, the FreeBSD NFS client will quickly back off to an excessively long wait (days, then weeks) before retrying the request. Change the behavior of the FreeBSD NFS client to match the behavior of the reference NFS client implementation (Solaris). This provides a fixed delay of 10 seconds between each retry by default. A sysctl, called nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris, the sysctl value on FreeBSD is in seconds, rather than in HZ. Sponsored by: Network Appliance, Incorporated Reviewed by: rick Approved by: silby MFC after: 3 days	2006-03-17 22:14:23 +00:00
Chuck Lever	9f5349f23d	Fix a bug in NFSv3 READDIRPLUS reply processing. The client's READDIRPLUS logic skips the attributes and filehandle of the ".." entry. If the server doesn't send attributes but does send a filehandle for "..", the client's logic doesn't account for the extra "value follows" field that indicates whether the filehandle is present, causing the remaining entries in the reply to be ignored. Sponsored by: Network Appliance, Inc. Reviewed by: rick, mohans Approved by: silby MFC after: 2 weeks	2006-03-08 01:43:01 +00:00
Jim Rees	4b81d0eb0f	Don't log an error on tcp connection reset, even if we don't get ECONNRESET. Submitted by: cel@citi.umich.edu	2006-01-20 15:07:18 +00:00
Alfred Perlstein	92e73f5711	I ran into an nfs client panic a couple of times in a row over the last few days. I tracked it down to the fact that nfs_reclaim() is setting vp->v_data to NULL _before_ calling vnode_destroy_object(). After silence from the mailing list I checked further and discovered that ufs_reclaim() is unique among FreeBSD filesystems for calling vnode_destroy_object() early, long before tossing v_data or much of anything else, for that matter. The rest, including NFS, appear to be identical, as if they were just clones of one original routine. The enclosed patch fixes all file systems in essentially the same way, by moving the call to vnode_destroy_object() to early in the routine (before the call to vfs_hash_remove(), if any). I have only tested NFS, but I've now run for over eighteen hours with the patch where I wouldn't get past four or five without it. Submitted by: Frank Mayhar Requested by: Mohan Srinivasan MFC After: 1 week	2006-01-17 17:29:03 +00:00
Robert Watson	63074a901a	In nfs_dolock(), GC now under-used ioflg, rendered obsolete when we moved from using a fifo to talk to rpc.lockd to using a special device node. Noticed by: Coverity Prevent analysis tool MFC after: 3 days	2006-01-13 23:16:29 +00:00
Tor Egge	82be0a5a24	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
Xin LI	fc9fac4c78	Correct a typo	2005-12-28 10:03:48 +00:00
Paul Saab	fc6ff223c4	Improve upon rev 1.133 where NFS/TCP would not reconnect. Submitted by: Mohan Srinivasan	2005-12-12 23:18:05 +00:00
Ruslan Ermilov	2f1b461447	Unexpand LLADDR().	2005-11-29 09:51:47 +00:00
Paul Saab	38b29f71ef	Fix for a bug where NFS/TCP would not reconnect (in the case where the server FIN'ed). Seen with Solaris NFS servers. Reported by: TOMITA Yoshinori <yoshint@flab.fujitsu.co.jp> Submitted by: Mohan Srinivasan	2005-11-21 19:25:24 +00:00
Paul Saab	3834aac17e	- Always return success from NFS strategy. nfs_doio(), in the event of an error, does the right thing, in terms of setting the error flags in the buf header. That fixes a crash from bstrategy(). - Treat ETIMEDOUT as a "recoverable" error, causing the buffer to be re-dirtied. ETIMEDOUT can occur on soft mounts, when the number of retries are exceeded, and we don't want data loss in that case. Submitted by: Mohan Srinivasan	2005-11-21 19:23:46 +00:00
Jim Rees	cb156cc603	fix a problem with XID re-use when a server returns NFSERR_JUKEBOX. Submitted by: cel@citi.umich.edu Fixed by: rick@snowhite.cis.uoguelph.ca Approved by: alfred MFC after: 3 weeks	2005-11-21 18:39:18 +00:00
Jonathan Chen	0b3e7451da	fix a crash when an nfsv2 mount fails MFC after: 1 week	2005-11-10 23:25:16 +00:00
Paul Saab	9c31df40bb	Fix for a crash (from nfs_lookup() in an error case). Submitted by: Mohan Srinivasan	2005-11-03 19:24:54 +00:00
Paul Saab	41ce2892bb	In nfs_flush(), clear the NMODIFIED bit only if there are no dirty buffers and there are no buffers queued up for writing. The bug was that NMODIFIED was being cleared even while there were buffers scheduled to be written out, which leads to all sorts of interesting bugs - one where the file could shrink (because of a post-op getattr load, say) causing data in buffer(s) queued for write to be tossed, resulting in data corruption. Submitted by: Mohan Srinivasan	2005-11-03 07:42:15 +00:00
Paul Saab	120c58288c	Fix for a race between the thread transmitting the request and the thread processing the reply. Submitted by: Mohan Srinivasan	2005-11-03 07:31:06 +00:00
Robert Watson	5bb84bc84b	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
Gleb Smirnoff	c0bc2867c1	- Fix leak of struct nlminfo on process exit. - Fix malloc type collision, that made the above problem difficult to understand. Reported by: Vladimir Sharun <sharun ukr.net>	2005-10-26 07:18:37 +00:00
Pawel Jakub Dawidek	df71afde00	- Use strsep() instead of strtok(). - strdup() uses M_WAITOK, so we don't need to check it's return value against NULL. MFC after: 2 weeks	2005-10-06 19:04:08 +00:00
Pawel Jakub Dawidek	720f3948c0	Add boot.nfsroot.options loader tunable. It allows to specify options for NFS root file system. Currently supported options are: soft, intr, conn, lockd. I'm adding this functionality mostly for 'lockd' option, which is only honored when performing the initial mount and will be silently ignored if used while updating the mount options. This will allow to use flock(2) without the need of using varmfs or rpc.lockd and friends. Example of use: boot.nfsroot.options="intr,lockd" MFC after: 2 weeks	2005-10-06 11:18:34 +00:00
Robert Watson	84d2b7df26	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week	2005-09-19 16:51:43 +00:00
Paul Saab	250614c5ab	Fix for a bug in the change that made nfs_timer() MPSAFE. We need to grab Giant before calling pru_send() (if running with mpsafenet = 0). Found by: Jeremie Le Hen. Fixed by: Maxime Henrion	2005-07-27 15:06:26 +00:00
Paul Saab	4fb48d10b0	In nfs_nget() if two threads race on the same filehandle, the loser should cause the nfsnode to get freed. This fixes a potential vnode (and nfsnode) leak in that path. Submitted by: Mohan Srinivasan Reviewed by: phk	2005-07-27 15:05:31 +00:00
Paul Saab	865b5cc7fd	Remove the NFS client rslock. The rslock was used to serialize writers that want to extend the file. It was also used to serialize readers that might want to read the last block of the file (with a writer extending the file). Now that we support vnode locking for NFS, the rslock is unnecessary. Writers grab the exclusive vnode lock before writing and readers grab the shared (or in some cases the exclusive) lock. Submitted by: Mohan Srinivasan	2005-07-21 22:46:56 +00:00
Paul Saab	4321eae6b7	Make nfs_timer() MPSAFE. With this change, the bottom half of the NFS client (the interface with the protocol stack and callouts) is Giant-free. Submitted by: Mohan Srinivasan.	2005-07-19 21:27:25 +00:00
Paul Saab	38b8570c55	Fix for an NFS soft mount bug where, if the number of retries exceeds the max rexmits, the request was not being bounced back with an ETIMEDOUT error. Reported by: Oliver Lehmann Submitted by: Mohan Srinivasan	2005-07-18 02:12:17 +00:00
Paul Saab	0e38f5365b	Fixes for NFS crashes on architectures that require strict alignment. - Fix nfsm_disct() so that after pulling up data, the remaining data is aligned if necessary. - Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the mbuf (instead of casting m_data to a uint32). Submitted by: Pyun YongHyeon Reviewed by: Mohan Srinivasan	2005-07-14 20:08:27 +00:00
Brian Feldman	6979a7592a	Ifdef out the incomplete non-blocking IO implementation for NFS pending discussion of how implementation would proceed. Applications like -lc_r expect select(3) to match the EAGAIN-status of IO functions. Approved by: re	2005-06-16 15:43:17 +00:00
Brian Feldman	cc3149b1ea	Fix a serious deadlock with the NFS client. Given a large enough atomic write request, it can fill the buffer cache with the entirety of that write in order to handle retries. However, it never drops the vnode lock, or else it wouldn't be atomic, so it ends up waiting indefinitely for more buf memory that cannot be gotten as it has it all, and it waits in an uncancellable state. To fix this, hibufspace is exported and scaled to a reasonable fraction. This is used as the limit of how much of an atomic write request by the NFS client will be handled asynchronously. If the request is larger than this, it will be turned into a synchronous request which won't deadlock the system. It's possible this value is far off from what is required by some, so it shall be tunable as soon as mount_nfs(8) learns of the new field. The slowdown between an asynchronous and a synchronous write on NFS appears to be on the order of 2x-4x. General nod by: gad MFC after: 2 weeks More testing: wes PR: kern/79208	2005-06-10 23:50:41 +00:00
Dag-Erling Smørgrav	3f54cc0505	Ugh. Previous commit got the logic exactly backward. Submitted by: bland Pointy hat to: des	2005-05-17 18:23:03 +00:00
Dag-Erling Smørgrav	ff17c7a727	Revision 1.173 broke updating a mount from ro to rw. Fix that by clearing the MNT_RDONLY flag if MNT_UPDATE is set and "ro" was not specified. Suggested by: cognet	2005-05-17 12:00:43 +00:00
Jim Rees	3785bdbe7f	set R_MUSTRESEND flag in mark_for_reconnect so re-connected requests get re-sent instead of timing out. don't log an error message on reconnection, which is not an error. remove unused nfs_mrep_before_tsleep. Reviewed by: Mohan Srinivasan Approved by: alfred	2005-05-10 14:25:14 +00:00
Paul Saab	15ec3fe2f0	Fix a bug in NFS/TCP where retransmissions would not reliably happen if the server rebooted or tore down the connection for any reason. Found by: Jonathan Noack. Submitted by: Mohan Srinivasan.	2005-05-04 16:37:31 +00:00
Ian Dowse	2c443c417c	Don't copy the NFSMNT_* flags into struct statfs's f_flags field, as they have no connection with the expected MNT_* flags. This bug was exposed 18 months ago when the assignments to f_flags in vfs_syscalls.c were moved to before the VFS_STATFS() call. It was fixed in the CSRG source 10 years ago, but we never picked up that change. PR: kern/80390 MFC after: 1 week	2005-05-02 15:57:10 +00:00
Dag-Erling Smørgrav	4104e6bc1d	When NFS was converted to the new mount syscall, code was written that sets the MNT_RDONLY flag if the "ro" option was passed in from userland, and clears it otherwise. In the diskless case, the MNT_RDONLY flag is already set when this code is reached, but there are no mount options, so it was incorrectly cleared. Change the logic so the MNT_RDONLY flag is set if the "ro" option was specified, and left alone otherwise. Note that the NFS code will still happily let you mount a filesystem RW even if the server exports it RO. I'm not sure how to fix that.	2005-04-27 14:46:02 +00:00

878 Commits