freebsd-nq

Author	SHA1	Message	Date
Attilio Rao	a3c14ce5d9	namei() can call underlying nfs_readlink() passing a struct uio pointer owned by a NULL owner. This will lead consequent VOP_ISLOCKED() present into nfs_upgrade_vnlock() to panic as it only acquire curthread now. Fix nfs_upgrade_vnlock() and nfs_downgrade_vnlock() in order to not use more the struct thread pointer passed as argument (as it is really nomore required there as vn_lock() and VOP_UNLOCK doesn't get the lock more). Using curthread, in place, doesn't get ambiguity as LK_EXCLOTHER should be handled as a "not locked" request by both functions. Reported by: kris Tested by: kris Reviewed by: ups	2008-02-09 20:13:19 +00:00
Mohan Srinivasan	17c53e4a28	Fix for a very rare race, caused by the nfsiod wakeup and nfsiod idle timeout occurring at exactly the same time. If this happens, the nfsiod exits although there may be a queued async IO request for it. Found by : Kris Kennaway Approved by: re	2007-09-25 21:08:49 +00:00
John Baldwin	03e557fd5a	Fix up NFS client write error handling. Errors are split into recoverable and unrecoverable. For the former, we redirty the buffer and hang onto it for future retries. For the latter (eg. ESTALE), we discard the buffer and return the error back to the user on the next syscall. This fixes a number of vfs panics and fixes having a large number of dirty buffers (that cannot be written out and reclaimed) from hanging around. Thanks to ups@ for discussions on this issue. Reported by: kris, Kai, others Approved by: re (kensmith)	2007-07-03 18:30:55 +00:00
Attilio Rao	b4b7081961	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
John Baldwin	a1054d5776	Various fixes to the NFS Directio support. - Fix for a bug where a close would not wait for all (directio) dirty buffers to drain. The nfsnode was not marked NMODIFIED when there were directio dirtied buffers pending, causing this. - No reason to vhold/vrele the vp when enqueueing DirectIO requests for the nfsiods. The vnode can't really go way since the close has to wait for these requests to drain. MFC after: 1 week Submitted by: mohans	2007-04-25 20:34:55 +00:00
Alan Cox	5786be7cc7	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
Stephan Uphoff	6c1b7d16c2	Call vm_object_page_clean() with the object lock held. Submitted by: kensmith@ Reviewed by: mohans@ MFC after: 6 days	2006-05-25 17:16:11 +00:00
Stephan Uphoff	dcf67e65d2	Do not set B_NOCACHE on buffers when releasing them in flushbuflist(). If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flush dirty page in the data invalidation function of some network file systems. This fixes data losses during vnode recycling (and other code paths using invalbuf(,V_SAVE,,*)) for data written using an mmaped file. Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@ Reviewed by: tegge@ MFC after: 7 days	2006-05-25 01:00:35 +00:00
Mohan Srinivasan	f1cdf89911	Changes to make the NFS client MP safe. Thanks to Kris Kennaway for testing and sending lots of bugs my way.	2006-05-19 00:04:24 +00:00
Mohan Srinivasan	5ef7d50da5	Keep track of the number of in-progress async direct IO writes in the nfsnode. Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and nfs_putpage().	2006-04-06 01:20:30 +00:00
Paul Saab	3834aac17e	- Always return success from NFS strategy. nfs_doio(), in the event of an error, does the right thing, in terms of setting the error flags in the buf header. That fixes a crash from bstrategy(). - Treat ETIMEDOUT as a "recoverable" error, causing the buffer to be re-dirtied. ETIMEDOUT can occur on soft mounts, when the number of retries are exceeded, and we don't want data loss in that case. Submitted by: Mohan Srinivasan	2005-11-21 19:23:46 +00:00
Paul Saab	865b5cc7fd	Remove the NFS client rslock. The rslock was used to serialize writers that want to extend the file. It was also used to serialize readers that might want to read the last block of the file (with a writer extending the file). Now that we support vnode locking for NFS, the rslock is unnecessary. Writers grab the exclusive vnode lock before writing and readers grab the shared (or in some cases the exclusive) lock. Submitted by: Mohan Srinivasan	2005-07-21 22:46:56 +00:00
Brian Feldman	6979a7592a	Ifdef out the incomplete non-blocking IO implementation for NFS pending discussion of how implementation would proceed. Applications like -lc_r expect select(3) to match the EAGAIN-status of IO functions. Approved by: re	2005-06-16 15:43:17 +00:00
Brian Feldman	cc3149b1ea	Fix a serious deadlock with the NFS client. Given a large enough atomic write request, it can fill the buffer cache with the entirety of that write in order to handle retries. However, it never drops the vnode lock, or else it wouldn't be atomic, so it ends up waiting indefinitely for more buf memory that cannot be gotten as it has it all, and it waits in an uncancellable state. To fix this, hibufspace is exported and scaled to a reasonable fraction. This is used as the limit of how much of an atomic write request by the NFS client will be handled asynchronously. If the request is larger than this, it will be turned into a synchronous request which won't deadlock the system. It's possible this value is far off from what is required by some, so it shall be tunable as soon as mount_nfs(8) learns of the new field. The slowdown between an asynchronous and a synchronous write on NFS appears to be on the order of 2x-4x. General nod by: gad MFC after: 2 weeks More testing: wes PR: kern/79208	2005-06-10 23:50:41 +00:00
David Schultz	938838feb1	Don't brelse(bp) if bp is null. Also, eliminate some redundancy and dead code. Found by: Coverity Prevent analysis tool	2005-03-18 21:23:32 +00:00
Jeff Roberson	c0f681c21d	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:14:56 +00:00
Poul-Henning Kamp	56dd36b1a6	Remove unused cred arg from nfs_vinvalbuf() and many bogus arguments passed for it.	2005-01-24 12:31:06 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Warner Losh	c398230b64	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
Paul Saab	a7500bceb0	First cut of NFS direct IO support. - NFS direct IO completely bypasses the buffer and page caches. If a file is open for direct IO all caching is disabled. - Direct IO for Directories will be addressed later. - 2 new NFS directio related sysctls are added. One is a knob to disable NFS direct IO completely (direct IO is enabled by default). The other is to disallow mmaped IO on a file that has at least one O_DIRECT open (see the comment in nfs_vnops.c for more details). The default is to allow mmaps on a file that has O_DIRECT opens. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!	2004-12-15 22:20:22 +00:00
Paul Saab	4342aac774	Store a hint in the nfsnode to detect sequential access of the file. Kick off a readahead only when sequential access is detected. This eliminates wasteful readaheads in random file access. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!	2004-12-10 03:27:12 +00:00
Paul Saab	35ec46b7f2	Rewrite of the NFS client's reply handling. We now have NFS socket upcalls which do RPC header parsing and match up the reply with the request. NFS calls now sleep on the nfsreq structure. This enables us to eliminate the NFS recvlock. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 21:11:15 +00:00
Paul Saab	ddc6c40075	2 fixes that improve on the consistency of the NFS client cache. - Change the cached mtime to a 'struct timespec' from a time_t. Improving the precision of the cached mtime tightens up NFS' "close-to-open" consistency considerably. - Always force an over-the-wire consistency check from nfs_open() (unless the file is marked modified). This further improves NFS' "close-to-open" consistency. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 19:18:00 +00:00
Paul Saab	d54d263a79	Serialize NFS vinvalbuf operations by acquiring/upgrading to the vnode EXCLUSIVE lock. This prevents threads from adding pages to the vnode while an invalidation is in progress, closing potential races. In the bioread() path, callers acquire the SHARED vnode lock - so while an invalidate was in progress, it was possible to fault in new pages onto the vnode causing the invalidation to take a while or fail. We saw these races at Yahoo! with very large files+heavy concurrent access. Forcing an upgrade to EXCLUSIVE lock before doing the invalidation closes all these races. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 18:52:28 +00:00
Jeff Roberson	b646893f0f	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
Poul-Henning Kamp	6e67e2a710	Retire b_magic now, we have the bufobj containing the same hint.	2004-11-04 09:48:18 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
David Schultz	506d3e1bcc	nfsclient/nfs_bio.c has a PHOLD() without a PRELE(). Neither should be necessary here. Also, use killproc() instead of psignal().	2004-10-01 05:01:41 +00:00
Poul-Henning Kamp	08dbd671ff	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
Poul-Henning Kamp	35f134080f	Explicitly pass vnode to nfs_doio() and mountpoint to nfs_asyncio().	2004-09-07 08:56:43 +00:00
Alfred Perlstein	c713aaaeca	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.	2004-07-06 09:12:03 +00:00
Robert Watson	8950572050	Remove bad cookie vp kernel printf; while it does notify about an interesting event, there's little or nothing the user can do about it.	2004-06-17 00:15:37 +00:00
Alan Cox	5a32489377	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@	2004-05-06 05:03:23 +00:00
Peter Edwards	7c8ca9400e	Let the NFS client notice a file's size changing as a modification. This avoids presenting invalid data to the client's applications when the file is modified, and then extended within the window of the resolution of the modifcation timestamp. Reviewed By: iedowse PR: kern/64091	2004-04-14 23:23:55 +00:00
Warner Losh	2fcbca0d85	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 05:00:01 +00:00
Poul-Henning Kamp	4d453ef101	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Alfred Perlstein	90abe7f294	Use function pointers to remove the depenancy cross dependancy on nfs4 and the nfs3 client. Also fix some bugs that happen to be causing crashes in both v3 and v4 introduced by the v4 import. Submitted by: Jim Rees <rees@umich.edu> Approved by: re	2003-11-22 02:21:49 +00:00
Alfred Perlstein	1bf8720450	University of Michigan's Citi NFSv4 kernel client code. Submitted by: Jim Rees <rees@umich.edu>	2003-11-14 20:54:10 +00:00
Poul-Henning Kamp	901b6327b1	Initialize bp->b_offset before calling VOP_STRATEGY(). Remove KASSERTS and panics with B_PHYS checks which no longer apply.	2003-10-18 11:14:29 +00:00
Poul-Henning Kamp	e6ad40bb11	We do not get B_PHYS buffers here anymore. /dev/drum is long gone.	2003-10-18 09:33:13 +00:00
Jeff Roberson	8b5905a47d	- Remove the backtrace() call from the *_vinvalbuf() functions. Thanks to a stack trace supplied by phk, I now understand what's going on here. The check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been partially torn down in vclean(). It is not clear that this would cause a problem. Document this in nfs_bio.c, which is where the other two filesystems copied this code from.	2003-10-04 08:51:50 +00:00
Jeff Roberson	ce1fb23146	- Remove interlock protection around VI_XLOCK. The interlock is not sufficient to guarantee that this race is not hit. The XLOCK will likely have to be redesigned due to the way reference counting and mutexes work in FreeBSD. We currently can not be guaranteed that xlock was not set and cleared while we were blocked on the interlock while waiting to check for XLOCK. This would lead us to reference a vnode which was not the vnode we requested. - Add a backtrace() call inside of INVARIANTS in the hopes of finding out if this condition is ever hit. It should not, since we should be retaining a reference to the vnode in these cases. The reference would be sufficient to block recycling.	2003-09-19 23:37:49 +00:00
Alan Cox	aec774abec	Lock the vm object when freeing a page.	2003-06-17 05:17:00 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Robert Watson	7042ac8cd7	This change grabs the vnode lock for NFS client vnodes when calling VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS will panic, and (b) it may be possible to corrupt entries in the cached vnode attributes in the nfsnode, since nfsnode attribute cache data is also protected by the vnode lock. Approved by: re (jhb) Pointed out by: VFS_DEBUG_LOCKS	2003-05-15 21:12:08 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases were we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviwed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00

1 2 3 4

166 Commits