freebsd-dev

Author	SHA1	Message	Date
David Schultz	938838feb1	Don't brelse(bp) if bp is null. Also, eliminate some redundancy and dead code. Found by: Coverity Prevent analysis tool	2005-03-18 21:23:32 +00:00
Jeff Roberson	c0f681c21d	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:14:56 +00:00
Poul-Henning Kamp	56dd36b1a6	Remove unused cred arg from nfs_vinvalbuf() and many bogus arguments passed for it.	2005-01-24 12:31:06 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Warner Losh	c398230b64	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
Paul Saab	a7500bceb0	First cut of NFS direct IO support. - NFS direct IO completely bypasses the buffer and page caches. If a file is open for direct IO all caching is disabled. - Direct IO for Directories will be addressed later. - 2 new NFS directio related sysctls are added. One is a knob to disable NFS direct IO completely (direct IO is enabled by default). The other is to disallow mmaped IO on a file that has at least one O_DIRECT open (see the comment in nfs_vnops.c for more details). The default is to allow mmaps on a file that has O_DIRECT opens. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!	2004-12-15 22:20:22 +00:00
Paul Saab	4342aac774	Store a hint in the nfsnode to detect sequential access of the file. Kick off a readahead only when sequential access is detected. This eliminates wasteful readaheads in random file access. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!	2004-12-10 03:27:12 +00:00
Paul Saab	35ec46b7f2	Rewrite of the NFS client's reply handling. We now have NFS socket upcalls which do RPC header parsing and match up the reply with the request. NFS calls now sleep on the nfsreq structure. This enables us to eliminate the NFS recvlock. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 21:11:15 +00:00
Paul Saab	ddc6c40075	2 fixes that improve on the consistency of the NFS client cache. - Change the cached mtime to a 'struct timespec' from a time_t. Improving the precision of the cached mtime tightens up NFS' "close-to-open" consistency considerably. - Always force an over-the-wire consistency check from nfs_open() (unless the file is marked modified). This further improves NFS' "close-to-open" consistency. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 19:18:00 +00:00
Paul Saab	d54d263a79	Serialize NFS vinvalbuf operations by acquiring/upgrading to the vnode EXCLUSIVE lock. This prevents threads from adding pages to the vnode while an invalidation is in progress, closing potential races. In the bioread() path, callers acquire the SHARED vnode lock - so while an invalidate was in progress, it was possible to fault in new pages onto the vnode causing the invalidation to take a while or fail. We saw these races at Yahoo! with very large files+heavy concurrent access. Forcing an upgrade to EXCLUSIVE lock before doing the invalidation closes all these races. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-12-06 18:52:28 +00:00
Jeff Roberson	b646893f0f	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
Poul-Henning Kamp	6e67e2a710	Retire b_magic now, we have the bufobj containing the same hint.	2004-11-04 09:48:18 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
David Schultz	506d3e1bcc	nfsclient/nfs_bio.c has a PHOLD() without a PRELE(). Neither should be necessary here. Also, use killproc() instead of psignal().	2004-10-01 05:01:41 +00:00
Poul-Henning Kamp	08dbd671ff	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
Poul-Henning Kamp	35f134080f	Explicitly pass vnode to nfs_doio() and mountpoint to nfs_asyncio().	2004-09-07 08:56:43 +00:00
Alfred Perlstein	c713aaaeca	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.	2004-07-06 09:12:03 +00:00
Robert Watson	8950572050	Remove bad cookie vp kernel printf; while it does notify about an interesting event, there's little or nothing the user can do about it.	2004-06-17 00:15:37 +00:00
Alan Cox	5a32489377	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@	2004-05-06 05:03:23 +00:00
Peter Edwards	7c8ca9400e	Let the NFS client notice a file's size changing as a modification. This avoids presenting invalid data to the client's applications when the file is modified, and then extended within the window of the resolution of the modifcation timestamp. Reviewed By: iedowse PR: kern/64091	2004-04-14 23:23:55 +00:00
Warner Losh	2fcbca0d85	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 05:00:01 +00:00
Poul-Henning Kamp	4d453ef101	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Alfred Perlstein	90abe7f294	Use function pointers to remove the depenancy cross dependancy on nfs4 and the nfs3 client. Also fix some bugs that happen to be causing crashes in both v3 and v4 introduced by the v4 import. Submitted by: Jim Rees <rees@umich.edu> Approved by: re	2003-11-22 02:21:49 +00:00
Alfred Perlstein	1bf8720450	University of Michigan's Citi NFSv4 kernel client code. Submitted by: Jim Rees <rees@umich.edu>	2003-11-14 20:54:10 +00:00
Poul-Henning Kamp	901b6327b1	Initialize bp->b_offset before calling VOP_STRATEGY(). Remove KASSERTS and panics with B_PHYS checks which no longer apply.	2003-10-18 11:14:29 +00:00
Poul-Henning Kamp	e6ad40bb11	We do not get B_PHYS buffers here anymore. /dev/drum is long gone.	2003-10-18 09:33:13 +00:00
Jeff Roberson	8b5905a47d	- Remove the backtrace() call from the *_vinvalbuf() functions. Thanks to a stack trace supplied by phk, I now understand what's going on here. The check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been partially torn down in vclean(). It is not clear that this would cause a problem. Document this in nfs_bio.c, which is where the other two filesystems copied this code from.	2003-10-04 08:51:50 +00:00
Jeff Roberson	ce1fb23146	- Remove interlock protection around VI_XLOCK. The interlock is not sufficient to guarantee that this race is not hit. The XLOCK will likely have to be redesigned due to the way reference counting and mutexes work in FreeBSD. We currently can not be guaranteed that xlock was not set and cleared while we were blocked on the interlock while waiting to check for XLOCK. This would lead us to reference a vnode which was not the vnode we requested. - Add a backtrace() call inside of INVARIANTS in the hopes of finding out if this condition is ever hit. It should not, since we should be retaining a reference to the vnode in these cases. The reference would be sufficient to block recycling.	2003-09-19 23:37:49 +00:00
Alan Cox	aec774abec	Lock the vm object when freeing a page.	2003-06-17 05:17:00 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Robert Watson	7042ac8cd7	This change grabs the vnode lock for NFS client vnodes when calling VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS will panic, and (b) it may be possible to corrupt entries in the cached vnode attributes in the nfsnode, since nfsnode attribute cache data is also protected by the vnode lock. Approved by: re (jhb) Pointed out by: VFS_DEBUG_LOCKS	2003-05-15 21:12:08 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases were we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviwed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Matthew Dillon	3a3d82ec0a	Abstract-out the constants for the sequential heuristic. No operational changes. MFC after: 1 day	2002-12-28 20:37:50 +00:00
Jeff Roberson	8926aed697	- Lock access to the buf lists. - Use vrefcnt() where appropriate. - Add some locking asserts.	2002-09-25 02:38:43 +00:00
Jeff Roberson	e6e370a7fe	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
Alan Cox	e0c9fdb50e	o Lock page queue accesses in nfs_getpages().	2002-07-21 20:01:32 +00:00
Matthew Dillon	8e0619c6b0	Fix a bug nfs_write() related to ^C'ing during a file write on an interruptable mount. We were returning from inside the loop without releasing the rslock. Submitted by: Mike Junk <junk@isilon.com> MFC after: 3 days	2002-07-16 19:43:59 +00:00
Matthew Dillon	3d8f797ac1	Convert old style (type foo *)0 casts to NULLs PR: kern/40360 Requested by: Hiten PAndya via direct email	2002-07-11 17:54:58 +00:00
Matthew Dillon	d331c5d43f	Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex then other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc	2002-07-10 17:02:32 +00:00
John Baldwin	56e9ce41a5	In namei(), we use a NULL thread for uio_td when doing a VOP_READLINK(). nfs_readlink() calls nfs_bioread() which passes in uio_td as the thread argument to nfs_getcacheblk(). In nfs_getcacheblk() we dereference the thread pointer to get a process pointer to pass to nfs_sigintr(). This obviously results in a panic. :) Rather than change nfs_getcacheblk() to check if the thread pointer is NULL when calling nfs_sigintr() like other callers do, change nfs_sigintr() to take a thread as the last argument instead of a process so none of the callers have to care if the thread is NULL or not.	2002-06-28 21:53:08 +00:00
John Baldwin	a854ed9893	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
Peter Wemm	1bde568682	Revise the nfsiod auto tuning code. Now both the upper and lower limits are specifyable by sysctl and are respected. Submitted by: Maxime Henrion <mux@sneakerz.org>	2002-01-15 20:57:21 +00:00
Peter Wemm	117f61374c	Implement vfs.nfs.iodmin (minimum number of nfsiod's) and vfs.nfs.iodmaxidle (idle time before nfsiod's exit). Make it adaptive so that we create nfsiod's on demand and they go away after not being used for a while. The upper limit is NFS_MAXASYNCDAEMON (currently 20). More will be done here, but this is a useful checkpoint. Submitted by: Maxime Henrion <mux@qualys.com>	2002-01-14 02:13:46 +00:00
Matthew Dillon	3ebeaf5984	This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS. * An edge case with shared R+W mmap()'s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems) THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. see vm/vm_page.c line 1641 * The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side) * Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other cpu's. (applicable to NFS client side but could effect all filesystems). Note: not considered serious since the corruption occurs beyond the file EOF. * When flusing a dirty buffer due to B_CACHE getting cleared, we were accidently setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS) * We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentitive, I may be able to remove it due to the second bug fix. (applicable to NFS client side) * vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side) * Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk). Testing program written by: Avadis Tevanian, Jr. Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step') MFC after: 1 week	2001-12-14 01:16:57 +00:00
Matthew Dillon	7e76bb562e	Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking in wdrain during a write. This flag needs to be used in devices whos strategy routines turn-around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code. Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer of buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct. MFC after: 1 week	2001-11-05 18:48:54 +00:00
John Baldwin	bd78cece5d	Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.	2001-10-11 23:38:17 +00:00
Peter Wemm	891a092764	Sigh, Last minute pre-merge typo. (missing quotes)	2001-09-18 23:49:33 +00:00
Peter Wemm	eb25edbda3	Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.	2001-09-18 23:32:09 +00:00
Warner Losh	976a26437e	nfs_strategy calls nfs_asyncio with td as NULL. So add a bandaid that will pass NULL as the struct proc when td is NULL. This has stopped crashing on my machine. Note: The passing of NULL may be bogus, but I'll let others fix that problem. Reviewed by: jhb	2001-09-18 18:37:52 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
John Baldwin	617e358cdf	- Sort includes. - Update vmmeter statistics for vnode pagein/pageouts in getpages/putpages.	2001-07-04 20:14:59 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
John Baldwin	ce70e0a964	Assert Giant is held by the caller rather than getting it and releasing it in getpages/putpages.	2001-05-23 22:26:05 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Alfred Perlstein	d8d5fa8805	vnode_pager_freepage() is really vm_page_free() in disguise, nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()	2001-04-19 06:18:23 +00:00
Poul-Henning Kamp	f84e29a06c	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.	2001-04-17 08:56:39 +00:00
John Baldwin	19eb87d22a	Grab the process lock while calling psignal and before calling psignal.	2001-03-07 03:37:06 +00:00
Poul-Henning Kamp	9626b608de	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
Poul-Henning Kamp	8177437d85	Complete the bio/buf divorce for all code below devfs::strategy Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS	2000-04-15 05:54:02 +00:00
Poul-Henning Kamp	c244d2de43	Move B_ERROR flag to b_ioflags and call it BIO_ERROR. (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.	2000-04-02 15:24:56 +00:00
Poul-Henning Kamp	b99c307a21	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.	2000-03-20 11:29:10 +00:00
Poul-Henning Kamp	21144e3bf1	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.	2000-03-20 10:44:49 +00:00
Matthew Dillon	c37c9620cd	Enhance reassignbuf(). When a buffer cannot be time-optimally inserted into vnode dirtyblkhd we append it to the list instead of prepend it to the list in order to maintain a 'forward' locality of reference, which is arguably better then 'reverse'. The original algorithm did things this way to but at a huge time cost. Enhance the append interlock for NFS writes to handle intr/soft mounts better. Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches. Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather then the kernel default for that option. Reviewed by: Alfred Perlstein <bright@wintelcom.net>	2000-01-05 05:11:37 +00:00
Matthew Dillon	b7303db36e	Fix two problems: First, fix the append seek position race that can occur due to np->n_size potentially changing if nfs_getcacheblk() blocks in nfs_write(). Second, under -current we must supply the proper bufsize when obtaining buffers that straddle the EOF, but due to the fact that np->n_size can change out from under us it is possible that we may specify the wrong buffer size and wind up truncating dirty data written by another process. Both problems are solved by implementing nfs_rslock(), which allows us to lock around sensitive buffer cache operations such as those that occur when appending to a file. It is believed that this race is responsible for causing dirtyoff/dirtyend and (in stable) validoff/validend to exceed the buffer size. Therefore we have now added a warning printf for the dirtyoff/end case in current. However, we have introduced a new problem which we need to fix at some point, and that is that soft or intr NFS mounts may become uninterruptable from the point of view of process A which is stuck waiting on rslock while process B is stuck doing the rpc. To unstick process A, process B would have to be interrupted first. Reviewed by: Alfred Perlstein <bright@wintelcom.net>	1999-12-14 19:07:54 +00:00
Matthew Dillon	ea94c7b968	Synopsis of problem being fixed: Dan Nelson originally reported that blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below. bug #1: B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data. Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable. bug #2: B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported. bug #3: B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has NOT yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server). The patch also contains some cleanup, which only applies to the commit into -current. Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>	1999-12-12 06:09:57 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Matthew Dillon	8fdd2461b3	Add comment to clarify a commit rpc optimization already being performed.	1999-09-20 19:10:28 +00:00
Matthew Dillon	b5acbc8b9c	Asynchronized client-side nfs_commit. NFS commit operations were previously issued synchronously even if async daemons (nfsiod's) were available. The commit has been moved from the strategy code to the doio code in order to asynchronize it. Removed use of lastr in preparation for removal of vnode->v_lastr. It has been replaced with seqcount, which is already supported by the system and, in fact, gives us a better heuristic for sequential detection then lastr ever did. Made major performance improvements to the server side commit. The server previously fsync'd the entire file for each commit rpc. The server now bawrite()s only those buffers related to the offset/size specified in the commit rpc. Note that we do not commit the meta-data yet. This works still needs to be done. Note that a further optimization can be done (and has not yet been done) on the client: we can merge multiple potential commit rpc's into a single rpc with a greater file offset/size range and greatly reduce rpc traffic. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>	1999-09-17 05:57:57 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Alan Cox	2c28a10540	Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page. Use it. Submitted by: dillon	1999-08-17 04:02:34 +00:00
Dmitrij Tejblum	e868365294	nfs_getcacheblk() can return 0 if the mount is interruptible. It need to be checked by the caller. Broken in: rev. 1.70 (1999/05/02)	1999-08-12 18:04:39 +00:00
Kirk McKusick	67812eacd7	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
Kirk McKusick	f9c8cab591	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.	1999-06-16 23:27:55 +00:00
Peter Wemm	adbde675ee	Don't mistake a non-async block that needs to be committed for an interrupted write. Obtained from: fvdl@NetBSD.org via OpenBSD.	1999-06-05 05:25:37 +00:00
Poul-Henning Kamp	b0eeea2042	remove b_proc from struct buf, it's (now) unused. Reviewed by: dillon, bde	1999-05-06 20:00:34 +00:00
Alan Cox	4221e284a3	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-05-02 23:57:16 +00:00
Peter Wemm	8a0d8193f2	Hold nfsd's upages in-core with PHOLD rather than P_NOSWAP.	1999-04-06 03:07:54 +00:00
Julian Elischer	8d17e69460	Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>	1999-04-05 19:38:30 +00:00
Julian Elischer	4ef2094e45	Reviewed by: Many at differnt times in differnt parts, including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)	1999-03-12 02:24:58 +00:00
Matthew Dillon	1c7c3c6a86	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
Dmitrij Tejblum	10db74e96d	(Hopefully) fix support for "large" files. Mostly cast block numbers to off_t before they multiplied to block sizes.	1998-12-14 17:51:30 +00:00
Archie Cobbs	f1d19042b0	The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.	1998-12-07 21:58:50 +00:00
Peter Wemm	dad00f4e9c	Remove [apparently] bogus casts to u_long for the vnode_pager_setsize() second argument. np_size is a 64 bit int, so is the second arg. This might have caused needless 2G/4G file size problems. I believe it was Bruce who queried this.	1998-11-09 07:00:14 +00:00
Kirk McKusick	1cda241131	Mark directory buffers that have no valid data with B_INVAL so that they are not put in the cache.	1998-09-29 22:01:10 +00:00
Kirk McKusick	113b88d241	When adding data to a buffer, we need to clear the B_NEEDCOMMIT flag which says that the data is on server but not committed.	1998-09-29 21:46:54 +00:00
Doug Rabson	e69763a315	Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.	1998-09-04 08:06:57 +00:00
Bruce Evans	4c4918c9e4	Avoid an egcs pessimization for 64-bit signed division on i386's. Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit signed divisions, but egcs optimizes them to a shift and fixup when the divisor is a constant power of 2. Unfortunately, it generates a call to __cmpdi2() for the fixup, although all except possibly ancient versions of gcc and egcs do ordinary 64-bit comparisons inline.	1998-06-14 15:52:00 +00:00
Peter Wemm	55b41976c1	Make sure we go a nfs_fsinfo() in get/putpages before calling readrpc/writerpc, since they assume it's already been done. This could break if the first read/write access to a nfs filesystem was an exec() or mmap() instead of a read(), write() syscall. (or statfs()). nfs_getpages() could return an errno (EOPNOTSUPP) instead of a VM_PAGER_* return code. Some layout tweaks for the get/putpages code.	1998-06-01 11:32:53 +00:00
Peter Wemm	7c1c33a7dd	When using NFSv3, use the remote server's idea of the maximum file size rather than assuming 2^64. It may not like files that big. :-) On the nfs server, calculate and report the max file size as the point that the block numbers in the cache would turn negative. (ie: 1099511627775 bytes (1TB)). One of the things I'm worried about however, is that directory offsets are really cookies on a NFSv3 server and can be rather large, especially when/if the server generates the opaque directory cookies by using a local filesystem offset in what comes out as the upper 32 bits of the 64 bit cookie. (a server is free to do this, it could save byte swapping depending on the native 64 bit byte order) Obtained from: NetBSD	1998-05-30 16:33:58 +00:00
Peter Wemm	dfae73fd2e	A cleaner fix for PR#5102, clear nonsense flags at mount time rather than in the core of nfs_bio.c at the 11th hour. PR: 5102	1998-05-20 08:02:24 +00:00
Peter Wemm	fe6c0d4599	Allow control of the attribute cache timeouts at mount time. We had run out of bits in the nfs mount flags, I have moved the internal state flags into a seperate variable. These are no longer visible via statfs(), but I don't know of anything that looks at them.	1998-05-19 07:11:27 +00:00
Steve Price	b9921401f1	Don't allow the readdirplus routine to be used in NFS V2. PR: 5102 Reviewed by: msmith Submitted by: Dmitry Kohmanyuk <dk@farm.org>	1998-03-28 16:05:05 +00:00
Julian Elischer	b1897c197c	Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree	1998-03-08 09:59:44 +00:00
John Dyson	8f9110f6a1	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
Mike Smith	651ae11e2f	Trivial filesystem getpages/putpages implementations, set the second. These should be considered the first steps in a work-in-progress. Submitted by: Terry Lambert <terry@freebsd.org>	1998-03-06 09:46:52 +00:00

1 2 3 4

200 Commits