freebsd-nq

Author	SHA1	Message	Date
Kirk McKusick	e929c00d23	The buffer queue mechanism has been reformulated. Instead of having QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN, QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean and dirty buffers have been separated. Empty buffers with KVM assignments have been separated from truly empty buffers. getnewbuf() has been rewritten and now operates in a 100% optimal fashion. That is, it is able to find precisely the right kind of buffer it needs to allocate a new buffer, defragment KVM, or to free-up an existing buffer when the buffer cache is full (which is a steady-state situation for the buffer cache). Buffer flushing has been reorganized. Previously buffers were flushed in the context of whatever process hit the conditions forcing buffer flushing to occur. This resulted in processes blocking on conditions unrelated to what they were doing. This also resulted in inappropriate VFS stacking chains due to multiple processes getting stuck trying to flush dirty buffers or due to a single process getting into a situation where it might attempt to flush buffers recursively - a situation that was only partially fixed in prior commits. We have added a new daemon called the buf_daemon which is responsible for flushing dirty buffers when the number of dirty buffers exceeds the vfs.hidirtybuffers limit. This daemon attempts to dynamically adjust the rate at which dirty buffers are flushed such that getnewbuf() calls (almost) never block. The number of nbufs and amount of buffer space is now scaled past the 8MB limit that was previously imposed for systems with over 64MB of memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed somewhat. The number of physical buffers has been increased with the intention that we will manage physical I/O differently in the future.
reassignbuf previously attempted to keep the dirtyblkhd list sorted which could result in non-deterministic operation under certain conditions, such as when a large number of dirty buffers are being managed. This algorithm has been changed. reassignbuf now keeps buffers locally sorted if it can do so cheaply, and otherwise gives up and adds buffers to the head of the dirtyblkhd list. The new algorithm is deterministic but not perfect. The new algorithm greatly reduces problems that previously occurred when write_behind was turned off in the system. The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive P_BUFEXHAUST bit. This bit allows processes working with filesystem buffers to use available emergency reserves. Normal processes do not set this bit and are not allowed to dig into emergency reserves. The purpose of this bit is to avoid low-memory deadlocks. A small race condition was fixed in getpbuf() in vm/vm_pager.c. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-04 00:25:38 +00:00
Peter Wemm	ddebd8794d	Hopefully fix the remaining glitches with the BUF_*() changes. This should (really this time) fix pageout to swap and a couple of clustering cases. This simplifies BUF_KERNPROC() so that it unconditionally reassigns the lock owner rather than testing B_ASYNC and having the caller decide when to do the reassign. At present this is required because some places use B_CALL/b_iodone to free the buffers without B_ASYNC being set. Also, vfs_cluster.c explicitly calls BUF_KERNPROC() when attaching the buffers rather than the parent walking the cluster_head tailq. Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-06-29 05:59:47 +00:00
Peter Wemm	72283ee95d	Fix a bug that was almost certainly making breadn() fail. BUF_KERNPROC() was being called on the wrong bp - it should be called on the one that's just about to be fed to VOP_STRATEGY().	1999-06-28 15:32:10 +00:00
Peter Wemm	e6257a9a09	GC the remnants of the old pre-softupdates update daemon. It's been #if 0'd for a fair while now.	1999-06-26 14:46:35 +00:00
Kirk McKusick	67812eacd7	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
Kirk McKusick	45623f31bc	When allocating new buffers in getnewbuf, there are several points at which we may sleep. So, after completing our buffer allocation we must ensure that another process has not come along and allocated a different buffer with the same identity. We do this by keeping a global counter of the number of buffers that getnewbuf has allocated. We save this count when we enter getnewbuf and check it when we are about to return. If it has changed, then other buffers were allocated while we were in getnewbuf, so we must return NULL to let our parent know that it must recheck to see if it still needs the new buffer. Hopefully this fix will eliminate the creation of duplicate buffers with the same identity and the obscure corruptions that they cause.	1999-06-22 01:39:53 +00:00
Kirk McKusick	f9c8cab591	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.	1999-06-16 23:27:55 +00:00
Tor Egge	01cf8ad024	If we still haven't got a sufficient number of free buffers after the call to flushdirtybuffers() then sleep in waitfreebuffers(). PR: 11697 Reviewed by: David Greenman, Matt Dillon	1999-06-16 03:19:04 +00:00
Kirk McKusick	e4ab40bcb6	Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.	1999-06-15 23:37:29 +00:00
Peter Wemm	ccb84588dd	Try and fix a couple of dev_t/major/minor etc nits.	1999-05-12 22:30:50 +00:00
Poul-Henning Kamp	b0eeea2042	remove b_proc from struct buf, it's (now) unused. Reviewed by: dillon, bde	1999-05-06 20:00:34 +00:00
Poul-Henning Kamp	84c55b38e4	Remove unused fields from struct buf: b_savekva b_validoff b_validend Reviewed by: dillon, bde	1999-05-06 17:06:41 +00:00
Alan Cox	4221e284a3	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistency issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vice versa). biodone() is now responsible for setting B_CACHE when a successful read completes.
B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-05-02 23:57:16 +00:00
Alan Cox	0043b4376a	Address a performance problem in getnewbuf: In heavy-writing situations, QUEUE_LRU can contain a large number of DELWRI buffers at its head. These buffers must be moved to the tail if they cannot be written async in order to reduce the scanning time required to skip past these buffers in later getnewbuf() calls. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-04-29 18:15:25 +00:00
Dmitrij Tejblum	35871a15c5	getnewbuf(): check return value from tsleep(). Interruptible NFS may pass PCATCH to slpflag.	1999-04-14 18:51:52 +00:00
Alan Cox	b2e2337ba1	Fix a performance problem with the new getnewbuf() code: in an outofspace condition ( bufspace > hibufspace ), an inappropriate scan of the empty queue was performed looking for buffer space to free up. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-04-07 02:41:54 +00:00
Julian Elischer	8d17e69460	Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>	1999-04-05 19:38:30 +00:00
Bruce Evans	96ebc5810b	Fixed a serious bug in rev.1.202. getnewbuf() sometimes didn't initialise bp->b_data. This tended to cause panics for file systems whose block size is smaller than one page.	1999-03-19 10:17:44 +00:00
Julian Elischer	4ef2094e45	Reviewed by: Many at different times in different parts, including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the write recursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existence: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch is delineated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)	1999-03-12 02:24:58 +00:00
Julian Elischer	850c9afd03	Make comment match code.	1999-03-02 21:23:38 +00:00
Julian Elischer	8b3bd42341	Remove inappropriate use of VOP_ISLOCKED(). This produced races resulting in panics and filesystem corruptions under some circumstances. Reviewed by: luoqi chen <luoqi@freebsd.org> Reviewed by: Kirk McKusick <mckusick@mckusick.com> Submitted by: Matt Dillon <dillon@freebsd.org>	1999-03-02 20:26:39 +00:00
Matthew Dillon	d254af07a1	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-27 21:50:00 +00:00
Matthew Dillon	377f9b28a6	Don't try to calculate B_CACHE for an NFS related bp that has a > 0 b_validend. This will screw up small-writes, causing lots of little writes out the network. We will assume that NFS handles B_CACHE properly.	1999-01-24 00:51:11 +00:00
Matthew Dillon	fae1f2e045	Fix an expression parenthesization typo in a conditional. It should not have any operational effects other than to make the code in question a little faster. Also added a more involved comment.	1999-01-23 06:36:15 +00:00
David Greenman	33ce4218c6	Don't throw away the buffer contents on a fatal write error; just mark the buffer as still being dirty. This isn't a perfect solution, but throwing away the buffer contents will often result in filesystem corruption and this solution will at least correctly deal with transient errors. Submitted by: Kirk McKusick <mckusick@mckusick.com>	1999-01-22 08:59:05 +00:00
Matthew Dillon	8618c644e4	The main operational changes are in getblk()'s handling of the B_DELWRI and B_CACHE flags, fixing a bug that showed up with NFS. Also, a number of cases where manually inserted code has been removed and replaced with an inline function call giving us better functional isolation in the source.	1999-01-21 09:19:33 +00:00
Matthew Dillon	1c7c3c6a86	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
Matthew Dillon	ba2871b74c	Obtained from: Luoqi Fix NFS file corruption problem introduced in 1.188. The valid range was not being set properly, causing a later reference to the buffer to clear the B_CACHE bit.	1999-01-19 08:00:51 +00:00
Eivind Eklund	a32c99f35e	Silence warnings.	1999-01-12 11:59:34 +00:00
Eivind Eklund	219cbf59f2	KNFize, by bde.	1999-01-10 01:58:29 +00:00
Eivind Eklund	5526d2d920	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith	1999-01-08 17:31:30 +00:00
Matthew Dillon	e6ee8e16e0	Adjust some comments to prevent future confusion on the implementation. Also add a reference to the buf(9) manual page.	1998-12-22 18:57:30 +00:00
Luoqi Chen	1a551e9808	Correctly handle misaligned VMIO buffer (whose start or end offset in the VM object are not page aligned). This should fix the mount_msdos panic after a failed attempt to mount as ffs. Reviewed By: Matthew Dillon <dillon@apollo.backplane.com> Archie Cobbs <archie@whistle.com> Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>	1998-12-22 14:43:58 +00:00
Matthew Dillon	fe523aa107	fix intermediate overflow in 'quad = int * int' situation by casting the arguments to the multiply to a quad equivalent. In this case, vm_ooffset_t. Reviewed by: Archie Cobbs <archie@whistle.com>	1998-12-14 21:17:37 +00:00
Eivind Eklund	4979978b8d	Fix grouping of statements. This removes a potential panic in the soft updates code. While I'm here, remove an unintended trigraph. Reviewed by: Kirk McKusick <kirk@freebsd.org>	1998-12-07 17:23:45 +00:00
David Greenman	4f699173cb	Closed a very narrow and rare race condition that involved net interrupts, bio interrupts, and a truncated file that along with the precise alignment of the planets could result in a page being freed multiple times or a just-freed page being put onto the inactive queue.	1998-11-18 09:00:47 +00:00
Peter Wemm	40c8cfe552	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.	1998-10-31 15:31:29 +00:00
David Greenman	2a78b8d1f8	Unwire everything to the inactive queue in order to preserve LRU ordering.	1998-10-30 14:53:54 +00:00
David Greenman	0d5a725446	Fixed editing error. Pointed out by bde.	1998-10-29 11:04:22 +00:00
David Greenman	730075613a	Added a second argument, "activate" to the vm_page_unwire() call so that the caller can select either inactive or active queue to put the page on.	1998-10-28 13:37:02 +00:00
Poul-Henning Kamp	f5ef029e92	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
David Greenman	6cde7a165f	Fixed two potentially serious classes of bugs: 1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.	1998-10-13 08:24:45 +00:00
Matthew Dillon	ff8fae607b	PR: kern/7418 Reviewed by: Luoqi Chen <luoqi@watermarkgroup.com> Fixed problem where write()s can get lost due to buffers flagged B_DELWRI being improperly released in brelse().	1998-09-26 00:12:35 +00:00
Peter Wemm	10baba4b95	Goodbye BOUNCE_BUFFERS, for a hack it has served us well. The last consumer of this code (the old SCSI system) has left us and the CAM code does its own bouncing. The isa dma system has been doing its own bouncing for a while too. Reviewed by: core	1998-09-25 17:34:49 +00:00
Justin T. Gibbs	7ea97031d1	kern_clock.c: Remove old disk statistics variables. vfs_bio.c: Enable bowrite now that B_ORDERED works for all buffer devices.	1998-09-15 10:05:18 +00:00
Poul-Henning Kamp	0375c9f2b8	Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform device drivers about sectors no longer in use. Device-drivers receive the call through d_strategy, if they have D_CANFREE in d_flags. This allows flash based devices to erase the sectors and avoid pointlessly carrying them around in compactions. Reviewed by: Kirk Mckusick, bde Sponsored by: M-Systems (www.m-sys.com)	1998-09-05 14:13:12 +00:00
Doug Rabson	e69763a315	Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.	1998-09-04 08:06:57 +00:00
Luoqi Chen	ddae3cb9a0	Close a race window for getnewbuf() between shared lock holders of the vnode. Reviewed by: Mike Smith	1998-08-28 20:07:13 +00:00
Poul-Henning Kamp	12e14047a4	Fix DDBs printing of buf-flags after I changed them yesterday.	1998-08-25 14:41:42 +00:00
Poul-Henning Kamp	1d9b3ba13d	Remove the last remaining evidence of B_TAPE. Reclaim 3 unused bits in b_flags	1998-08-24 17:47:25 +00:00
Doug Rabson	069e9bc1b4	Change various syscalls to use size_t arguments instead of u_int. Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptible. Reviewed by: Bruce Evans <bde@zeta.org.au>	1998-08-24 08:39:39 +00:00
Doug Rabson	7032ad107e	Protect all modifications to v_numoutput with splbio().	1998-08-13 08:09:08 +00:00
Doug Rabson	d474eaaa5f	Protect all modifications to paging_in_progress with splvm(). The i386 managed to avoid corruption of this variable by luck (the compiler used a memory read-modify-write instruction which wasn't interruptible) but other architectures cannot. With this change, I am now able to 'make buildworld' on the alpha (sfx: the crowd goes wild...)	1998-08-06 08:33:19 +00:00
Bruce Evans	9f14a215f4	Fixed printf format errors.	1998-07-13 07:05:55 +00:00
Julian Elischer	6deaf84b1f	Catch a few corner cases where FreeBSD differs enough from BSD 4.4 to confuse Soft updates.. Should solve several "dangling deps" panics.	1998-07-08 01:04:33 +00:00
Julian Elischer	fd5d1124e2	VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>	1998-07-04 20:45:42 +00:00
Peter Wemm	b1951f4028	vm_page_is_valid() wasn't expecting a large offset argument, it's expecting a sub-page offset. We were passing the file position, and vm_page_bits() could do some interesting things when base was larger than PAGE_SIZE. if (size > PAGE_SIZE - base) size = PAGE_SIZE - base; is interesting when (PAGE_SIZE - base) is negative. I could imagine that this could have interesting consequences for memory page -> device block bit validation.	1998-05-01 15:10:59 +00:00
Peter Wemm	f806d5a257	Fix one problem with NFSv3 > 2GB file support. Submitted by: bde	1998-05-01 15:04:35 +00:00
Dag-Erling Smørgrav	dc73342347	Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.	1998-04-17 22:37:19 +00:00
Bruce Evans	c1087c1324	Support compiling with `gcc -ansi'.	1998-04-15 17:47:40 +00:00
John Dyson	f9be84912c	Correct a problem where buffers might not be zeroed when needed. The B_MALLOC buffers might not have been properly zeroed.	1998-03-27 06:48:24 +00:00
John Dyson	52c64c95c5	In kern_physio.c fix tsleep priority messup. In vfs_bio.c, remove b_generation count usage, remove redundant reassignbuf, remove redundant spl(s), manage page PG_ZERO flags more correctly, utilize an invalid value for b_offset until it is properly initialized. Add asserts for #ifdef DIAGNOSTIC, when b_offset is improperly used. When a process is not performing I/O, and just waiting on a buffer generally, make the sleep priority low. Only check page validity in getblk for B_VMIO buffers. In vfs_cluster, add b_offset asserts, correct pointer calculation for clustered reads. Improve readability of certain parts of the code. Remove redundant spl(s). In vfs_subr, correct usage of vfs_bio_awrite (From Andrew Gallatin <gallatin@cs.duke.edu>). More vtruncbuf problems fixed.	1998-03-19 22:48:16 +00:00
John Dyson	4641c8ac1d	Correct a problem where data OR metadata could be thrown away if a buffer is grown.	1998-03-17 17:36:05 +00:00
KATO Takenori	f1aca9c33f	Deleted PC-98 code because (1) machine dependent code should not be in here, and (2) the flag used in PC-98 code has been assigned to another purpose.	1998-03-17 08:41:28 +00:00
John Dyson	bef608bd7e	Some VM improvements, including elimination of a lot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!! pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.) pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances. vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.) vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust. vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true. vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.) vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities. vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.) vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files.
The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliability and performance issue.) 10) Modify FFS to use vtruncbuf. vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.) vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly. vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.) vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.	1998-03-16 01:56:03 +00:00
Julian Elischer	b1897c197c	Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: Whistle development tree	1998-03-08 09:59:44 +00:00
John Dyson	8f9110f6a1	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifested themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistency, and as a side-effect broke the vfs.ioopt code. This code might have been committed separately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitous buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups associated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistency problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify its status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and its descendants to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
John Dyson	a638dbdbf4	Fix a rounding error for the NFS buffer validend. Submitted by: John W. De Boskey <jwd@unx.sas.com>	1998-03-04 03:17:30 +00:00
John Dyson	ffc82b0a70	1) Use a more consistent page wait methodology. 2) Do not unnecessarily force page blocking when paging pages out. 3) Further improve swap pager performance and correctness, including fixing the paging in progress deadlock (except in severe I/O error conditions.) 4) Enable vfs_ioopt=1 as a default. 5) Fix and enable the page prezeroing in SMP mode. All in all, SMP systems especially should show a significant improvement in "snappyness."	1998-03-01 04:18:54 +00:00
David Greenman	c78ab18a81	Fix a && that should be an &. Reviewed by: "John S. Dyson" <dyson@FreeBSD.ORG> Submitted by: jwd@unx.sas.com (John W. DeBoskey)	1998-02-11 20:06:48 +00:00
Eivind Eklund	303b270b0a	Staticize.	1998-02-09 06:11:36 +00:00
Eivind Eklund	0b08f5f737	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
Eivind Eklund	47cfdb166d	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
John Dyson	eaf13dd73a	Change the busy page mgmt, so that when pages are freed, they MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtle "holes." Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistent scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.	1998-01-31 11:56:53 +00:00
John Dyson	33b90a70cd	Various NFS fixes: Make vfs_bio buffer mgmt work better. Buffers were being used after brelse. Make nfs_getpages work independently of other NFS interfaces. This eliminates some difficult recursion problems and decreases pagefault overhead. Remove an erroneous vfs_unbusy_pages. Fix a reentrancy problem, with nfs_vinvalbuf when vnode is already being rundown. Reassignbuf wasn't being called when needed under certain circumstances. (Thanks to Bill Paul for help.)	1998-01-25 06:24:09 +00:00
John Dyson	50ce7ff499	Add better support for larger I/O clusters, including larger physical I/O. The support is not mature yet, and some of the underlying implementation needs help. However, support does exist for IDE devices now.	1998-01-24 02:01:46 +00:00
John Dyson	2d8acc0f4a	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneeded collapse operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)	1998-01-22 17:30:44 +00:00
John Dyson	4722175765	Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtle bugs aren't fixed yet.	1998-01-17 09:17:02 +00:00
John Dyson	925a3a419a	Fix some vnode management problems, and better mgmt of vnode free list. Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock. PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.	1998-01-12 01:46:33 +00:00
John Dyson	95e5e988e0	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. Vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon as reasonably possible.	1998-01-06 05:26:17 +00:00
John Dyson	6d94bea461	Improve my copyright.	1997-12-22 11:54:00 +00:00
John Dyson	f2e6e69d92	Slight performance improvement, removal of unneeded SPLs.	1997-12-07 04:06:41 +00:00
Bruce Evans	1cd52ec333	Don't include <sys/lock.h> in headers when only `struct simplelock' is required. Fixed everything that depended on the pollution.	1997-12-05 19:55:52 +00:00
Poul-Henning Kamp	ab3f746966	In all such uses of struct buf: 's/b_un.b_addr/b_data/g'	1997-12-02 21:07:20 +00:00
John Dyson	b4b3edc1f4	Fix a serious problem during resizing buffers where old buffers address space wasn't being properly reclaimed. Submitted by: Bruce Evans <bde@freebsd.org>	1997-12-01 19:04:00 +00:00
John Dyson	4ced7dd5bf	Avoid manipulating the buffer map at interrupt time by deferring bfreekva to getnewbuf, and remove from brelse. Reviewed by: dg@root.com	1997-11-24 06:18:27 +00:00
Poul-Henning Kamp	4a11ca4e29	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused	1997-11-07 08:53:44 +00:00
Poul-Henning Kamp	cb226aaa62	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warnings, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.	1997-11-06 19:29:57 +00:00
Bruce Evans	55b211e3af	Removed unused #includes.	1997-10-28 15:59:26 +00:00
Poul-Henning Kamp	dba3870c10	VFS interior redecoration. Rename vn_default_error to vop_defaultop all over the place. Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite. Use vop_null instead of nullop. Move vop_nopoll from vfs_subr.c to vfs_default.c Move vop_sharedlock from vfs_subr.c to vfs_default.c Move vop_nolock from vfs_subr.c to vfs_default.c Move vop_nounlock from vfs_subr.c to vfs_default.c Move vop_noislocked from vfs_subr.c to vfs_default.c Use vop_ebadf instead of *_ebadf. Add vop_defaultop for getpages on master vnode in MFS.	1997-10-26 20:55:39 +00:00
Poul-Henning Kamp	a1c995b626	Last major round (Unless Bruce thinks of something :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde	1997-10-12 20:26:33 +00:00
Poul-Henning Kamp	55166637cd	Distribute and staticize a lot of the malloc M_* types. Substantial input from: bde	1997-10-11 18:31:40 +00:00
Justin T. Gibbs	ab36c06737	init_main.c subr_autoconf.c: Add support for "interrupt driven configuration hooks". A component of the kernel can register a hook, most likely during auto-configuration, and receive a callback once interrupt services are available. This callback will occur before the root and dump devices are configured, so the configuration task can affect the selection of those two devices or complete any tasks that need to be performed prior to launching init. System boot is postponed so long as a hook is registered. The hook owner is responsible for removing the hook once their task is complete or the system boot can continue. kern_acct.c kern_clock.c kern_exit.c kern_synch.c kern_time.c: Change the interface and implementation for the kernel callout service. The new implementation is based on the work of Adam M. Costello and George Varghese, published in a technical report entitled "Redesigning the BSD Callout and Timer Facilities". The interface used in FreeBSD is a little different than the one outlined in the paper. The new function prototypes are: struct callout_handle timeout(void (*func)(void *), void *arg, int ticks); void untimeout(void (*func)(void *), void *arg, struct callout_handle handle); If a client wishes to remove a timeout, it must store the callout_handle returned by timeout and pass it to untimeout. The new implementation gives O(1) insert and removal of callouts making this interface scale well even for applications that keep 100s of callouts outstanding. See the updated timeout.9 man page for more details.	1997-09-21 22:00:25 +00:00
John Dyson	804cd17e21	Re-institute a bugfix in allocation of anonymous buffer memory.	1997-09-21 04:49:30 +00:00
Poul-Henning Kamp	c1f95f1378	The patch is needed in order to not throw away unmodified local filesystem metadata at the first brelse call when the block device vnode has v_tag set to VT_NFS. Reviewed by: phk Submitted by: Tor Egge <tegge@idi.ntnu.no>	1997-09-10 20:09:22 +00:00
Bruce Evans	2d85d0df17	Some staticized variables were still declared to be extern.	1997-09-07 16:56:34 +00:00
John Dyson	a5db4bf475	Back out some incorrect changes that were worse than the original bug.	1997-08-26 04:36:27 +00:00
John Dyson	745b842305	Some corrections to the anonymous page management. Submitted by: Peter Chen <pmchen@eecs.umich.edu>	1997-08-21 01:35:37 +00:00
John Dyson	c0ecffb96b	Modify the scheduling policy to take into account disk I/O waits as chargeable CPU usage. This should mitigate the problem of processes doing disk I/O hogging the CPU. Various users have reported the problem, and test code shows that the problem should now be gone.	1997-08-09 10:13:32 +00:00
John Dyson	6b195d32a1	Fix a problem with the VN device. Specifically, the VN device can cause a problem of spiraling death due to buffer resource limitations. The vfs_bio code in general had little ability to handle buffer resource management, and now it does. Also, there are a lot more knobs for tuning the vfs_bio code now. The knobs came free because of the need that there always be some immediately available buffers (non-delayed or locked) for use. Note that the buffer cache code is much less likely to get bogged down with lots of delayed writes, even more so than before.	1997-06-15 17:56:53 +00:00

270 Commits