freebsd-nq

Author	SHA1	Message	Date
Alan Cox	2be21c5e68	o Create vm_pageq_enqueue() to encapsulate code that is duplicated time and again in vm_page.c and vm_pageq.c. o Delete unusused prototypes. (Mainly a result of the earlier renaming of various functions from vm_page_() to vm_pageq_().)	2002-03-04 18:55:26 +00:00
Tor Egge	d2760948fe	Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold count that would otherwise be on one of the free queues. This eliminates a panic when broken programs unmap memory that still has pending IO from raw devices. Reviewed by: dillon, alc	2002-02-19 23:19:30 +00:00
Mike Silbersack	0c9e47230a	Add one more comment to the OOM changes so that future readers of the code may better understand the code. Suggested by: dillon MFC after: 1 week	2002-02-19 18:50:49 +00:00
Mike Silbersack	ef6020d187	Changes to make the OOM killer much more effective: - Allow the OOM killer to target processes currently locked in memory. These very often are the ones doing the memory hogging. - Drop the wakeup priority of processes currently sleeping while waiting for their page fault to complete. In order for the OOM killer to work well, the killed process and other system processes waiting on memory must be allowed to wakeup first. Reviewed by: dillon MFC after: 1 week	2002-02-19 18:34:02 +00:00
Matthew Dillon	3ebeaf5984	This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS. * An edge case with shared R+W mmap()'s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems) THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. see vm/vm_page.c line 1641 * The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side) * Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other cpu's. (applicable to NFS client side but could effect all filesystems). Note: not considered serious since the corruption occurs beyond the file EOF. * When flusing a dirty buffer due to B_CACHE getting cleared, we were accidently setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS) * We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentitive, I may be able to remove it due to the second bug fix. (applicable to NFS client side) * vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side) * Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk). Testing program written by: Avadis Tevanian, Jr. Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step') MFC after: 1 week	2001-12-14 01:16:57 +00:00
Matthew Dillon	245df27cee	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week	2001-10-26 00:08:05 +00:00
Peter Wemm	3516c025ff	Implement idle zeroing of pages. I've been tinkering with this on and off since John Dyson left his work-in-progress. It is off by default for now. sysctl vm.zeroidle_enable=1 to turn it on. There are some hacks here to deal with the present lack of preemption - we yield after doing a small number of pages since we wont preempt otherwise. This is basically Matt's algorithm [with hysteresis] with an idle process to call it in a similar way it used to be called from the idle loop. I cleaned up the includes a fair bit here too.	2001-08-25 05:00:44 +00:00
Matthew Dillon	0b76df7146	KASSERT if vm_page_t->wire_count overflows.	2001-08-22 04:01:56 +00:00
John Baldwin	8ec48c6dbf	- Remove asleep(), await(), and M_ASLEEP. - Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl-raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused. Reviewed by: jasone, peter	2001-08-10 06:37:05 +00:00
Jake Burkholder	3a9b5daf48	Oops. Last commit to vm_object.c should have got these files too. Remove the use of atomic ops to manipulate vm_object and vm_page flags. Giant is required here, so they are superfluous. Discussed with: dillon	2001-07-31 04:09:52 +00:00
Assar Westerlund	d3e5863fa9	make vm_page_select_cache static Requested by: bde	2001-07-23 12:34:31 +00:00
Matthew Dillon	6d03d577a5	Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc). Also removed some spl's and added some VM mutexes, but they are not actually used yet, so this commit does not really make any operational changes to the system. vm_page.c relates to vm_page_t manipulation, including high level deactivation, activation, etc... vm_pageq.c relates to finding free pages and aquiring exclusive access to a page queue (exclusivity part not yet implemented). And the world still builds... :-)	2001-07-04 23:27:09 +00:00
Matthew Dillon	1b40f8c036	Change inlines back into mainline code in preparation for mutexing. Also, most of these inlines had been bloated in -current far beyond their original intent. Normalize prototypes and function declarations to be ANSI only (half already were). And do some general cleanup. (kernel size also reduced by 50-100K, but that isn't the prime intent)	2001-07-04 20:15:18 +00:00
Matthew Dillon	54d9214595	whitespace / register cleanup	2001-07-04 19:00:13 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
Matthew Dillon	ac8f990bde	This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon	2001-05-24 07:22:27 +00:00
John Baldwin	7d4ad42de5	Sort includes.	2001-05-22 07:01:11 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
John Baldwin	136d8f42b9	Unrevert the pmap_map() changes. They weren't broken on x86. Sense beaten into me by: peter	2001-03-07 05:29:21 +00:00
John Baldwin	4a01ebd482	Back out the pmap_map() change for now, it isn't completely stable on the i386.	2001-03-07 01:04:17 +00:00
John Baldwin	968950e5d1	- Rework pmap_map() to take advantage of direct-mapped segments on supported architectures such as the alpha. This allows us to save on kernel virtual address space, TLB entries, and (on the ia64) VHPT entries. pmap_map() now modifies the passed in virtual address on architectures that do not support direct-mapped segments to point to the next available virtual address. It also returns the actual address that the request was mapped to. - On the IA64 don't use a special zone of PV entries needed for early calls to pmap_kenter() during pmap_init(). This gets us in trouble because we end up trying to use the zone allocator before it is initialized. Instead, with the pmap_map() change, the number of needed PV entries is small enough that we can get by with a static pool that is used until pmap_init() is complete. Submitted by: dfr Debugging help: peter Tested by: me	2001-03-06 06:06:42 +00:00
Andrew Gallatin	c909b97167	Allocate vm_page_array and vm_page_buckets from the end of the biggest chunk of memory, rather than from the start. This fixes problems allocating bouncebuffers on alphas where there is only 1 chunk of memory (unlike PCs where there is generally at least one small chunk and a large chunk). Having 1 chunk had been fatal, because these structures take over 13MB on a machine with 1GB of ram. This doesn't leave much room for other structures and bounce buffers if they're at the front. Reviewed by: dfr, anderson@cs.duke.edu, silence on -arch Tested by: Yoriaki FUJIMORI <fujimori@grafin.fujimori.cache.waseda.ac.jp>	2001-03-01 19:21:24 +00:00
Matthew Dillon	2b6b0df712	This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.	2000-12-26 19:41:38 +00:00
Poul-Henning Kamp	065b25803d	Fix floppy drives on machines with lots of RAM. The fix works by reverting the ordering of free memory so that the chances of contig_malloc() succeeding increases. PR: 23291 Submitted by: Andrew Atrens <atrens@nortel.ca>	2000-12-18 20:12:13 +00:00
Matthew Dillon	936524aa02	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
Kirk McKusick	c904bbbdd8	Simplify and rationalise the management of the vnode free list (preparing the code to add snapshots).	2000-07-04 04:32:40 +00:00
Matthew Dillon	8b03c8ed5e	This is a cleanup patch to Peter's new OBJT_PHYS VM object type and sysv shared memory support for it. It implements a new PG_UNMANAGED flag that has slightly different characteristics from PG_FICTICIOUS. A new sysctl, kern.ipc.shm_use_phys has been added to enable the use of physically-backed sysv shared memory rather then swap-backed. Physically backed shm segments are not tracked with PV entries, allowing programs which use a large shm segment as a rendezvous point to operate without eating an insane amount of KVM in the PV entry management. Read: Oracle. Peter's OBJT_PHYS object will also allow us to eventually implement page-table sharing and/or 4MB physical page support for such segments. We're half way there.	2000-05-29 22:40:54 +00:00
Peter Wemm	0385347c1a	Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or it's shadow pv_table entry. Inspired by: John Dyson's 1998 patches. Also: Eliminate pv_table as a seperate thing and build it into a machine dependent part of vm_page_t. This eliminates having a seperate set of structions that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha). Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits). Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files. Reviewed by: dillon	2000-05-21 12:50:18 +00:00
Philippe Charnier	5929bcfaba	Revert spelling mistake I made in the previous commit Requested by: Alan and Bruce	2000-03-27 20:41:17 +00:00
Philippe Charnier	956f31353c	Spelling	2000-03-26 15:20:23 +00:00
Poul-Henning Kamp	db5f635acc	Eliminate the undocumented, experimental, non-delivering and highly dangerous MAX_PERF option.	2000-03-16 08:51:55 +00:00
Matthew Dillon	4f79d873c1	Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to madvise(). This feature prevents the update daemon from gratuitously flushing dirty pages associated with a mapped file-backed region of memory. The system pager will still page the memory as necessary and the VM system will still be fully coherent with the filesystem. Modifications made by other means to the same area of memory, for example by write(), are unaffected. The feature works on a page-granularity basis. MAP_NOSYNC allows one to use mmap() to share memory between processes without incuring any significant filesystem overhead, putting it in the same performance category as SysV Shared memory and anonymous memory. Reviewed by: julian, alc, dg	1999-12-12 03:19:33 +00:00
Alan Cox	e6ce529511	Two changes: (1) Use vm_page_unqueue_nowakeup in vm_page_alloc instead of duplicating the code. (2) If a wired page is passed to vm_page_free_toq, panic instead of printing a friendly warning. (If we don't panic here, we'll just panic later in vm_page_unwire obscuring the problem.)	1999-11-10 05:23:19 +00:00
Alan Cox	be72f78813	The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to eliminate an extra (useless) level of indirection in half of the page queue accesses and (2) to use a single name for each queue throughout, instead of, e.g., "vm_page_queue_active" in some places and "vm_page_queues[PQ_ACTIVE]" in others. Reviewed by: dillon	1999-10-30 07:37:14 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Matthew Dillon	90ecac61c0	Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com> Replace various VM related page count calculations strewn over the VM code with inlines to aid in readability and to reduce fragility in the code where modules depend on the same test being performed to properly sleep and wakeup. Split out a portion of the page deactivation code into an inline in vm_page.c to support vm_page_dontneed(). add vm_page_dontneed(), which handles the madvise MADV_DONTNEED feature in a related commit coming up for vm_map.c/vm_object.c. This code prevents degenerate cases where an essentially active page may be rotated through a subset of the paging lists, resulting in premature disposal.	1999-09-17 04:56:40 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Alan Cox	14068cfed2	vm_page_alloc and contigmalloc1: Verify that free pages are not dirty. Submitted by: dillon	1999-08-20 06:32:00 +00:00
Alan Cox	0e70993526	vm_page_free_toq: Update the comment to reflect the demise of PQ_ZERO and remove a (now) useless test.	1999-08-17 18:09:01 +00:00
Alan Cox	3fc3fec6d3	vm_page_free_toq: Clear the dirty bit mask (vm_page_undirty) before adding the page to the free page queue. Submitted by: dillon	1999-08-17 05:08:39 +00:00
Alan Cox	6c91c1dc3f	contigmalloc1: If a page is found in the wrong queue, panic instead of silently ignoring the problem.	1999-08-11 05:12:00 +00:00
Peter Wemm	ed6d0b65f0	Add a contigfree() as a corollary to contigmalloc() as it's not clear which free routine to use and people are tempted to use free() (which doesn't work)	1999-08-10 22:21:13 +00:00
Alan Cox	5d2aec8927	Change the type of vpgqueues::lcnt from "int *" to "int". The indirection served no purpose.	1999-07-31 18:31:00 +00:00
Alan Cox	755292ace1	vm_page_queue_init: Remove the initialization of PQ_NONE's cnt and lcnt. They aren't used. vm_page_insert: Remove an unnecessary dereference. vm_page_wire: Remove the one and only (and thus pointless) reference to PQ_NONE's lcnt.	1999-07-31 04:19:49 +00:00
Peter Wemm	3efc015bae	Fix some int/long printf problems for the Alpha	1999-07-01 19:53:43 +00:00
Alan Cox	9c89c228fe	Remove (1) "extern" declarations for variables that were previously made "static" and (2) initialized but unused variables.	1999-06-22 07:18:20 +00:00
Alan Cox	60ff97b002	Remove vm_object::cache_count and vm_object::wired_count. They are not used. (Nor is there any planned use by John who introduced them.) Reviewed by: "John S. Dyson" <toor@dyson.iquest.net>	1999-06-20 21:47:02 +00:00
Alan Cox	c207703465	Set cnt.v_page_size to PAGE_SIZE rather than DEFAULT_PAGE_SIZE so that "vmstat -s" reports the correct value on the Alpha. Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>	1999-06-20 04:55:29 +00:00
Alan Cox	4221e284a3	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-05-02 23:57:16 +00:00
Julian Elischer	8d17e69460	Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>	1999-04-05 19:38:30 +00:00
Alan Cox	61fc5ee627	Construct the free queue(s) in descending order (by physical address) so that the first 16MB of physical memory is allocated last rather than first. On large-memory machines, this avoids the exhaustion of low physical memory before isa_dmainit has run.	1999-03-19 05:21:03 +00:00
Matthew Dillon	d1bf5d56b6	Remove unnecessary page protects on map_split and collapse operations. Fix bug where an object's OBJ_WRITEABLE/OBJ_MIGHTBEDIRTY flags do not get set under certain circumstances ( page rename case ). Reviewed by: Alan Cox <alc@cs.rice.edu>, John Dyson	1999-02-24 21:26:26 +00:00
Matthew Dillon	efcae3d355	Minor reorganization of vm_page_alloc(). No functional changes have been made but the code has been reorganized and documented to make it more readable, reduce the size of the code, and optimize the branch path caching capabilities that most modern processors have.	1999-02-15 06:52:14 +00:00
Matthew Dillon	faa273d5c2	Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with PQ_FREE. There is little operational difference other then the kernel being a few kilobytes smaller and the code being more readable. * vm_page_select_free() has been greatly simplified. * The PQ_ZERO page queue and supporting structures have been removed * vm_page_zero_idle() revamped (see below) PG_ZERO setting and clearing has been migrated from vm_page_alloc() to vm_page_free[_zero]() and will eventually be guarenteed to remain tracked throughout a page's life ( if it isn't already ). When a page is freed, PG_ZERO pages are appended to the appropriate tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended. When locating a new free page, PG_ZERO selection operates from within vm_page_list_find() ( get page from end of queue instead of beginning of queue ) and then only occurs in the nominal critical path case. If the nominal case misses, both normal and zero-page allocation devolves into the same _vm_page_list_find() select code without any specific zero-page optimizations. Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been added and zero-page tracking adjusted to conform with the other changes. Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free pages. We may wish to increase both parameters as time permits. The hysteresis is designed to avoid silly zeroing in borderline allocation/free situations.	1999-02-08 00:37:36 +00:00
Matthew Dillon	a0e7b3e5ce	Remove L1 cache coloring optimization ( leave L2 cache coloring opt ). Rewrite vm_page_list_find() and vm_page_select_free() - make inline out of nominal case.	1999-02-07 20:45:15 +00:00
Matthew Dillon	8aef171243	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-28 00:57:57 +00:00
Matthew Dillon	2f586e1b2c	Undo last commit - not a bug, just duplicate code. PG_MAPPED and PG_WRITEABLE are already cleared by vm_page_protect().	1999-01-24 07:06:52 +00:00
Matthew Dillon	68af6d169b	vm_map_split() used to dirty the page manually after calling vm_page_rename(), but never pulled the page off PQ_CACHE if it was on PQ_CACHE. Dirty pages in PQ_CACHE are not allowed and a KASSERT was added in -4.x to test for this... and got hit. In -4.x, vm_page_rename() automatically dirties the page. This commit also has it deal with the PQ_CACHE case, deactivating the page in that case.	1999-01-24 06:00:31 +00:00
Matthew Dillon	e1a4feafd0	Clear PG_MAPPED as well as PG_WRITEABLE when a page is moved to the cache.	1999-01-24 02:29:26 +00:00
Matthew Dillon	c9fa34cf07	Clear PG_WRITEABLE in vm_page_cache(). This may or may not be a bug, but the bit should definitely be cleared.	1999-01-24 01:04:04 +00:00
Matthew Dillon	060282de8a	The hash table used to be a table of doubly-link list headers ( two pointers per entry ). The table has been changed to a singly linked list of vm_page_t pointers. The table has been doubled in size, but the entries only take half the space so a net-zero change in memory use. The hash function has been changed, hopefully for the better. The combination of the larger hash table size of changed function should keep the chain length down to a reasonable number (0-3, average 1). vm_object->page_hint has been removed. This 'optimization' was not only never needed, but costs as much as a hash chain link to implement. While having page_hint in vm_object might result in better locality of reference, the cost is not worth the space in vm_object or the extra instructions in my view. vm_page_alloc*() functions have been inlined and call a generalized non-inlined vm_page_alloc_toq() which combines the standard alloc and zero-page alloc functions together, reducing code size and the L1 cache footprint. Some reordering has been done... not much. The delinking code should be faster ( because unlinking a doubly-linked list requires four memory ops and unlinking a singly linked list only requires two ), and we get a hash consistancy check for free. vm_page_rename() now automatically sets the page's dirty bits. vm_page_alloc() does not try to manually inline freeing a cache page. Instead, it now properly calls vm_page_free(m) ... vm_page_free() is really too complex to manually inline. vm_await(), supporting asleep(), has been added.	1999-01-21 10:01:49 +00:00
Matthew Dillon	1c7c3c6a86	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
Eivind Eklund	219cbf59f2	KNFize, by bde.	1999-01-10 01:58:29 +00:00
Eivind Eklund	5526d2d920	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith	1999-01-08 17:31:30 +00:00
Matthew Dillon	9858fcda2e	Update comments to routines in vm_page.c, most especially whether a routine can block or not as part of a general effort to carefully document blocking/non-blocking calls in the kernel.	1998-12-23 01:52:47 +00:00
David Greenman	4f6e1f8bfc	Closed a small race condition between wiring/unwiring pages that involved the page's wire_count.	1998-11-11 15:07:57 +00:00
David Greenman	c8d14c765f	Fixed wrong comments in and about vm_page_deactivate().	1998-10-28 13:41:43 +00:00
David Greenman	730075613a	Added a second argument, "activate" to the vm_page_unwire() call so that the caller can select either inactive or active queue to put the page on.	1998-10-28 13:37:02 +00:00
Poul-Henning Kamp	f5ef029e92	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
David Greenman	300ee8246e	Nuked PG_TABLED flag. Replaced with m->object != NULL.	1998-10-21 14:46:42 +00:00
David Greenman	12d534d2f9	Add a diagnostic printf for freeing a wired page. This will eventually be turned into a panic, but I want to make sure that all cases of freeing pages with wire_count==1 (which is/was allowed) have first been fixed.	1998-10-21 11:43:04 +00:00
Doug Rabson	e69763a315	Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.	1998-09-04 08:06:57 +00:00
Doug Rabson	069e9bc1b4	Change various syscalls to use size_t arguments instead of u_int. Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable. Reviewed by: Bruce Evans <bde@zeta.org.au>	1998-08-24 08:39:39 +00:00
Doug Rabson	56e7ede1c4	Notify pmap when a page is freed on the alpha to allow it to clean up its emulated modified/referenced bits.	1998-07-26 18:15:20 +00:00
Bruce Evans	15c7382561	Cast pointers to [u]intptr_t instead of to [unsigned] long.	1998-07-15 04:17:55 +00:00
Bruce Evans	ac1e407b32	Fixed printf format errors.	1998-07-11 07:46:16 +00:00
Bruce Evans	be160d60ab	Removed unused includes.	1998-06-21 18:02:50 +00:00
Doug Rabson	ecbb00a262	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.	1998-06-07 17:13:14 +00:00
John Dyson	976f208be3	Cleanup and remove some dead code from the initialization.	1998-06-02 05:50:08 +00:00
Peter Wemm	3c33646725	Seatbelts for vm_page_bits() in case a file offset is passed in rather than the page offset. If a large file offset was passed in, a large negative array index could be generated which could cause page faults etc at worst and file corruption at the least. (Pages are allocated within file space on page alignment boundaries, so a file offset being passed in here is harmless to DTRT. The case where this was happening has already been fixed though, this is in case it happens again). Reviewed by: dyson	1998-05-02 03:02:13 +00:00
Bruce Evans	c1087c1324	Support compiling with `gcc -ansi'.	1998-04-15 17:47:40 +00:00
John Dyson	bef608bd7e	Some VM improvements, including elimination of alot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!! pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.) pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances. vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.) vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust. vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true. vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.) vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities. vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.) vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliabiliy and performance issue.) 10) Modify FFS to use vtruncbuf. vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.) vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly. vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.) vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.	1998-03-16 01:56:03 +00:00
John Dyson	e163e201ef	Some cruft left over from my megacommit. A page rotation optimization was a good idea, but can cause instability. That optimization is now removed.	1998-03-08 06:27:30 +00:00
John Dyson	8f9110f6a1	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
John Dyson	ffc82b0a70	1) Use a more consistent page wait methodology. 2) Do not unnecessarily force page blocking when paging pages out. 3) Further improve swap pager performance and correctness, including fixing the paging in progress deadlock (except in severe I/O error conditions.) 4) Enable vfs_ioopt=1 as a default. 5) Fix and enable the page prezeroing in SMP mode. All in all, SMP systems especially should show a significant improvement in "snappyness."	1998-03-01 04:18:54 +00:00
Eivind Eklund	303b270b0a	Staticize.	1998-02-09 06:11:36 +00:00
Eivind Eklund	0b08f5f737	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
John Dyson	95461b450d	1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.	1998-02-05 03:32:49 +00:00
Eivind Eklund	47cfdb166d	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
John Dyson	c15541e7a7	contigalloc doesn't place the allocated page(s) into an object, and now this breaks vm_page_wire (due to wired page accounting per object.) This should fix a problem as described by Donald Maddox.	1998-01-31 20:30:18 +00:00
John Dyson	eaf13dd73a	Change the busy page mgmt, so that when pages are freed, they MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtile "holes." Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistant scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.	1998-01-31 11:56:53 +00:00
John Dyson	2d8acc0f4a	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)	1998-01-22 17:30:44 +00:00
John Dyson	4722175765	Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtile bugs aren't fixed yet.	1998-01-17 09:17:02 +00:00
John Dyson	925a3a419a	Fix some vnode management problems, and better mgmt of vnode free list. Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock. PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.	1998-01-12 01:46:33 +00:00
John Dyson	2be70f79f6	Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.	1997-12-29 00:25:11 +00:00
John Dyson	0aa8918597	Fix the "missing page" problem. Also, improve the performance of page allocation in common cases.	1997-11-06 08:35:50 +00:00
Poul-Henning Kamp	f0d45e6aae	Fix contigmalloc() and contigmalloc1() arguments.	1997-10-10 18:18:47 +00:00
Peter Wemm	f8ddc1e209	Print correct function name in panics	1997-09-13 15:04:52 +00:00
Bruce Evans	79624e2147	Removed unused #includes.	1997-09-01 03:17:34 +00:00

1 2 3 4 5

229 Commits