freebsd-dev

Author	SHA1	Message	Date
Alan Cox	7024db1d40	Push down the page queues lock inside of vm_page_free_toq() and pmap_page_is_mapped() in preparation for removing page queues locking around calls to vm_page_free(). Setting aside the assertion that calls pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page queues lock just long enough to actually add or remove the page from the paging queues. Update vm_page_unhold() to reflect the above change.	2010-05-06 16:39:43 +00:00
Alan Cox	5ac59343be	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib	2010-05-05 18:16:06 +00:00
Kip Macy	0ce3ba8cd5	Update locking comment above vm_page: - re-assign page queue lock "Q" - assign page lock "P" - update several uncommented fields - observe that hold_count is now protected by the page lock "P"	2010-05-01 03:41:21 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Alan Cox	e67e0775e6	Align and pad the page queue and free page queue locks so that the linker can't possibly place them together within the same cache line. MFC after: 3 weeks	2009-10-04 18:53:10 +00:00
Alan Cox	461c78604e	Eliminate a stale comment and the two remaining uses of the "register" keyword in this file.	2009-05-30 22:15:55 +00:00
Alan Cox	1c1b26f276	Eliminate page queues locking from bufdone_finish() through the following changes: Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect what this function actually does. Suggested by: tegge Introduce a new version of vfs_page_set_valid() that does no more than what the function's name implies. Specifically, it does not update the page's dirty mask, and thus it does not require the page queues lock to be held. Update two of the three callers to the old vfs_page_set_valid() to call vfs_page_set_validclean() instead because they actually require the page's dirty mask to be cleared. Introduce vm_page_set_valid(). Reviewed by: tegge	2009-05-13 05:39:39 +00:00
Konstantin Belousov	641e2829b6	Extend the struct vm_page wire_count to u_int to avoid the overflow of the counter, that may happen when too many sendfile(2) calls are being executed with this vnode [1]. To keep the size of the struct vm_page and offsets of the fields accessed by out-of-tree modules, swap the types and locations of the wire_count and cow fields. Add safety checks to detect cow overflow and force fallback to the normal copy code for zero-copy sockets. [2] Reported by: Anton Yuzhaninov <citrin citrin ru> [1] Suggested by: alc [2] Reviewed by: alc MFC after: 2 weeks	2009-01-03 13:24:08 +00:00
Ed Maste	a8a478fce6	Move CTASSERT from header file to source file, per implementation note now in the CTASSERT man page.	2008-09-26 18:44:40 +00:00
Alan Cox	e5b006ffca	Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent migration to vm/vm_page.c.	2008-03-19 20:24:35 +00:00
Alan Cox	1fa94a36b1	Almost seven years ago, vm/vm_page.c was split into three parts: vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c. Today, vm/vm_pageq.c has withered to the point that it contains only four short functions, two of which are only used by vm/vm_page.c. Since I can't foresee any reason for vm/vm_pageq.c to grow, it is time to fold the remaining contents of vm/vm_pageq.c back into vm/vm_page.c. Add some comments. Rename one of the functions, vm_pageq_enqueue(), that is now static within vm/vm_page.c to vm_page_enqueue(). Eliminate PQ_MAXCOUNT as it no longer serves any purpose.	2008-03-18 06:52:15 +00:00
Alan Cox	c944491426	Correct an error of omission in the reimplementation of the page cache: vm_object_page_remove() should convert any cached pages that fall with the specified range to free pages. Otherwise, there could be a problem if a file is first truncated and then regrown. Specifically, some old data from prior to the truncation might reappear. Generalize vm_page_cache_free() to support the conversion of either a subset or the entirety of an object's cached pages. Reported by: tegge Reviewed by: tegge Approved by: re (kensmith)	2007-09-27 04:21:59 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Alan Cox	0f752392c6	Update a comment describing the page queues. Approved by: re (hrs)	2007-07-13 04:42:20 +00:00
Alan Cox	2446e4f02c	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-16 04:57:06 +00:00
Alan Cox	04a18977c8	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
Alan Cox	9f5c801b94	Change the way that unmanaged pages are created. Specifically, immediately flag any page that is allocated to a OBJT_PHYS object as unmanaged in vm_page_alloc() rather than waiting for a later call to vm_page_unmanage(). This allows for the elimination of some uses of the page queues lock. Change the type of the kernel and kmem objects from OBJT_DEFAULT to OBJT_PHYS. This allows us to take advantage of the above change to simplify the allocation of unmanaged pages in kmem_alloc() and kmem_malloc(). Remove vm_page_unmanage(). It is no longer used.	2007-02-25 06:14:58 +00:00
Alan Cox	0cd31a0d75	Change the page's CLEANCHK flag from being a page queue mutex synchronized flag to a vm object mutex synchronized flag.	2007-02-22 06:15:52 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
Alan Cox	e1cb7bc081	Make vm_page_release_contig() static.	2006-09-03 22:24:08 +00:00
Alan Cox	eb4bbba83a	Refactor vm_page_sleep_if_busy() so that the test for a busy page is inlined and a procedure call is made in the rare case, i.e., when it is necessary to sleep. In this case, inlining the test actually makes the kernel smaller.	2006-08-27 19:50:13 +00:00
Alan Cox	09ef0d6e0c	The return value from vm_pageq_add_new_page() is not used. Eliminate it.	2006-08-25 04:36:19 +00:00
Alan Cox	b146f9e5d2	Reimplement the page's NOSYNC flag as an object-synchronized instead of a page queues-synchronized flag. Reduce the scope of the page queues lock in vm_fault() accordingly. Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the scope of the page queues lock. Reviewed by: tegge Additionally, eliminate an unnecessary dereference in computing the argument that is passed to vm_object_set_writeable_dirty().	2006-08-13 00:11:09 +00:00
Alan Cox	5786be7cc7	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
Alan Cox	39fd9b639f	With the recent changes to the implementation of page coloring, the the option PQ_NOOPT is used exclusively by vm_pageq.c. Thus, the include of opt_vmpage.h can be removed from vm_page.h.	2006-01-24 19:24:54 +00:00
Alexander Leidinger	ef39c05baa	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)	2005-12-31 14:39:20 +00:00
Robert Watson	8be9063ea1	Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined, as opt_vmpage.h will not be available to user space library builds. A similar existing check is present for KLD_MODULE for similar reasons. MFC after: 3 days	2005-08-04 10:05:11 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
Alan Cox	91f7a86064	Note that access to the page's busy count is synchronized by the containing object's lock.	2004-12-27 05:27:59 +00:00
Alan Cox	0f9f9bcb53	Introduce VM_ALLOC_NOBUSY, an option to vm_page_alloc() and vm_page_grab() that indicates that the caller does not want a page with its busy flag set. In many places, the global page queues lock is acquired and released just to clear the busy flag on a just allocated page. Both the allocation of the page and the clearing of the busy flag occur while the containing vm object is locked. So, the busy flag might as well never be set.	2004-10-24 06:15:36 +00:00
Marcel Moolenaar	2bbb534b56	Move the cow field between wire_count and hold_count. This is the position that is 64-bit aligned and makes sure that the valid and dirty fields are also 64-bit aligned. This means that if PAGE_SIZE is 32K, the size of the vm_page structure is only increased by 8 bytes instead of 16 bytes. More importantly, the vm_page structure is either 120 or 128 bytes on ia64. These are "interesting" sizes.	2004-08-22 20:52:23 +00:00
Brian Feldman	4362fada8f	Reimplement contigmalloc(9) with an algorithm which stands a greatly- improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time. The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the systems' free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated. The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked. There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes. A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation. When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. This invalidates the BUGS section of the contigmalloc(9) manpage.	2004-07-19 06:21:27 +00:00
Alan Cox	69c1a910e5	Update stale comments regarding page coloring.	2004-06-05 21:06:42 +00:00
Alan Cox	62326de742	Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to blist.h, enabling the removal of numerous #includes from subr_blist.c. (subr_blist.c and swap_pager.c are the only users of these definitions.)	2004-06-04 04:03:26 +00:00
Alan Cox	e363785643	Remove a stale comment: PG_DIRTY and PG_FILLED were removed in revisions 1.17 and 1.12 respectively.	2004-05-30 20:48:15 +00:00
Warner Losh	05eb3785e7	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-06 20:15:37 +00:00
Alan Cox	889eb0fc62	Eliminate unused arguments from vm_page_startup().	2004-04-04 23:33:36 +00:00
Alan Cox	45ad3d59ed	Remove some long unused definitions.	2004-03-04 04:26:14 +00:00
Alan Cox	93dbd07122	- Align a comment within struct vm_page. - Annotate the vm_page's valid field as synchronized by the containing vm object's lock.	2003-10-25 18:33:04 +00:00
Alan Cox	ab42316c2f	- Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from vm_pageout_scan(). Rationale: I don't like leaving a busy page in the cache queue with neither the vm object nor the vm page queues lock held. - Assert that the page is active in vm_pageout_page_stats().	2003-10-22 18:41:32 +00:00
Alan Cox	fee181a696	- Remove some long unused code.	2003-10-20 18:57:01 +00:00
Alan Cox	669890eaeb	Retire vm_page_copy(). Its reason for being ended when peter@ modified pmap_copy_page() et al. to accept a vm_page_t rather than a physical address. Also, this change will facilitate locking access to the vm page's valid field.	2003-10-08 05:35:12 +00:00
Marcel Moolenaar	16bc6ff39e	Assert that u_long is at least 64 bits if PAGE_SIZE is 32K. Suggested by: phk	2003-08-25 19:58:01 +00:00
Marcel Moolenaar	21a708cfde	Also define VM_PAGE_BITS_ALL for 16K and 32K pages. Make the constant unsigned for all page sizes and unsigned long for 32K pages.	2003-08-23 06:30:47 +00:00
Marcel Moolenaar	1fa057c6f1	Add support for 16K and 32K page sizes. The valid and dirty maps in struct vm_page are defined as u_int for 16K pages and u_long for 32K pages, with the implied assumption that long will at least be 64 bits wide on platforms where we support 32K pages.	2003-08-23 06:24:00 +00:00
Jake Burkholder	227f9a1c58	- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long. Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms. Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)	2003-03-25 00:07:06 +00:00
Alan Cox	24c9ad6bed	- Remove vm_page_sleep_busy(). The transition to vm_page_sleep_if_busy(), which incorporates page queue and field locking, is complete. - Assert that the page queue lock rather than Giant is held in vm_page_flag_set().	2002-12-19 07:23:46 +00:00
Alan Cox	a12cc0e489	Remove vm_page_protect(). Instead, use pmap_page_protect() directly.	2002-11-18 04:05:22 +00:00
Alan Cox	ada2a050be	Export the function vm_page_splay().	2002-11-04 19:21:39 +00:00
Jeff Roberson	026aa839a4	- Add a new flag to vm_page_alloc, VM_ALLOC_NOOBJ. This tells vm_page_alloc not to insert this page into an object. The pindex is still used for colorization. - Rework vm_page_select_* to accept a color instead of an object and pindex to work with VM_PAGE_NOOBJ. - Document other VM_ALLOC_ flags. Reviewed by: peter, jake	2002-11-01 00:59:03 +00:00
Alan Cox	f3b676f0ad	o Reinline vm_page_undirty(), reducing the kernel size. (This reverts a part of vm_page.h revision 1.87 and vm_page.c revision 1.167.)	2002-10-20 19:57:55 +00:00
Matthew Dillon	b86ec922be	Replace the vm_page hash table with a per-vmobject splay tree. There should be no major change in performance from this change at this time but this will allow other work to progress: Giant lock removal around VM system in favor of per-object mutexes, ranged fsyncs, more optimal COMMIT rpc's for NFS, partial filesystem syncs by the syncer, more optimal object flushing, etc. Note that the buffer cache is already using a similar splay tree mechanism. Note that a good chunk of the old hash table code is still in the tree. Alan or I will remove it prior to the release if the new code does not introduce unsolvable bugs, else we can revert more easily. Submitted by: alc (this is Alan's code) Approved by: re	2002-10-18 17:24:30 +00:00
Jeff Roberson	99571dc345	- Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH. - Remove all instances of the mallochash. - Stash the slab pointer in the vm page's object pointer when allocating from the kmem_obj. - Use the overloaded object pointer to find slabs for malloced memory.	2002-09-18 08:26:30 +00:00
Alan Cox	fff6062ab6	o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since pmap_zero_page() and pmap_zero_page_area() were modified to accept a struct vm_page * instead of a physical address, vm_page_zero_fill() and vm_page_zero_fill_area() have served no purpose.	2002-08-25 00:22:31 +00:00
Alan Cox	38f612e053	o Remove the setting and clearing of the PG_MAPPED flag from the alpha and ia64 pmap. o Remove the PG_MAPPED flag's declaration.	2002-08-10 18:01:39 +00:00
Alan Cox	e5f8bd9418	o Introduce vm_page_sleep_if_busy() as an eventual replacement for vm_page_sleep_busy(). vm_page_sleep_if_busy() uses the page queues lock.	2002-07-29 19:41:22 +00:00
Alan Cox	2c071f61f0	o Modify vm_page_grab() to accept VM_ALLOC_WIRED.	2002-07-28 23:46:19 +00:00
Alan Cox	6fd77192b2	o Remove dead and/or unused code.	2002-07-20 05:06:20 +00:00
Alan Cox	827b2fa091	o Introduce an argument, VM_ALLOC_WIRED, that requests vm_page_alloc() to return a wired page. o Use VM_ALLOC_WIRED within Alpha's pmap_growkernel(). Also, because Alpha's pmap_growkernel() calls vm_page_alloc() from within a critical section, specify VM_ALLOC_INTERRUPT instead of VM_ALLOC_SYSTEM. (Only VM_ALLOC_INTERRUPT is implemented entirely with a spin mutex.) o Assert that the page queues mutex is held in vm_page_wire() on Alpha, just like the other platforms.	2002-07-18 04:08:10 +00:00
Alan Cox	1f54526952	o Complete the locking of page queue accesses by vm_page_unwire(). o Assert that the page queues lock is held in vm_page_unwire(). o Make vm_page_lock_queues() and vm_page_unlock_queues() visible to kernel loadable modules.	2002-07-13 20:55:21 +00:00
Alan Cox	70c1763634	o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free queue lock (revision 1.33 of vm/vm_page.c removed them). o Make the free queue lock a spin lock because it's sometimes acquired inside of a critical section.	2002-07-04 22:07:37 +00:00
Kenneth D. Merry	98cb733c67	At long last, commit the zero copy sockets code. MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.	2002-06-26 03:37:47 +00:00
Jeff Roberson	e78f35b33f	Turn VM_ALLOC_ZERO into a flag. Submitted by: tegge Reviewed by: dillon	2002-06-25 22:01:12 +00:00
Alan Cox	8f2ba19c90	o Remove unused #defines.	2002-05-27 22:10:28 +00:00
Peter Wemm	44e74ba6c3	We do not necessarily need to map/unmap pages to zero parts of them. On systems where physical memory is also direct mapped (alpha, sparc, ia64 etc) this is slightly harmful.	2002-04-28 00:15:48 +00:00
Eivind Eklund	a128794977	- Remove a number of extra newlines that do not belong here according to style(9) - Minor space adjustment in cases where we have "( ", " )", if(), return(), while(), for(), etc. - Add /* SYMBOL */ after a few #endifs. Reviewed by: alc	2002-03-10 21:52:48 +00:00
Alan Cox	2be21c5e68	o Create vm_pageq_enqueue() to encapsulate code that is duplicated time and again in vm_page.c and vm_pageq.c. o Delete unusused prototypes. (Mainly a result of the earlier renaming of various functions from vm_page_() to vm_pageq_().)	2002-03-04 18:55:26 +00:00
Alan Cox	5714577006	Remove some long dead code.	2002-03-02 22:21:42 +00:00
Tor Egge	d2760948fe	Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold count that would otherwise be on one of the free queues. This eliminates a panic when broken programs unmap memory that still has pending IO from raw devices. Reviewed by: dillon, alc	2002-02-19 23:19:30 +00:00
Peter Wemm	3516c025ff	Implement idle zeroing of pages. I've been tinkering with this on and off since John Dyson left his work-in-progress. It is off by default for now. sysctl vm.zeroidle_enable=1 to turn it on. There are some hacks here to deal with the present lack of preemption - we yield after doing a small number of pages since we wont preempt otherwise. This is basically Matt's algorithm [with hysteresis] with an idle process to call it in a similar way it used to be called from the idle loop. I cleaned up the includes a fair bit here too.	2001-08-25 05:00:44 +00:00
Jake Burkholder	3a9b5daf48	Oops. Last commit to vm_object.c should have got these files too. Remove the use of atomic ops to manipulate vm_object and vm_page flags. Giant is required here, so they are superfluous. Discussed with: dillon	2001-07-31 04:09:52 +00:00
Assar Westerlund	d3e5863fa9	make vm_page_select_cache static Requested by: bde	2001-07-23 12:34:31 +00:00
Assar Westerlund	0379d76358	(vm_page_select_cache): add prototype	2001-07-21 17:08:15 +00:00
Matthew Dillon	6d03d577a5	Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc). Also removed some spl's and added some VM mutexes, but they are not actually used yet, so this commit does not really make any operational changes to the system. vm_page.c relates to vm_page_t manipulation, including high level deactivation, activation, etc... vm_pageq.c relates to finding free pages and aquiring exclusive access to a page queue (exclusivity part not yet implemented). And the world still builds... :-)	2001-07-04 23:27:09 +00:00
Matthew Dillon	1b40f8c036	Change inlines back into mainline code in preparation for mutexing. Also, most of these inlines had been bloated in -current far beyond their original intent. Normalize prototypes and function declarations to be ANSI only (half already were). And do some general cleanup. (kernel size also reduced by 50-100K, but that isn't the prime intent)	2001-07-04 20:15:18 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
Matthew Dillon	ac8f990bde	This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon	2001-05-24 07:22:27 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
Matthew Dillon	2b6b0df712	This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.	2000-12-26 19:41:38 +00:00
Matthew Dillon	936524aa02	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
David E. O'Brien	ee2fd03885	Make the arguments match the functionality of the functions.	2000-08-26 04:51:39 +00:00
Alfred Perlstein	4c8545c13d	#elsif -> #elif Noticed by: green	2000-07-11 09:41:29 +00:00
John Baldwin	9a20f99adf	Replace the PQ_*CACHE options with a single PQ_CACHESIZE option that you set equal to the number of kilobytes in your cache. The old options are still supported for backwards compatibility. Submitted by: Kelly Yancey <kbyanc@posi.net>	2000-07-04 08:55:18 +00:00
Matthew Dillon	8b03c8ed5e	This is a cleanup patch to Peter's new OBJT_PHYS VM object type and sysv shared memory support for it. It implements a new PG_UNMANAGED flag that has slightly different characteristics from PG_FICTICIOUS. A new sysctl, kern.ipc.shm_use_phys has been added to enable the use of physically-backed sysv shared memory rather then swap-backed. Physically backed shm segments are not tracked with PV entries, allowing programs which use a large shm segment as a rendezvous point to operate without eating an insane amount of KVM in the PV entry management. Read: Oracle. Peter's OBJT_PHYS object will also allow us to eventually implement page-table sharing and/or 4MB physical page support for such segments. We're half way there.	2000-05-29 22:40:54 +00:00
Jake Burkholder	e39756439c	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
Jake Burkholder	740a1973a6	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
Peter Wemm	0385347c1a	Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or it's shadow pv_table entry. Inspired by: John Dyson's 1998 patches. Also: Eliminate pv_table as a seperate thing and build it into a machine dependent part of vm_page_t. This eliminates having a seperate set of structions that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha). Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits). Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files. Reviewed by: dillon	2000-05-21 12:50:18 +00:00
Peter Wemm	c447342094	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.	1999-12-29 05:07:58 +00:00
Matthew Dillon	4f79d873c1	Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to madvise(). This feature prevents the update daemon from gratuitously flushing dirty pages associated with a mapped file-backed region of memory. The system pager will still page the memory as necessary and the VM system will still be fully coherent with the filesystem. Modifications made by other means to the same area of memory, for example by write(), are unaffected. The feature works on a page-granularity basis. MAP_NOSYNC allows one to use mmap() to share memory between processes without incuring any significant filesystem overhead, putting it in the same performance category as SysV Shared memory and anonymous memory. Reviewed by: julian, alc, dg	1999-12-12 03:19:33 +00:00
Alan Cox	be72f78813	The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to eliminate an extra (useless) level of indirection in half of the page queue accesses and (2) to use a single name for each queue throughout, instead of, e.g., "vm_page_queue_active" in some places and "vm_page_queues[PQ_ACTIVE]" in others. Reviewed by: dillon	1999-10-30 07:37:14 +00:00
Matthew Dillon	90ecac61c0	Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com> Replace various VM related page count calculations strewn over the VM code with inlines to aid in readability and to reduce fragility in the code where modules depend on the same test being performed to properly sleep and wakeup. Split out a portion of the page deactivation code into an inline in vm_page.c to support vm_page_dontneed(). add vm_page_dontneed(), which handles the madvise MADV_DONTNEED feature in a related commit coming up for vm_map.c/vm_object.c. This code prevents degenerate cases where an essentially active page may be rotated through a subset of the paging lists, resulting in premature disposal.	1999-09-17 04:56:40 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Brian Feldman	38c808edb7	Unbreak the nfs KLD_MODULE. It needs a bit more of vm_page.h than was exported (notably vm_page_undirty()). Also, let vm_page_dirty() work in a KLD.	1999-08-17 22:48:10 +00:00
Alan Cox	2c28a10540	Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page. Use it. Submitted by: dillon	1999-08-17 04:02:34 +00:00
Alan Cox	175d1f69bf	contigmalloc1 (currently) depends on PQ_FREE and PQ_CACHE not being 0 to tell a valid "struct vm_page" from an invalid one in the vm_page_array. This isn't a very robust method.	1999-08-15 05:36:43 +00:00
Matt Jacob	3739d7da89	Add back in old definitions if we're compiling for alpha.	1999-08-15 01:16:53 +00:00
Alan Cox	514bfcc440	Don't create a "struct vpgqueues" for PQ_NONE.	1999-08-14 06:25:54 +00:00
Alan Cox	1aefb1d957	Make the default page coloring parameters match a (non-Xeon) Pentium II/III. This setting is also acceptable for Celerons and Pentium Pros with less than 1MB L2 caches. Note: PQ_L2_SIZE is a misnomer. The correct number of colors is a function of the cache's degree of associativity as well as its size. Submitted by: bde and alc	1999-08-12 21:16:53 +00:00
Alan Cox	5d2aec8927	Change the type of vpgqueues::lcnt from "int *" to "int". The indirection served no purpose.	1999-07-31 18:31:00 +00:00
Alan Cox	3b21348301	Reduce the number of "magic constants" used for page coloring by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same thing at different places in the kernel. Drop PQ_PRIME3.	1999-07-22 06:04:17 +00:00
Alan Cox	9c89c228fe	Remove (1) "extern" declarations for variables that were previously made "static" and (2) initialized but unused variables.	1999-06-22 07:18:20 +00:00
Alan Cox	6ea5bd80fe	Remove some unused function and variable declarations.	1999-06-19 18:42:53 +00:00
Alan Cox	4221e284a3	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-05-02 23:57:16 +00:00
Julian Elischer	8d17e69460	Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>	1999-04-05 19:38:30 +00:00
Julian Elischer	811c2e1a76	Fix breakage in last commit Submitted by: Brian Feldman <green@unixhelp.org>	1999-03-15 05:09:48 +00:00
Julian Elischer	0237469f43	A bit of a hack, but allows the vn device to be a module again. Submitted by: Matt Dillon <dillon@freebsd.org>	1999-03-14 20:40:15 +00:00
Matthew Dillon	efcae3d355	Minor reorganization of vm_page_alloc(). No functional changes have been made but the code has been reorganized and documented to make it more readable, reduce the size of the code, and optimize the branch path caching capabilities that most modern processors have.	1999-02-15 06:52:14 +00:00
Matthew Dillon	faa273d5c2	Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with PQ_FREE. There is little operational difference other then the kernel being a few kilobytes smaller and the code being more readable. * vm_page_select_free() has been greatly simplified. * The PQ_ZERO page queue and supporting structures have been removed * vm_page_zero_idle() revamped (see below) PG_ZERO setting and clearing has been migrated from vm_page_alloc() to vm_page_free[_zero]() and will eventually be guarenteed to remain tracked throughout a page's life ( if it isn't already ). When a page is freed, PG_ZERO pages are appended to the appropriate tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended. When locating a new free page, PG_ZERO selection operates from within vm_page_list_find() ( get page from end of queue instead of beginning of queue ) and then only occurs in the nominal critical path case. If the nominal case misses, both normal and zero-page allocation devolves into the same _vm_page_list_find() select code without any specific zero-page optimizations. Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been added and zero-page tracking adjusted to conform with the other changes. Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free pages. We may wish to increase both parameters as time permits. The hysteresis is designed to avoid silly zeroing in borderline allocation/free situations.	1999-02-08 00:37:36 +00:00
Matthew Dillon	a0e7b3e5ce	Remove L1 cache coloring optimization ( leave L2 cache coloring opt ). Rewrite vm_page_list_find() and vm_page_select_free() - make inline out of nominal case.	1999-02-07 20:45:15 +00:00
Matthew Dillon	161a8f6519	Add vm_page_dirty() inline with PQ_CACHE sanity check	1999-01-24 05:57:50 +00:00
Matthew Dillon	a7039a1d42	Add invariants to vm_page_busy() and vm_page_wakeup() to check for PG_BUSY stupidity.	1999-01-24 01:05:15 +00:00
Matthew Dillon	41dbefba28	The TAILQ hashq has been turned into a singly-linked=list link, reducing the size of vm_page_t. SWAPBLK_NONE and SWAPBLK_MASK are defined here. These actually are more generalized then their names imply, but their placement is somewhat of a legacy issue from a prior test version of this code that put the swapblk in the vm_page_t structure. That test code was eventually thrown away. The legacy remains. Added vm_page_flash() inline. Similar to vm_page_wakeup() except that it does not clear PG_BUSY ( one assumes that PG_BUSY is already clear ). Used by a number of routines to wakeup waiters. Collapsed some of the code in inline calls to make other inline calls. GCC will optimize this well and it reduces duplication. vm_page_free() and vm_page_free_zero() inlines added to convert to the proper vm_page_free_toq() call. vm_page_sleep_busy() inline added, replacing vm_page_sleep() ( which has been removed ). This implements a much more optimizable page-waiting function.	1999-01-21 10:06:24 +00:00
Matthew Dillon	1c7c3c6a86	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>	1999-01-21 08:29:12 +00:00
Eivind Eklund	5526d2d920	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith	1999-01-08 17:31:30 +00:00
David Greenman	730075613a	Added a second argument, "activate" to the vm_page_unwire() call so that the caller can select either inactive or active queue to put the page on.	1998-10-28 13:37:02 +00:00
David Greenman	300ee8246e	Nuked PG_TABLED flag. Replaced with m->object != NULL.	1998-10-21 14:46:42 +00:00
Doug Rabson	e69763a315	Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.	1998-09-04 08:06:57 +00:00
Garrett Wollman	85e7f5492b	Separate wakeup conditions for page I/O count (pg_busy) and lock (PG_BUSY). This is not sa completely solution to the deadlock, but the additional wakeups have helped in my observation. Suggested by: John Dyson	1998-09-01 17:12:19 +00:00
Doug Rabson	069e9bc1b4	Change various syscalls to use size_t arguments instead of u_int. Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable. Reviewed by: Bruce Evans <bde@zeta.org.au>	1998-08-24 08:39:39 +00:00
Stephen McKay	9e80236560	Correct/clarify some comments.	1998-08-22 15:24:09 +00:00
John-Mark Gurney	20f718132d	document some VM paging options for cache sizes: PQ_NOOPT no coloring PQ_LARGECACHE used for 512k/16k cache PQ_HUGECACHE used for 1024k/16k cache	1998-06-30 08:01:30 +00:00
Bruce Evans	be160d60ab	Removed unused includes.	1998-06-21 18:02:50 +00:00
Doug Rabson	ecbb00a262	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.	1998-06-07 17:13:14 +00:00
John Dyson	b9cefc08e2	Support a 16K first level cache for 512K 2nd level. Also, add support for 1MB 2nd level cache.	1998-05-24 04:25:27 +00:00
John Dyson	8f9110f6a1	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.	1998-03-07 21:37:31 +00:00
John Dyson	ffc82b0a70	1) Use a more consistent page wait methodology. 2) Do not unnecessarily force page blocking when paging pages out. 3) Further improve swap pager performance and correctness, including fixing the paging in progress deadlock (except in severe I/O error conditions.) 4) Enable vfs_ioopt=1 as a default. 5) Fix and enable the page prezeroing in SMP mode. All in all, SMP systems especially should show a significant improvement in "snappyness."	1998-03-01 04:18:54 +00:00
John Dyson	95461b450d	1) Start using a cleaner and more consistant page allocator instead of the various ad-hoc schemes. 2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup. 3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some processor errata, and to minimize redundant processor updating of page tables. 4) Modify pmap_protect so that it can only remove permissions (as it originally supported.) The additional capability is not needed. 5) Streamline read-only to read-write page mappings. 6) For pmap_copy_page, don't enable write mapping for source page. 7) Correct and clean-up pmap_incore. 8) Cluster initial kern_exec pagin. 9) Removal of some minor lint from kern_malloc. 10) Correct some ioopt code. 11) Remove some dead code from the MI swapout routine. 12) Correct vm_object_deallocate (to remove backing_object ref.) 13) Fix dead object handling, that had problems under heavy memory load. 14) Add minor vm_page_lookup improvements. 15) Some pages are not in objects, and make sure that the vm_page.c can properly support such pages. 16) Add some more page deficit handling. 17) Some minor code readability improvements.	1998-02-05 03:32:49 +00:00
Peter Wemm	6875d25465	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.	1997-02-22 09:48:43 +00:00
Jordan K. Hubbard	1130b656e5	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.	1997-01-14 07:20:47 +00:00
John Dyson	853a7bc893	Make the default cache size optim to be 256K, the old default was 64K. The change has essentially neutral effect on those machines with little or no cache, and has a positive effect on "normal" machines with 256K or more cache.	1996-10-06 22:26:13 +00:00
John Dyson	5070c7f8c5	Addition of page coloring support. Various levels of coloring are afforded. The default level works with minimal overhead, but one can also enable full, efficient use of a 512K cache. (Parameters can be generated to support arbitrary cache sizes also.)	1996-09-08 20:44:49 +00:00
John Dyson	67bf686897	Backed out the recent changes/enhancements to the VM code. The problem with the 'shell scripts' was found, but there was a 'strange' problem found with a 486 laptop that we could not find. This commit backs the code back to 25-jul, and will be re-entered after the snapshot in smaller (more easily tested) chunks.	1996-07-30 03:08:57 +00:00
John Dyson	4f4d35edf0	This commit is meant to solve a couple of VM system problems or performance issues. 1) The pmap module has had too many inlines, and so the object file is simply bigger than it needs to be. Some common code is also merged into subroutines. 2) Removal of some evil PHYS_TO_VM_PAGE macro calls. Unfortunately, a few have needed to be added also. The removal caused the need for more vm_page_lookups. I added lookup hints to minimize the need for the page table lookup operations. 3) Removal of some bogus performance improvements, that mostly made the code more complex (tracking individual page table page updates unnecessarily). Those improvements actually hurt 386 processors perf (not that people who worry about perf use 386 processors anymore :-)). 4) Changed pv queue manipulations/structures to be TAILQ's. 5) The pv queue code has had some performance problems since day one. Some significant scalability issues are resolved by threading the pv entries from the pmap AND the physical address instead of just the physical address. This makes certain pmap operations run much faster. This does not affect most micro-benchmarks, but should help loaded system performance significantly. DG helped and came up with most of the solution for this one. 6) Most if not all pmap bit operations follow the pattern: pmap_test_bit(); pmap_clear_bit(); That made for twice the necessary pv list traversal. The pmap interface now supports only pmap_tc_bit type operations: pmap_[test/clear]_modified, pmap_[test/clear]_referenced. Additionally, the modified routine now takes a vm_page_t arg instead of a phys address. This eliminates a PHYS_TO_VM_PAGE operation. 7) Several rewrites of routines that contain redundant code to use common routines, so that there is a greater likelihood of keeping the cache footprint smaller.	1996-07-27 03:24:10 +00:00
John Dyson	38efa82b23	This commit does a couple of things: Re-enables the RSS limiting, and the routine is now tail-recursive, making it much more safe (eliminates the possiblity of kernel stack overflow.) Also, the RSS limiting is a little more intelligent about finding the likely objects that are pushing the process over the limit. Added some sysctls that help with VM system tuning. New sysctl features: 1) Enable/disable lru pageout algorithm. vm.pageout_algorithm = 0, default algorithm that works well, especially using X windows and heavy memory loading. Can have adverse effects, sometimes slowing down program loading. vm.pageout_algorithm = 1, close to true LRU. Works much better than clock, etc. Does not work as well as the default algorithm in general. Certain memory "malloc" type benchmarks work a little better with this setting. Please give me feedback on the performance results associated with these. 2) Enable/disable swapping. vm.swapping_enabled = 1, default. vm.swapping_enabled = 0, useful for cases where swapping degrades performance. The config option "NO_SWAPPING" is still operative, and takes precedence over the sysctl. If "NO_SWAPPING" is specified, the sysctl still exists, but "vm.swapping_enabled" is hard-wired to "0". Each of these can be changed "on the fly."	1996-06-26 05:39:27 +00:00
John Dyson	886d3e1150	Adjust the threshold for blocking on movement of pages from the cache queue in vm_fault. Move the PG_BUSY in vm_fault to the correct place. Remove redundant/unnecessary code in pmap.c. Properly block on rundown of page table pages, if they are busy. I think that the VM system is in pretty good shape now, and the following individuals (among others, in no particular order) have helped with this recent bunch of bugs, thanks! If I left anyone out, I apologize! Stephen McKay, Stephen Hocking, Eric J. Chet, Dan O'Brien, James Raynard, Marc Fournier.	1996-06-08 06:48:35 +00:00
John Dyson	6b6f000870	Keep page-table pages from ever being sensed as dirty. This should fix some problems with the page-table page management code, since it can't deal with the notion of page-table pages being paged out or in transit. Also, clean up some stylistic issues per some suggestions from Stephen McKay.	1996-06-05 03:31:49 +00:00
John Dyson	7f5fe93fc7	One more file missing from the mega-commit. This inlines some very simple routines in vm_page.c, so that an unnecessary subroutine call is removed.	1996-05-18 04:00:18 +00:00
Mike Pritchard	6c5e9bbdf5	Fix a bunch of spelling errors in the comment fields of a bunch of system include files.	1996-01-30 23:02:38 +00:00
John Dyson	bd7e5f992e	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.	1996-01-19 04:00:31 +00:00
John Dyson	a316d390bd	Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.	1995-12-11 04:58:34 +00:00
Poul-Henning Kamp	3af768903d	Remove unused vars & funcs, make things static, protoize a little bit.	1995-11-20 12:20:02 +00:00
John Dyson	d559b36913	Remove of now unused PG_COPYONWRITE.	1995-10-23 04:29:39 +00:00
John Dyson	10ad4d483c	Added prototype for new routine "vm_page_set_validclean" and initial declarations for the prezeroed pages mechanism.	1995-09-03 20:11:26 +00:00
David Greenman	24a1cce34f	NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!! Much needed overhaul of the VM system. Included in this first round of changes: 1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers". 2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items. 3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed. 4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug. 5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance. 6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain. 7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance. 8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed. 9) Some almost useless debugging code removed. 10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology. 11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended. 12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course). 13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE. 14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes) TODO: 1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size. 2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness. 3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind. 4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems. 5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).	1995-07-13 08:48:48 +00:00
Bruce Evans	7666fb4753	inline -> __inline. Headers should always use `__inline' for inline functions to avoid syntax errors when modules that don't even use the offending functions are compiled with `gcc -ansi'.	1995-04-23 08:05:49 +00:00
David Greenman	8953151fc8	Removed some obsolete flags. Submitted by: John Dyson	1995-03-26 23:33:14 +00:00
David Greenman	f919ebde54	Various changes from John and myself that do the following: New functions create - vm_object_pip_wakeup and pagedaemon_wakeup that are used to reduce the actual number of wakeups. New function vm_page_protect which is used in conjuction with some new page flags to reduce the number of calls to pmap_page_protect. Minor changes to reduce unnecessary spl nesting. Rewrote vm_page_alloc() to improve readability. Various other mostly cosmetic changes.	1995-03-01 23:30:04 +00:00
David Greenman	0c1dacbc5c	Moved ACT_MAX, ACT_ADVANCE, and ACT_DECLINE to vm_page.h.	1995-02-20 23:35:45 +00:00
Poul-Henning Kamp	d2fc53150b	YF fix.	1995-02-14 06:14:28 +00:00
David Greenman	6d40c3d394	Added ability to detect sequential faults and DTRT. (swap_pager.c) Added hook for pmap_prefault() and use symbolic constant for new third argument to vm_page_alloc() (vm_fault.c, various) Changed the way that upages and page tables are held. (vm_glue.c) Fixed architectural flaw in allocating pages at interrupt time that was introduced with the merged cache changes. (vm_page.c, various) Adjusted some algorithms to acheive better paging performance and to accomodate the fix for the architectural flaw mentioned above. (vm_pageout.c) Fixed pbuf handling problem, changed policy on handling read-behind page. (vnode_pager.c) Submitted by: John Dyson	1995-01-24 10:14:09 +00:00

1 2 3 4 5 ...

262 Commits