freebsd-dev

Author	SHA1	Message	Date
Alan Cox	edd16ab140	Add assertions in two places where a page's valid or dirty bits are changed.	2009-05-30 22:06:58 +00:00
Alan Cox	1c1b26f276	Eliminate page queues locking from bufdone_finish() through the following changes: Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect what this function actually does. Suggested by: tegge Introduce a new version of vfs_page_set_valid() that does no more than what the function's name implies. Specifically, it does not update the page's dirty mask, and thus it does not require the page queues lock to be held. Update two of the three callers to the old vfs_page_set_valid() to call vfs_page_set_validclean() instead because they actually require the page's dirty mask to be cleared. Introduce vm_page_set_valid(). Reviewed by: tegge	2009-05-13 05:39:39 +00:00
Konstantin Belousov	641e2829b6	Extend the struct vm_page wire_count to u_int to avoid the overflow of the counter, that may happen when too many sendfile(2) calls are being executed with this vnode [1]. To keep the size of the struct vm_page and offsets of the fields accessed by out-of-tree modules, swap the types and locations of the wire_count and cow fields. Add safety checks to detect cow overflow and force fallback to the normal copy code for zero-copy sockets. [2] Reported by: Anton Yuzhaninov <citrin citrin ru> [1] Suggested by: alc [2] Reviewed by: alc MFC after: 2 weeks	2009-01-03 13:24:08 +00:00
Rafal Jaworowski	8e321b7943	Support kernel crash mini dumps on ARM architecture. Obtained from: Juniper Networks, Semihalf	2008-11-06 16:20:27 +00:00
Ed Maste	a8a478fce6	Move CTASSERT from header file to source file, per implementation note now in the CTASSERT man page.	2008-09-26 18:44:40 +00:00
Kip Macy	4b34502e99	Work around differences in page allocation for initial page tables on xen MFC after: 1 month	2008-08-17 23:40:29 +00:00
Alan Cox	8bcd3b1998	Essentially, neither madvise(..., MADV_DONTNEED) nor madvise(..., MADV_FREE) work. (Moreover, I don't believe that they have ever worked as intended.) The explanation is fairly simple. Both MADV_DONTNEED and MADV_FREE perform vm_page_dontneed() on each page within the range given to madvise(). This function moves the page to the inactive queue. Specifically, if the page is clean, it is moved to the head of the inactive queue where it is first in line for processing by the page daemon. On the other hand, if it is dirty, it is placed at the tail. Let's further examine the case in which the page is clean. Recall that the page is at the head of the line for processing by the page daemon. The expectation of vm_page_dontneed()'s author was that the page would be transferred from the inactive queue to the cache queue by the page daemon. (Once the page is in the cache queue, it is, in effect, free, that is, it can be reallocated to a new vm object by vm_page_alloc() if it isn't reactivated quickly enough by a user of the old vm object.) The trouble is that nowhere in the execution of either MADV_DONTNEED or MADV_FREE is either the machine-independent reference flag (PG_REFERENCED) or the reference bit in any page table entry (PTE) mapping the page cleared. Consequently, the immediate reaction of the page daemon is to reactivate the page because it is referenced. In effect, the madvise() was for naught. The case in which the page was dirty is not too different. Instead of being laundered, the page is reactivated. Note: The essential difference between MADV_DONTNEED and MADV_FREE is that MADV_FREE clears a page's dirty field. So, MADV_FREE is always executing the clean case above. This revision changes vm_page_dontneed() to clear both the machine- independent reference flag (PG_REFERENCED) and the reference bit in all PTEs mapping the page. MFC after: 6 weeks	2008-06-06 18:38:43 +00:00
Alan Cox	f578838754	Don't call vm_reserv_alloc_page() on device-backed objects. Otherwise, the system may panic because there is no reservation structure corresponding to the physical address of the device memory. Reported by: Giorgos Keramidas	2008-05-15 18:52:31 +00:00
Alan Cox	44aab2c3de	Introduce vm_reserv_reclaim_contig(). This function is used by contigmalloc(9) as a last resort to steal pages from an inactive, partially-used superpage reservation. Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and refactor it so that a separate subroutine is responsible for breaking the selected reservation. This subroutine is also used by vm_reserv_reclaim_contig().	2008-04-06 18:09:28 +00:00
Alan Cox	e5b006ffca	Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent migration to vm/vm_page.c.	2008-03-19 20:24:35 +00:00
Alan Cox	1fa94a36b1	Almost seven years ago, vm/vm_page.c was split into three parts: vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c. Today, vm/vm_pageq.c has withered to the point that it contains only four short functions, two of which are only used by vm/vm_page.c. Since I can't foresee any reason for vm/vm_pageq.c to grow, it is time to fold the remaining contents of vm/vm_pageq.c back into vm/vm_page.c. Add some comments. Rename one of the functions, vm_pageq_enqueue(), that is now static within vm/vm_page.c to vm_page_enqueue(). Eliminate PQ_MAXCOUNT as it no longer serves any purpose.	2008-03-18 06:52:15 +00:00
Alan Cox	273bf93c8d	Defer setting either PG_CACHED or PG_FREE until after the free page queues lock is acquired. Otherwise, the state of a reservation's pages' flags and its population count can be inconsistent. That could result in a page being freed twice. Reported by: kris	2008-01-02 04:43:47 +00:00
Alan Cox	f8a47341fe	Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)	2007-12-29 19:53:04 +00:00
Alan Cox	e35395ce21	Modify vm_phys_unfree_page() so that it no longer requires the given page to be in the free lists. Instead, it now returns TRUE if it removed the page from the free lists and FALSE if the page was not in the free lists. This change is required to support superpage reservations. Specifically, once reservations are introduced, a cached page can either be in the free lists or a reservation.	2007-12-20 22:45:54 +00:00
Alan Cox	0349775790	Eliminate redundant code from vm_page_startup().	2007-12-19 05:47:50 +00:00
Alan Cox	21e10ad46a	Simplify vm_page_free_toq().	2007-12-11 21:20:34 +00:00
Alan Cox	b640825647	Correct a comment.	2007-12-02 07:43:42 +00:00
Alan Cox	ddd6e7d2ab	When reactivating a cached page, reset the page's pool to the default pool. (Not doing this before was a performance pessimization but not a cause for panic.)	2007-11-21 23:22:10 +00:00
Konstantin Belousov	aefac17759	The intent of the freeing the (zeroed) page in vm_page_cache() for default object rather than cache it was to have vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is no cached page in object at pindex. This allows to avoid explicit checks for cached pages in vm_object_backing_scan(). For now, we need the same bandaid for the swap object, otherwise both the vm_page_lookup() and the pager can report that there is no page at offset, while page is stored in the cache. Also, this fixes another instance of the KASSERT("object type is incompatible") failure in the vm_page_cache_transfer(). Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days	2007-11-05 10:25:12 +00:00
Alan Cox	21f7958604	Change vm_page_cache_transfer() such that it does not transfer pages that would have an offset beyond the end of the target object. Such pages should remain in the source object. MFC after: 3 days Diagnosed and reviewed by: Kostik Belousov Reported and tested by: Peter Holm	2007-10-27 00:09:30 +00:00
Alan Cox	b8c5048025	In the rare case that vm_page_cache() actually frees the given page, it must first ensure that the page is no longer mapped. This is trivially accomplished by calling pmap_remove_all() a little earlier in vm_page_cache(). While I'm in the neighborbood, make a related panic message a little more useful. Approved by: re (kensmith) Reported by: Peter Holm and Konstantin Belousov Reviewed by: Konstantin Belousov	2007-10-08 18:01:38 +00:00
Alan Cox	dc9250f55c	Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs when uma_small_free() frees a page. The solution has two parts: (1) Mark pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested. This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable flags, i.e., they do not change state between the time that a page is allocated and freed. Approved by: re (kensmith) PR: 116794	2007-10-07 18:03:03 +00:00
Alan Cox	c944491426	Correct an error of omission in the reimplementation of the page cache: vm_object_page_remove() should convert any cached pages that fall with the specified range to free pages. Otherwise, there could be a problem if a file is first truncated and then regrown. Specifically, some old data from prior to the truncation might reappear. Generalize vm_page_cache_free() to support the conversion of either a subset or the entirety of an object's cached pages. Reported by: tegge Reviewed by: tegge Approved by: re (kensmith)	2007-09-27 04:21:59 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Alan Cox	eaa29f1ce4	Add a counter for the total number of pages cached and support for reporting the value of this counter in the program "vmstat". Approved by: re (rwatson)	2007-07-27 20:01:22 +00:00
Alan Cox	8941dc4471	Eliminate two unused functions: vm_phys_alloc_pages() and vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to vm_phys_alloc_pages() and vm_phys_free_pages_locked() to vm_phys_free_pages(). Add comments regarding the need for the free page queues lock to be held by callers to these functions. No functional changes. Approved by: re (hrs)	2007-07-14 21:21:17 +00:00
Alan Cox	20dd22a24e	Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in vm_page_cowfault(). Initially, if vm_page_cowfault() sleeps, the given page is wired, preventing it from being recycled. However, when transmission of the page completes, the page is unwired and returned to the page queues. At that point, the page is not in any special state that prevents it from being recycled. Consequently, vm_page_cowfault() should verify that the page is still held by the same vm object before retrying the replacement of the page. Note: The containing object is, however, safe from being recycled by virtue of having a non-zero paging-in-progress count. While I'm here, add some assertions and comments. Approved by: re (rwatson) MFC After: 3 weeks	2007-07-10 18:41:34 +00:00
Matt Jacob	0a49733cb9	Don't declare inline a function which isn't.	2007-06-17 04:19:05 +00:00
Alan Cox	bcc231ecb6	If attempting to cache a "busy", panic instead of printing a diagnostic message and returning.	2007-06-16 21:07:51 +00:00
Alan Cox	2446e4f02c	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-16 04:57:06 +00:00
Attilio Rao	393a081d42	Optimize vmmeter locking. In particular: - Add an explicative table for locking of struct vmmeter members - Apply new rules for some of those members - Remove some unuseful comments Heavily reviewed by: alc, bde, jeff Approved by: jeff (mentor)	2007-06-10 21:59:14 +00:00
Attilio Rao	b4b7081961	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Jeff Roberson	80b200da28	- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument. Suggested by: julian@ Contributed by: attilio@	2007-05-20 22:33:42 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Alan Cox	04a18977c8	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
Alan Cox	9f5c801b94	Change the way that unmanaged pages are created. Specifically, immediately flag any page that is allocated to a OBJT_PHYS object as unmanaged in vm_page_alloc() rather than waiting for a later call to vm_page_unmanage(). This allows for the elimination of some uses of the page queues lock. Change the type of the kernel and kmem objects from OBJT_DEFAULT to OBJT_PHYS. This allows us to take advantage of the above change to simplify the allocation of unmanaged pages in kmem_alloc() and kmem_malloc(). Remove vm_page_unmanage(). It is no longer used.	2007-02-25 06:14:58 +00:00
Alan Cox	711585d087	Enable vm_page_free() and vm_page_free_zero() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue.	2007-02-18 05:54:42 +00:00
Alan Cox	ba000fb2c1	Remove a stale comment. Add punctuation to a nearby comment.	2007-02-17 19:37:00 +00:00
Alan Cox	d3d029bd62	Relax the page queue lock assertions in vm_page_remove() and vm_page_free_toq() to account for recent changes that allow vm_page_free_toq() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue. (Examples of such pages include page table pages, pv entry pages, and uma small alloc pages.)	2007-02-15 05:43:38 +00:00
Alan Cox	7d60988bad	Avoid the unnecessary acquisition of the free page queues lock when a page is actually being added to the hold queue, not the free queue. At the same time, avoid unnecessary tests to wake up threads waiting for free memory and the idle thread that zeroes free pages. (These tests will be performed later when the page finally moves from the hold queue to the free queue.)	2007-02-14 07:05:55 +00:00
Alan Cox	5351a2488a	Use the free page queue mutex instead of the page queue mutex to synchronize sleeping and waking of the zero idle thread.	2007-02-11 05:18:40 +00:00
Alan Cox	e9f995d824	Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the vm page queue free mutex instead of the vm page queue mutex.	2007-02-07 06:37:30 +00:00
Alan Cox	3ae3919d0b	Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.	2007-02-05 06:02:55 +00:00
Kip Macy	35d10226b7	Remove the requirement that phys_avail be sorted in ascending order by explicitly finding the lowest and highest addresses when calculating the size of the vm_pages array Reviewed by :alc	2006-12-08 08:44:47 +00:00
Alan Cox	49c3b92531	I misplaced the assertion that was added to vm_page_startup() in the previous change. Correct its placement.	2006-11-08 19:11:54 +00:00
Alan Cox	9ad3296a25	Simplify the construction of the free queues in vm_page_startup(). Add an assertion to test a hypothesis concerning other redundant computation in vm_page_startup().	2006-11-08 18:43:47 +00:00
Alan Cox	2a53696fb8	The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.	2006-10-22 21:18:48 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
Ken Smith	a9a5d47c85	Fix two minor style(9) nits in v1.313 which were noticed during an MFC review. alc@ will be MFCing V1.313 plus style fix to RELENG_6.	2006-09-29 00:20:56 +00:00

1 2 3 4 5 ...

376 Commits