freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	4a0dfeb299	MFC r198505: When protection of wired read-only mapping is changed to read-write, install new shadow object behind the map entry and copy the pages from the underlying objects to it. This makes the mprotect(2) call to actually perform the requested operation instead of silently do nothing and return success, that causes SIGSEGV on later write access to the mapping. Reuse vm_fault_copy_entry() to do the copying, modifying it to behave correctly when src_entry == dst_entry.	2009-11-17 18:38:00 +00:00
Konstantin Belousov	82bccc6465	MFC r198476 (by alc): Simplify the inner loop of vm_fault_copy_entry(). Approved by: alc	2009-11-17 18:31:09 +00:00
Alan Cox	38c15033c7	MFC r198472 Eliminate an unnecessary check from vm_fault_prefault().	2009-10-31 18:18:32 +00:00
John Baldwin	013818111a	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2009-07-24 13:50:29 +00:00
Konstantin Belousov	121fd46175	When forking a vm space that has wired map entries, do not forget to charge the objects created by vm_fault_copy_entry. The object charge was set, but reserve not incremented. Reported by: Greg Rivers <gcr+freebsd-current tharned org> Reviewed by: alc (previous version) Approved by: re (kensmith)	2009-07-03 22:17:37 +00:00
Konstantin Belousov	3364c323e6	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
Alan Cox	0a2e596a93	Eliminate unnecessary obfuscation when testing a page's valid bits.	2009-06-07 19:38:26 +00:00
Alan Cox	d7d9cfed36	Eliminate an incorrect comment.	2009-05-07 05:44:13 +00:00
Alan Cox	78cfe1f7bd	Eliminate an archaic band-aid. The immediately preceding comment already explains why the band-aid is unnecessary. Suggested by: tegge	2009-04-26 20:54:57 +00:00
Alan Cox	f9855e177d	Allow valid pages to be mapped for read access when they have a non-zero busy count. Only mappings that allow write access should be prevented by a non-zero busy count. (The prohibition on mapping pages for read access when they have a non- zero busy count originated in revision 1.202 of i386/i386/pmap.c when this code was a part of the pmap.) Reviewed by: tegge	2009-04-19 00:34:34 +00:00
Alan Cox	5758fe7185	Prior to r188331 a map entry's last read offset was only updated by a hard fault. In r188331 this update was relocated because of synchronization changes to a place where it would occur on both hard and soft faults. This change again restricts the update to hard faults.	2009-02-25 07:52:53 +00:00
Alan Cox	c722e407dc	Avoid some cases of unnecessary page queues locking by vm_fault's delete- behind heuristic.	2009-02-09 06:23:21 +00:00
Alan Cox	7b54b1a9f5	Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by a redundant assertion in vm_fault(). Reviewed by: kib	2009-02-08 22:17:24 +00:00
Konstantin Belousov	2fada4c2b3	Remove no longer valid comment. Submitted by: alc	2009-02-08 21:20:13 +00:00
Konstantin Belousov	d2bf64c309	Do not sleep for vnode lock while holding map lock in vm_fault. Try to acquire vnode lock for OBJT_VNODE object after map lock is dropped. Because we have the busy page(s) in the object, sleeping there would result in deadlock with vnode resize. Try to get lock without sleeping, and, if the attempt failed, drop the state, lock the vnode, and restart the fault handler from the start with already locked vnode. Because the vnode_pager_lock() function is inlined in vm_fault(), axe it. Based on suggestion by: alc Reviewed by: tegge, alc Tested by: pho	2009-02-08 20:23:46 +00:00
Konstantin Belousov	0d0be82a5d	Style.	2009-02-08 19:37:01 +00:00
Alan Cox	ec96dca788	Simplify the inner loop of vm_fault()'s delete-behind heuristic. Instead of checking each page for PG_UNMANAGED, perform a one-time check whether the object is OBJT_PHYS. (PG_UNMANAGED pages only belong to OBJT_PHYS objects.)	2008-03-16 17:37:19 +00:00
Alan Cox	593e717ec9	Eliminate an unnecessary test from vm_fault's delete-behind heuristic. Specifically, since the delete-behind heuristic is never applied to a device-backed object, there is no point in checking whether each of the object's pages is fictitious. (Only device-backed objects have fictitious pages.)	2008-03-09 06:08:58 +00:00
Alan Cox	eb2a051720	Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.	2008-01-03 07:34:34 +00:00
Alan Cox	f8a47341fe	Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)	2007-12-29 19:53:04 +00:00
Konstantin Belousov	4ab8ab9285	Do not dereference NULL pointer. Reported by: Peter Holm Reviewed by: alc Approved by: re (kensmith)	2007-10-08 20:09:53 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Alan Cox	806453645a	Two changes to vm_fault_additional_pages(): 1. Rewrite the backward scan. Specifically, reverse the order in which pages are allocated so that upon failure it is never necessary to free pages that were just allocated. Moreover, any allocated pages can be put to use. This makes the backward scan behave just like the forward scan. 2. Eliminate an explicit, unsynchronized check for low memory before calling vm_page_alloc(). It serves no useful purpose. It is, in effect, optimizing the uncommon case at the expense of the common case. Approved by: re (hrs) MFC after: 3 weeks	2007-07-20 06:55:11 +00:00
Alan Cox	d1974c0df1	Eliminate the special case handling of OBJT_DEVICE objects in vm_fault_additional_pages() that was introduced in revision 1.47. Then as now, it is unnecessary because dev_pager_haspage() returns zero for both the number of pages to read ahead and read behind, producing the same exact behavior by vm_fault_additional_pages() as the special case handling. Approved by: re (rwatson)	2007-07-08 19:42:52 +00:00
Alan Cox	65ea29a690	When a cached page is reactivated in vm_fault(), update the counter that tracks the total number of reactivated pages. (We have not been counting reactivations by vm_fault() since revision 1.46.) Correct a comment in vm_fault_additional_pages(). Approved by: re (kensmith) MFC after: 1 week	2007-07-06 21:25:21 +00:00
Matt Jacob	9dae729081	Initialize reqpage to zero.	2007-06-17 04:14:27 +00:00
Attilio Rao	6759608248	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days. Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:38:48 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Alan Cox	cf4682ae23	Eliminate the reactivation of cached pages in vm_fault_prefault() and vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With the exception of calls to vm_map_pmap_enter() from madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter() are both used to create speculative mappings. Thus, always reactivating cached pages is a mistake. In principle, cached pages should only be reactivated by an actual access. Otherwise, the following misbehavior can occur. On a hard fault for a text page the clustering algorithm fetches not only the required page but also several of the adjacent pages. Now, suppose that one or more of the adjacent pages are never accessed. Ultimately, these unused pages become cached pages through the efforts of the page daemon. However, the next activation of the executable reactivates and maps these unused pages. Consequently, they are never replaced. In effect, they become pinned in memory.	2007-05-22 04:45:59 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Pawel Jakub Dawidek	fcdd9721e4	Fix a problem for file systems that don't implement VOP_BMAP() operation. The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(), which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (eg. EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before' and 'after' arguments, so we have some accidental values there. This bascially was causing this condition to be meet: if ((rahead + rbehind) > ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) { pagedaemon_wakeup(); [...] } (we have some random values in rahead and rbehind variables) I'm not entirely sure this is the right fix, maybe we should just return FALSE in vnode_pager_haspage() when VOP_BMAP() fails? alc@ knows about this problem, maybe he will be able to come up with a better fix if this is not the right one.	2007-04-05 20:49:46 +00:00
Alan Cox	768131d293	vm_page_busy() no longer requires the page queues lock to be held. Reduce the scope of the page queues lock in vm_fault() accordingly.	2007-03-23 06:11:25 +00:00
Alan Cox	d8810d894d	Use PCPU_LAZY_INC() to update page fault statistics.	2007-03-05 18:55:14 +00:00
Alan Cox	44b8bd66f9	Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller. (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)	2006-11-12 21:48:34 +00:00
Alan Cox	66bdd5d619	The page queues lock is no longer required by vm_page_wakeup().	2006-10-23 05:27:31 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
Alan Cox	9fea8cad08	Eliminate unnecessary PG_BUSY tests. They originally served a purpose that is now handled by vm object locking.	2006-10-21 21:02:04 +00:00
Alan Cox	b146f9e5d2	Reimplement the page's NOSYNC flag as an object-synchronized instead of a page queues-synchronized flag. Reduce the scope of the page queues lock in vm_fault() accordingly. Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the scope of the page queues lock. Reviewed by: tegge Additionally, eliminate an unnecessary dereference in computing the argument that is passed to vm_object_set_writeable_dirty().	2006-08-13 00:11:09 +00:00
Alan Cox	e7e56b2889	Eliminate the acquisition and release of the page queues lock around a call to vm_page_sleep_if_busy().	2006-08-06 00:17:17 +00:00
Alan Cox	2cf139527c	Retire debug.mpsafevm. None of the architectures supported in CVS require it any longer.	2006-07-21 23:22:49 +00:00
Stephan Uphoff	2053c12705	Remove mpte optimization from pmap_enter_quick(). There is a race with the current locking scheme and removing it should have no measurable performance impact. This fixes page faults leading to panics in pmap_enter_quick_locked() on amd64/i386. Reviewed by: alc,jhb,peter,ps	2006-06-15 01:01:06 +00:00
Alan Cox	8f8790a76d	Simplify the implementation of vm_fault_additional_pages() based upon the object's memq being ordered. Specifically, replace repeated calls to vm_page_lookup() by two simple constant-time operations. Reviewed by: tegge	2006-05-13 20:05:44 +00:00
Warner Losh	62a59e8f0d	Remove leading __ from __(inline\|const\|signed\|volatile). They are obsolete. This should reduce diffs to NetBSD as well.	2006-03-08 06:31:46 +00:00
Tor Egge	44ed341759	Adjust old comment (present in rev 1.1) to match changes in rev 1.82. PR: kern/92509 Submitted by: "Bryan Venteicher" <bryanv@daemoninthecloset.org>	2006-02-02 21:55:38 +00:00
Alan Cox	82eedee4a4	Use the new macros abstracting the page coloring/queues implementation. (There are no functional changes.)	2006-01-27 08:35:32 +00:00
Alexander Leidinger	ef39c05baa	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)	2005-12-31 14:39:20 +00:00
Tor Egge	b898bb1be3	Don't access fs->first_object after dropping reference to it. The result could be a missed or extra giant unlock. Reviewed by: alc	2005-12-20 12:27:59 +00:00
Alan Cox	05406e6f33	Remove unneeded calls to pmap_remove_all(). The given page is not mapped. Reviewed by: tegge	2005-12-11 22:06:57 +00:00
Alan Cox	57b5187b16	Eliminate an incorrect cast.	2005-09-07 01:42:30 +00:00

1 2 3 4 5 ...

257 Commits