freebsd-skq

Author	SHA1	Message	Date
attilio	71b7824213	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
attilio	18d0a0dd51	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
alc	625e38eddc	Make contigmalloc(9)'s page laundering more robust. Specifically, use vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better handle a lock-ordering problem. Consequently, trylock's failure on the page's containing object no longer implies that the page cannot be laundered. MFC after: 6 weeks	2007-11-25 20:37:29 +00:00
alc	9ba5385124	Tidy up: Add comments. Eliminate the pointless malloc_type_allocated(..., 0) calls that occur when contigmalloc() has failed. Eliminate the acquisition and release of the page queues lock from vm_page_release_contig(). Rename contigmalloc2() to contigmapping(), reflecting what it does.	2007-11-25 07:42:34 +00:00
alc	d1bce06c64	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
alc	a8415c5a0d	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-16 04:57:06 +00:00
alc	a9a2aaf8ad	Conditionally acquire Giant in vm_contig_launder_page().	2007-06-11 03:20:16 +00:00
attilio	7dd8ed88a9	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
jeff	e1996cb960	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
alc	a10280e050	Correct contigmalloc2()'s implementation of M_ZERO. Specifically, contigmalloc2() was always testing the first physical page for PG_ZERO, not the current page of interest. Submitted by: Michael Plass PR: 81301 MFC after: 1 week	2007-04-19 05:39:54 +00:00
alc	4881bd38e2	Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.	2007-02-05 06:02:55 +00:00
alc	1f317d8152	Ensure that the page's oflags field is initialized by contigmalloc().	2006-11-08 06:23:29 +00:00
alc	cbcb760109	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
kmacy	e728d44745	sun4v requires TSBs (translation storage buffers) to be contiguous and be size aligned requiring heavy usage of vm_page_alloc_contig This change makes vm_page_alloc_contig SMP safe Approved by: scottl (acting as backup for mentor rwatson)	2006-10-12 04:41:39 +00:00
alc	9fce925349	Make vm_page_release_contig() static.	2006-09-03 22:24:08 +00:00
alc	8f32cfe8b1	Prevent a call to contigmalloc() that asks for more physical memory than the machine has from causing a panic. Submitted by: Michael Plass PR: 101668 MFC after: 3 days	2006-08-26 02:43:23 +00:00
tegge	7d4fc1e0d4	Ignore dirty pages owned by "dead" objects.	2006-03-08 00:51:00 +00:00
tegge	774f51ad2c	Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.	2006-03-02 22:13:28 +00:00
tegge	29fc266ddd	Hold extra reference to vm object while cleaning pages.	2006-03-02 21:38:38 +00:00
scottl	a3330c2d56	The change a few years ago of having contigmalloc start its scan at the top of physical RAM instead of the bottom was a sound idea, but the implementation left a lot to be desired. Scans would spend considerable time looking at pages that are above of the address range given by the caller, and multiple calls (like what happens in busdma) would spend more time on top of that rescanning the same pages over and over. Solve this, at least for now, with two simple optimizations. The first is to not bother scanning high ordered pages that are outside of the provided address range. Second is to cache the page index from the last successful operation so that subsequent scans don't have to restart from the top. This is conditional on the numpages argument being the same or greater between calls. MFC After: 2 weeks	2006-01-29 08:24:54 +00:00
alc	e32ac719d6	Plug a leak in the newer contigmalloc() implementation. Specifically, if a multipage allocation was aborted midway, the pages that were already allocated were not always returned to the free list. Submitted by: tegge	2006-01-26 05:51:26 +00:00
alc	d1c7af5900	The previous revision incorrectly changed a switch statement into an if statement. Specifically, a break statement that previously broke out of the enclosing switch was not changed. Consequently, the enclosing loop terminated prematurely. This could result in "vm_page_insert: page already inserted" panics. Submitted by: tegge	2006-01-25 06:45:57 +00:00
netchild	507a9b3e93	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)	2005-12-31 14:39:20 +00:00
tegge	f13480db93	Check for marker pages when scanning active and inactive page queues. Reviewed by: alc	2005-08-12 18:17:40 +00:00
green	3bb055500e	The new contigmalloc(9) has a bad degenerate case where there were many regions checked again and again despite knowing the pages contained were not usable and only satisfied the alignment constraints This case was compounded, especially for large allocations, by the practice of looping from the top of memory so as to keep out of the important low-memory regions. While the old contigmalloc(9) has the same problem, it is not as noticeable due to looping from the low memory to high. This degenerate case is fixed, as well as reversing the sense of the rest of the loops within it, to provide a tremendous speed increase. This makes the best case O(n * VM overhead) much more likely than the worst case O(4 * VM overhead). For comparison, the worst case for old contigmalloc would be O(5 * VM overhead) in addition to its strategy of turning used memory into free being highly pessimal. Also, fix a bug that in practice most likely couldn't have been triggered, int the new contigmalloc(9): it walked backwards from the end of memory without accounting for how many pages it needed. Potentially, nonexistant pages could have been mapped. This hasn't occurred because the kernel generally requests as its first contigmalloc(9) a single page. Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes MFC After: 1 month More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes	2005-06-11 00:05:16 +00:00
imp	f0bf889d0d	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
delphij	2841d31dff	Try to close a potential, but serious race in our VM subsystem. Historically, our contigmalloc1() and contigmalloc2() assumes that a page in PQ_CACHE can be unconditionally reused by busying and freeing it. Unfortunatelly, when object happens to be not NULL, the code will set m->object to NULL and disregard the fact that the page is actually in the VM page bucket, resulting in page bucket hash table corruption and finally, a filesystem corruption, or a 'page not in hash' panic. This commit has borrowed the idea taken from DragonFlyBSD's fix to the VM fix by Matthew Dillon[1]. This version of patch will do the following checks: - When scanning pages in PQ_CACHE, check hold_count and skip over pages that are held temporarily. - For pages in PQ_CACHE and selected as candidate of being freed, check if it is busy at that time. Note: It seems that this is might be unrelated to kern/72539. Obtained from: DragonFlyBSD, sys/vm/vm_contig.c,v 1.11 and 1.12 [1] Reminded by: Matt Dillon Reworked by: alc MFC After: 1 week	2004-11-24 18:56:13 +00:00
alc	25b80a64b9	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.	2004-11-03 20:17:31 +00:00
alc	0041f5bef4	Acquire the vm object lock before rather than after calling vm_page_sleep_if_busy(). (The motivation being to transition synchronization of the vm_page's PG_BUSY flag from the global page queues lock to the per-object lock.)	2004-10-24 19:32:19 +00:00
green	5e84871d58	Turn on the new contigmalloc(9) by default. There should not actually be a reason to use the old contigmalloc(9), but if desired, it the vm.old_contigmalloc setting can be tuned/sysctld back to 0 for now.	2004-08-05 21:54:11 +00:00
green	cf7423d4fe	Remove extraneous locks on the VM free page queue mutex; it is not meant to be recursed upon, and could cauuse a deadlock inside the new contigmalloc (vm.old_contigmalloc=0) code. Submitted by: alc	2004-07-19 23:29:36 +00:00
green	c4b4a8f048	Reimplement contigmalloc(9) with an algorithm which stands a greatly- improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time. The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the systems' free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated. The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked. There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes. A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation. When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. This invalidates the BUGS section of the contigmalloc(9) manpage.	2004-07-19 06:21:27 +00:00
green	52f66d0879	Make contigmalloc() more reliable: 1. Remove a race whereby contigmalloc() would deadlock against the running processes in the system if they kept reinstantiating the memory on the active and inactive page queues that it was trying to flush out. The process doing the contigmalloc() would sit in "swwrt" forever and the swap pager would be going at full force, but never get anywhere. Instead of doing it until the queues are empty, launder for as many iterations as there are pages in the queue. 2. Do all laundering to swap synchronously; previously, the vnode laundering was synchronous and the swap laundering not. 3. Increase the number of launder-or-allocate passes to three, from two, while failing without bothering to do all the laundering on the third pass if allocation was not possible. This effectively gives exactly two chances to launder enough contiguous memory, helpful with high memory churn where a lot of memory from one pass to the next (and during a single laundering loop) becomes dirtied again. I can now reliably hot-plug hardware requiring a 256KB contigmalloc() without having the kldload/cbb ithread sit around failing to make progress, while running a busy X session. Previously, it took killing X to get contigmalloc() to get further (that is, quiescing the system), and even then contigmalloc() returned failure.	2004-06-15 01:02:00 +00:00
imp	04c9462c02	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-06 20:15:37 +00:00
alc	4fad4b8d26	Remove GIANT_REQUIRED from contigfree().	2004-03-13 07:09:15 +00:00
alc	9687ab882b	In the last revision, I introduced a physical contiguity check that is both unnecessary and wrong. While it is necessary to verify that the page is still free after dropping and reacquiring the free page queue lock, the physical contiguity of the page can not change, making this check unnecessary. This check was wrong in that it could cause an out-of-bounds array access. Tested by: rwatson	2004-03-05 04:46:32 +00:00
alc	163a9eabd1	Modify contigmalloc1() so that the free page queues lock is not held when vm_page_free() is called. The problem with holding this lock is that it is a spin lock and vm_page_free() may attempt the acquisition of a different default-type lock.	2004-03-02 08:25:58 +00:00
alc	600caa157e	Correct a long-standing race condition in vm_contig_launder() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275. MFC after: 7 days	2004-02-16 03:43:57 +00:00
alc	24162d19e2	Remove vm_page_alloc_contig(). It's now unused.	2004-01-14 06:21:38 +00:00
alc	9aa6374009	- Unmanage pages allocated by contigmalloc1(). (There is no point in having PV entries for these pages.) - Remove splvm() and splx() calls.	2004-01-10 21:17:53 +00:00
alc	9f7878e05a	- Enable recursive acquisition of the mutex synchronizing access to the free pages queue. This is presently needed by contigmalloc1(). - Move a sanity check against attempted double allocation of two pages to the same vm object offset from vm_page_alloc() to vm_page_insert(). This provides better protection because double allocation could occur through a direct call to vm_page_insert(), such as that by vm_page_rename(). - Modify contigmalloc1() to hold the mutex synchronizing access to the free pages queue while it scans vm_page_array in search of free pages. - Correct a potential leak of pages by contigmalloc1() that I introduced in revision 1.20: We must convert all cache queue pages to free pages before we begin removing free pages from the free queue. Otherwise, if we have to restart the scan because we are unable to acquire the vm object lock that is necessary to convert a cache queue page to a free page, we leak those free pages already removed from the free queue.	2004-01-08 20:48:26 +00:00
alc	90dc293a41	Don't bother clearing PG_ZERO in contigmalloc1(), kmem_alloc(), or kmem_malloc(). It serves no purpose.	2004-01-06 20:52:55 +00:00
alc	bccf1d15ab	- Increase the object lock's scope in vm_contig_launder() so that access to the object's type field and the call to vm_pageout_flush() are synchronized. - The above change allows for the eliminaton of the last parameter to vm_pageout_flush(). - Synchronize access to the page's valid field in vm_pageout_flush() using the containing object's lock.	2003-10-18 21:09:21 +00:00
bms	44aa51e3ae	Add the mlockall() and munlockall() system calls. - All those diffs to syscalls.master for each architecture are necessary. This needed clarification; the stub code generation for mlockall() was disabled, which would prevent applications from linking to this API (suggested by mux) - Giant has been quoshed. It is no longer held by the code, as the required locking has been pushed down within vm_map.c. - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES to express their intention explicitly. - Inspected at the vmstat, top and vm pager sysctl stats level. Paging-in activity is occurring correctly, using a test harness. - The RES size for a process may appear to be greater than its SIZE. This is believed to be due to mappings of the same shared library page being wired twice. Further exploration is needed. - Believed to back out of allocations and locks correctly (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC). PR: kern/43426, standards/54223 Reviewed by: jake, alc Approved by: jake (mentor) MFC after: 2 weeks	2003-08-11 07:14:08 +00:00
mux	e7241bd66a	Use pmap_zero_page() to zero pages instead of bzero() because they haven't been vm_map_wire()'d yet.	2003-07-27 10:41:33 +00:00
alc	d63c1dd2b8	Acquire Giant rather than asserting it is held in contigmalloc(). This is a prerequisite to removing further uses of Giant from UMA.	2003-07-26 21:48:46 +00:00
mux	a3fee15cba	Add support for the M_ZERO flag to contigmalloc(). Reviewed by: jeff	2003-07-25 21:02:25 +00:00
alc	bab2fb677f	Lock a vm object when freeing a page from it.	2003-07-05 20:51:22 +00:00
mux	ed5355f79d	Fix a few style(9) nits.	2003-07-02 01:47:47 +00:00
obrien	b0678d7a44	Use __FBSDID().	2003-06-11 23:50:51 +00:00

1 2

67 Commits