freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	126b36a21e	Control the execution permission of the readable segments for i386 binaries on the amd64 and ia64 with the sysctl, instead of unconditionally enabling it. Reviewed by: marcel	2011-10-15 12:35:18 +00:00
John Baldwin	9860134635	Fix a typo in a comment.	2011-10-14 11:48:32 +00:00
Marcel Moolenaar	5f81660285	In sys_obreak() and when compiling for amd64 or ia64, when the process is ILP32 (i.e. i386) grant execute permissions by default. The JDK 1.4.x depends on being able to execute from the heap on i386.	2011-10-13 16:20:10 +00:00
Gleb Smirnoff	8d689e042f	Make memguard(9) capable to guard uma(9) allocations.	2011-10-12 18:08:28 +00:00
Konstantin Belousov	17514c1bd9	Style nit. Submitted by: jhb MFC after: 2 weeks	2011-09-29 00:44:34 +00:00
Konstantin Belousov	2042bb377a	Fix grammar. Submitted by: bf MFC after: 2 weeks	2011-09-28 16:12:15 +00:00
Konstantin Belousov	abb9b935ca	Use the trick of performing the atomic operation on the contained aligned word to handle the dirty mask updates in vm_page_clear_dirty_mask(). Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold() the sole purpose of which was to protect dirty on architectures which do not provide short or byte-wide atomics. Reviewed by: alc, attilio Tested by: flo (sparc64) MFC after: 2 weeks	2011-09-28 14:57:50 +00:00
Konstantin Belousov	005f609130	Use the explicitly-sized types for the dirty and valid masks. Requested by: attilio Reviewed by: alc MFC after: 2 weeks	2011-09-28 14:51:28 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Konstantin Belousov	15523cf799	Update some comments in swap_pager.c. Reviewed and most wording by: alc MFC after: 1 week Approved by: re (bz)	2011-08-22 20:44:18 +00:00
Konstantin Belousov	6e903bd0d6	Apply the limit to avoid the overflows in the radix tree subr_blist.c after the conversion of the swap device size to the page size units, not before. That lifts the limit on the usable swap partition size from 32GB to 256GB, that is less depressing for the modern systems. Submitted by: Alexander V. Chernikov <melifaro ipfw ru> Reviewed by: alc Approved by: re (bz) MFC after: 2 weeks	2011-08-22 11:18:47 +00:00
Robert Watson	a9d2f8d84f	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
Konstantin Belousov	d98d0ce27a	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)	2011-08-09 21:01:36 +00:00
Alan Cox	12f4b65fa6	Fix an error in kmem_alloc_attr(). Unless "tries" is updated, kmem_alloc_attr() could get stuck in a loop. Approved by: re (kib) MFC after: 3 days	2011-08-07 00:11:39 +00:00
Konstantin Belousov	dda4f96087	Implement the linprocfs swaps file, providing information about the configured swap devices in the Linux-compatible format. Based on the submission by: Robert Millan <rmh debian org> PR: kern/159281 Reviewed by: bde Approved by: re (kensmith) MFC after: 2 weeks	2011-08-01 19:12:15 +00:00
Konstantin Belousov	339772b003	Fix a race in the device pager allocation. If another thread won and allocated the device pager for the given handle, then the object fictitious pages list and the object membership in the global object list still need to be initialized. Otherwise, dev_pager_dealloc() will traverse uninitialized pointers. Reported and tested by: pho Reviewed by: jhb Approved by: re (kensmith) MFC after: 1 week	2011-07-30 14:13:57 +00:00
Konstantin Belousov	2e32165ce0	Extract the code to translate VM error into errno, into an exported function vm_mmap_to_errno(). It is useful for the drivers that implement mmap(2)-like functionality, to be able to return error codes consistent with mmap(2). Sponsored by: The FreeBSD Foundation No objections from: alc MFC after: 1 week	2011-07-10 20:49:13 +00:00
Konstantin Belousov	3103730c82	Style. MFC after: 3 days	2011-07-10 20:45:13 +00:00
Konstantin Belousov	2801687d56	Add a facility to disable processing page faults. When activated, uiomove generates EFAULT if any accessed address is not mapped, as opposed to handling the fault. Sponsored by: The FreeBSD Foundation Reviewed by: alc (previous version)	2011-07-09 15:21:10 +00:00
Edward Tomasz Napierala	afcc55f318	All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".	2011-07-06 20:06:44 +00:00
Attilio Rao	91a1929f07	Handle a race between device_pager and devsw in a more graceful manner: return an error code rather than panic the kernel. Sponsored by: Sandvine Incorporated Reviewed by: kib Tested by: pho MFC after: 2 weeks	2011-07-06 15:09:52 +00:00
Alan Cox	a8229fa37c	Initialize marker pages as held rather than fictitious/wired. Marking the page as held is more useful as a safety precaution in case someone forgets to check for PG_MARKER. Reviewed by: kib	2011-07-02 23:34:47 +00:00
Alan Cox	6bbee8e28a	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib	2011-06-29 16:40:41 +00:00
Alan Cox	1bfec3dfb6	Revert to using the page queues lock in vm_page_clear_dirty_mask() on MIPS. (At present, although atomic_clear_char() is defined by atomic.h on MIPS, it is not actually implemented by support.S.)	2011-06-23 05:23:59 +00:00
Alan Cox	3c76db4c64	Precisely document the synchronization rules for the page's dirty field. (Saying that the lock on the object that the page belongs to must be held only represents one aspect of the rules.) Eliminate the use of the page queues lock for atomically performing read- modify-write operations on the dirty field when the underlying architecture supports atomic operations on char and short types. Document the fact that 32KB pages aren't really supported. Reviewed by: attilio, kib	2011-06-19 19:13:24 +00:00
Konstantin Belousov	3b1025d200	Assert that page is VPO_BUSY or page owner object is locked in vm_page_undirty(). The assert is not precise because the VPO_BUSY owner is not tracked, so the assertion does not catch the case when VPO_BUSY is owned by another thread. Reviewed by: alc	2011-06-11 20:15:19 +00:00
Konstantin Belousov	9d17da3bef	Fix a bug in r222586. Lock the page owner object around the modification of the m->dirty. Reported and tested by: nwhitehorn Reviewed by: alc	2011-06-11 20:13:28 +00:00
Konstantin Belousov	031ec8c10a	In the VOP_PUTPAGES() implementations, change the default error from VM_PAGER_AGAIN to VM_PAGER_ERROR for the unwritten pages. Return VM_PAGER_AGAIN for the partially written page. Always forward at least one page in the loop of vm_object_page_clean(). VM_PAGER_ERROR causes the page reactivation and does not clear the page dirty state, so the write is not lost. The change fixes an infinite loop in vm_object_page_clean() when the filesystem returns permanent errors for some page writes. Reported and tested by: gavin Reviewed by: alc, rmacklem MFC after: 1 week	2011-06-01 21:00:28 +00:00
Alan Cox	8cd02d00be	Correct an error in r222163. Unless UMA_MD_SMALL_ALLOC is defined, startup_alloc() must be used until uma_startup2() is called. Reported by: jh	2011-05-22 17:46:16 +00:00
Alan Cox	342f1793ba	1. Prior to r214782, UMA did not support multipage allocations before uma_startup2() was called. Thus, setting the variable "booted" to true in uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because any allocations made after uma_startup() but before uma_startup2() could be satisfied by uma_small_alloc(). Now, however, some multipage allocations are necessary before uma_startup2() just to allocate zone structures on machines with a large number of processors. Thus, a Boolean can no longer effectively describe the state of the UMA allocator. Instead, make "booted" have three values to describe how far initialization has progressed. This allows multipage allocations to continue using startup_alloc() until uma_startup2(), but single-page allocations may begin using uma_small_alloc() after uma_startup(). 2. With the aforementioned change, only a modest increase in boot pages is necessary to boot UMA on a large number of processors. 3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM. It has only been used between r182028 and r204128. Reviewed by: attilio [1], nwhitehorn [3] Tested by: sbruno	2011-05-21 17:43:43 +00:00
Alan Cox	59d7277f4a	Fix spelling errors.	2011-05-20 17:28:00 +00:00
Alan Cox	df1bc9de7c	Eliminate a redundant #include. ("vm/vm_param.h" already includes "machine/vmparam.h".)	2011-05-20 15:26:31 +00:00
Matthew D Fleming	cfb00e5aa7	Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware. Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment). Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit. Requested by: alc MFC after: 1 week MFC with: r221853	2011-05-13 19:35:01 +00:00
Matthew D Fleming	89cb2a19ec	Use a globally visible region of zeros for both /dev/zero and the md device. There are likely other kernel uses of "blob of zeros" that can be converted. Reviewed by: alc MFC after: 1 week	2011-05-13 18:48:00 +00:00
Max Laier	e18cc7bf3e	Another long standing vm bug found at Isilon: Fix a race between vm_object_collapse and vm_fault. Reviewed by: alc@ MFC after: 3 days	2011-05-09 20:27:49 +00:00
David E. O'Brien	cec9f109bb	Reap old SPL comments. Reviewed by: alc	2011-04-26 22:18:53 +00:00
Konstantin Belousov	86769ac0a4	Fix two bugs in r218670. Hold the vnode around the region where object lock is dropped, until vnode lock is acquired. Do not drop the vnode reference for a case when the object was deallocated during unlock. Note that in this case, VV_TEXT is cleared by vnode_pager_dealloc(). Reported and tested by: pho Reviewed by: alc MFC after: 3 days	2011-04-23 21:38:21 +00:00
John Baldwin	e806d352d2	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week	2011-04-06 17:47:22 +00:00
Edward Tomasz Napierala	f497cda257	In vm_daemon(), do not skip processes stopped with SIGSTOP.	2011-04-06 16:27:04 +00:00
Edward Tomasz Napierala	099e7e950f	Add RACCT_RSS. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 16:24:24 +00:00
Edward Tomasz Napierala	1ba5ad4210	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-05 20:23:59 +00:00
Konstantin Belousov	af32c4196f	Handle the corner case in vm_fault_quick_hold_pages(). If supplied length is zero, and user address is invalid, function might return -1, due to the truncation and rounding of the address. The callers interpret the situation as EFAULT. Instead of handling the zero length in caller, filter it in vm_fault_quick_hold_pages(). Sponsored by: The FreeBSD Foundation Reviewed by: alc	2011-03-25 16:38:10 +00:00
John Baldwin	8e6fa660f2	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week	2011-03-24 18:40:11 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Edward Tomasz Napierala	3fccbe4397	In vm_daemon(), when iterating over all processes in the system, skip those which are not yet fully initialized (i.e. ones with p_state == PRS_NEW). Without it, we could panic in _thread_lock_flags(). Note that there may be other instances of FOREACH_PROC_IN_SYSTEM() that require similar fix. Reported by: pho, keramida Discussed with: kib	2011-03-18 06:47:23 +00:00
Alan Cox	10cf256074	Eliminate duplication of the fake page code and zone by the device and sg pagers. Reviewed by: jhb	2011-03-11 07:07:48 +00:00
Rebecca Cran	2860553a86	Change the return type of vmspace_swap_count to a long to match the other vmspace_*_count functions. MFC after: 3 days	2011-03-01 11:04:30 +00:00
Sergey Kandaurov	7ec9c8d170	Remove sysctl vm.max_proc_mmap used to protect from KVA space exhaustion. As it was pointed out by Alan Cox, that no longer serves its purpose with the modern UMA allocator compared to the old one used in 4.x days. The removal of sysctl eliminates max_proc_mmap type overflow leading to the broken mmap(2) seen with large amount of physical memory on arches with factually unbound KVA space (such as amd64). It was found that slightly less than 256GB of physmem was enough to trigger the overflow. Reviewed by: alc, kib Approved by: avg (mentor) MFC after: 2 months	2011-02-24 09:22:56 +00:00
Rebecca Cran	65d8409cee	Calculate and return the count in vmspace_swap_count as a vm_offset_t instead of an int to avoid overflow. While here, clean up some style(9) issues. PR: kern/152200 Reviewed by: kib MFC after: 2 weeks	2011-02-23 10:28:37 +00:00
Alan Cox	e6ffa21488	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
Konstantin Belousov	56bdf2dbc2	Since r218070 reenabled the call to vm_map_simplify_entry() from vm_map_insert(), the kmem_back() assumption about newly inserted entry might be broken due to interference of two factors. In the low memory condition, when vm_page_alloc() returns NULL, supplied map is unlocked. If another thread performs kmem_malloc() meantime, and its map entry is placed right next to our thread map entry in the map, both entries wire count is still 0 and entries are coalesced due to vm_map_simplify_entry(). Mark new entry with MAP_ENTRY_IN_TRANSITION to prevent coalesce. Fix some style issues, tighten the assertions to account for MAP_ENTRY_IN_TRANSITION state. Reported and tested by: pho Reviewed by: alc	2011-02-15 09:03:58 +00:00
Konstantin Belousov	03fa5b34a0	Lock the vnode around clearing of VV_TEXT flag. Remove mp_fixme() note mentioning that vnode lock is needed. Reviewed by: alc Tested by: pho MFC after: 1 week	2011-02-13 21:52:26 +00:00
Juli Mallett	6edf6104a9	Use CPU_FOREACH rather than expecting CPUs 0 through mp_ncpus-1 to be present. Don't micro-optimize the uniprocessor case; use the same loop there. Submitted by: Bhanu Prakash Reviewed by: kib, jhb	2011-02-12 02:10:08 +00:00
Alan Cox	d7b20e4b45	Retire VFS_BIO_DEBUG. Convert those checks that were still valid into KASSERT()s and eliminate the rest. Replace excessive printf()s and a panic() in bufdone_finish() with a KASSERT() in vm_page_io_finish(). Reviewed by: kib	2011-02-12 01:00:00 +00:00
Alan Cox	17f3095d1a	Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are incorrectly calling vm_object_page_clean(). They are passing the length of the range rather than the ending offset of the range. Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the callers. Reviewed by: kib MFC after: 3 weeks	2011-02-05 21:21:27 +00:00
Alan Cox	0cc74f144e	Since the last parameter to vm_object_shadow() is a vm_size_t and not a vm_pindex_t, it makes no sense for its callers to perform atop(). Let vm_object_shadow() do that instead.	2011-02-04 21:49:24 +00:00
Alan Cox	3d05198e23	Release the free page queues lock earlier in vm_page_alloc(). Discussed with: kib@	2011-01-30 23:55:48 +00:00
Alan Cox	d2a444c0da	Reenable the call to vm_map_simplify_entry() from vm_map_insert() for non- MAP_STACK_* entries. (See r71983 and r74235.) In some cases, performing this call to vm_map_simplify_entry() halves the number of vm map entries used by the Sun JDK.	2011-01-29 15:23:02 +00:00
Matthew D Fleming	00f0e671ff	Explicitly wire the user buffer rather than doing it implicitly in sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT drain for extremely large amounts of data where the caller knows that appropriate references are held, and sleeping is not an issue. Inspired by: rwatson	2011-01-27 00:34:12 +00:00
Sergey Kandaurov	4053b05b91	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
Alan Cox	2c4992db70	Move the definition of M_VMPGDATA to the swap pager, where the only remaining uses are.	2011-01-18 04:54:43 +00:00
Alan Cox	44e46b9e53	Explicitly initialize the page's queue field to PQ_NONE instead of relying on PQ_NONE being zero. Redefine PQ_NONE and PQ_COUNT so that a page queue isn't allocated for PQ_NONE. Reviewed by: kib@	2011-01-17 19:17:26 +00:00
Alan Cox	9454f82862	Sort function prototypes.	2011-01-16 20:40:50 +00:00
Alan Cox	43319c116a	Update a lock annotation on the page structure.	2011-01-16 18:04:01 +00:00
Alan Cox	4c6a2e7a1f	Shift responsibility for synchronizing access to the page's act_count field to the object's lock. Reviewed by: kib@	2011-01-16 18:01:39 +00:00
Alan Cox	9648f3447d	Clean up the start of vm_page_alloc(). In particular, eliminate an assertion that is no longer required. Long ago, calls to vm_page_alloc() from an interrupt handler had to specify VM_ALLOC_INTERRUPT so that vm_page_alloc() would not attempt to reclaim a PQ_CACHE page from another vm object. Today, with the synchronization on a vm object's collection of PQ_CACHE pages, this is no longer an issue. In fact, VM_ALLOC_INTERRUPT now reclaims PQ_CACHE pages just like VM_ALLOC_{NORMAL,SYSTEM}. MFC after: 3 weeks	2011-01-16 17:33:34 +00:00
Konstantin Belousov	c6c9025b65	For consistency, use kernel_object instead of &kernel_object_store when initializing the object mutex. Do the same for kmem_object. Discussed with: alc MFC after: 1 week	2011-01-15 21:56:38 +00:00
Alan Cox	ff5958e785	For some time now, the kernel and kmem objects have been ordinary OBJT_PHYS objects. Thus, there is no need for handling them specially in vm_fault(). In fact, this special case handling would have led to an assertion failure just before the call to pmap_enter(). Reviewed by: kib@ MFC after: 6 weeks	2011-01-15 19:21:28 +00:00
John Baldwin	58ccf5b41c	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde	2011-01-11 13:59:06 +00:00
Konstantin Belousov	50a57dfbec	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
Alan Cox	27772ddf45	Eliminate a redundant alignment directive on the page locks array.	2011-01-09 04:34:02 +00:00
Alan Cox	ce8a13bdb9	Eliminate the counting of vm_page_pa_tryrelock calls. We really don't need it anymore. Moreover, its implementation had a type mismatch, a long is not necessarily an uint64_t. (This mismatch was hidden by casting.) Move the remaining two counters up a level in the sysctl hierarchy. There is no reason for them to be under the vm.pmap node. Reviewed by: kib	2011-01-08 22:45:22 +00:00
Alan Cox	17f6a17bf7	Release the page lock early in vm_pageout_clean(). There is no reason to hold this lock until the end of the function. With the aforementioned change to vm_pageout_clean(), page locks don't need to support recursive (MTX_RECURSE) or duplicate (MTX_DUPOK) acquisitions. Reviewed by: kib	2011-01-03 00:41:56 +00:00
Alan Cox	edf93b25d3	Make a couple refinements to r216799 and r216810. In particular, revise a comment and move it to its proper place. Reviewed by: kib	2011-01-01 17:39:38 +00:00
Rebecca Cran	4c18dec9a9	There can be more than 0x20000000 swap meta blocks allocated if a swap-backed md(4) device is used. Don't panic when deallocating such a device if swap has been used. PR: kern/133170 Discussed with: kib MFC after: 3 days	2011-01-01 16:59:05 +00:00
Konstantin Belousov	50cfe7fa50	Remove OBJ_CLEANING flag. The vfs_setdirty_locked_object() is the only consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY was cleared early in vm_object_page_clean, before the cleaning pass was done. This is no longer true after r216799. Moreover, since OBJ_CLEANING is a flag, and not the counter, it could be reset too prematurely when parallel vm_object_page_clean() are performed. Reviewed by: alc (as a part of the bigger patch) MFC after: 1 month (after r216799 is merged)	2010-12-29 22:26:49 +00:00
Alan Cox	fef87167c9	There is no point in vm_contig_launder{,_page}() flushing held pages, instead skip over them. As long as a page is held, it can't be reclaimed by contigmalloc(M_WAITOK). Moreover, a held page may be undergoing modification, e.g., vmapbuf(), so even if the hold were released before the completion of contigmalloc(), the page might have to be flushed again. MFC after: 3 weeks	2010-12-29 20:35:36 +00:00
Konstantin Belousov	3280870dca	Move the increment of vm object generation count into vm_object_set_writeable_dirty(). Fix an issue where restart of the scan in vm_object_page_clean() did not remove write permissions for newly added pages or, if the mapping for some already scanned page changed to writeable due to fault. Merge the two loops in vm_object_page_clean(), doing the remove of write permission and cleaning in the same loop. The restart of the loop then correctly downgrades writeable mappings. Fix an issue where a second caller to msync() might actually return before the first caller had actually completed flushing the pages. Clear the OBJ_MIGHTBEDIRTY flag after the cleaning loop, not before. Calls to pmap_is_modified() are not needed after pmap_remove_write() there. Proposed, reviewed and tested by: alc MFC after: 1 week	2010-12-29 12:53:53 +00:00
Alan Cox	a5dbab5444	Correct a typo in vm_fault_quick_hold_pages(). Reported by: Bartosz Stec	2010-12-28 20:02:30 +00:00
Alan Cox	4de2261903	Move vm_object_print()'s prototype to the expected place.	2010-12-27 07:12:22 +00:00
Alan Cox	0b47b37621	Retire vm_fault_quick(). It's no longer used. Reviewed by: kib@	2010-12-25 23:54:50 +00:00
Alan Cox	82de724fe1	Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel. In collaboration with: kib@	2010-12-25 21:26:56 +00:00
Alan Cox	acd11c7499	Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race condition in proc_rwmem() and to (2) simplify the implementation of the cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem() the requested read or write could fail because the targeted page could be reclaimed between the calls to vm_fault() and vm_page_hold(). In collaboration with: kib@ MFC after: 6 weeks	2010-12-20 22:49:31 +00:00
Alan Cox	8c22654d7e	Implement and use a single optimized function for unholding a set of pages. Reviewed by: kib@	2010-12-17 22:41:22 +00:00
Alan Cox	7984ab250b	Change memguard_fudge() so that it can handle km_max being zero. Not every platform defines VM_KMEM_SIZE_MAX, and on those platforms km_max will be zero. Reviewed by: mdf Tested by: marius	2010-12-14 05:47:35 +00:00
Max Laier	a5db445da4	Fix a long standing (from the original 4.4BSD lite sources) race between vmspace_fork and vm_map_wire that would lead to "vm_fault_copy_wired: page missing" panics. While faulting in pages for a map entry that is being wired down, mark the containing map as busy. In vmspace_fork wait until the map is unbusy, before we try to copy the entries. Reviewed by: kib MFC after: 5 days Sponsored by: Isilon Systems, Inc.	2010-12-09 21:02:22 +00:00
Jayachandran C.	48772ca4aa	Revert the vm/vm_page.c change in r216317. This adds back changes in r216141, which was reverted by the above check in.	2010-12-09 07:39:06 +00:00
Jayachandran C.	aa93efedd8	swi_vm() for mips.	2010-12-09 06:54:06 +00:00
Edward Tomasz Napierala	a2f510e8ec	Fix comment indentation.	2010-12-04 17:41:58 +00:00
Warner Losh	6f1a8765be	To make minidumps work properly on mips for memory that's direct mapped and entered via vm_page_setup, keep track of it like we do for amd64. # A separate commit will be made to move this to a capability-based ifdef # rather than arch-based ifdef. Submitted by: alc@ MFC after: 1 week	2010-12-03 04:39:48 +00:00
Edward Tomasz Napierala	ef694c1ac4	Replace pointer to "struct uidinfo" with pointer to "struct ucred" in "struct vm_object". This is required to make it possible to account for per-jail swap usage. Reviewed by: kib@ Tested by: pho@ Sponsored by: FreeBSD Foundation	2010-12-02 17:37:16 +00:00
Alan Cox	05cb58f669	Correct an error in the allocation of the vm_page_dump array in vm_page_startup(). Specifically, the dump_avail array should be used instead of the phys_avail array to calculate the size of vm_page_dump. For example, the pages for the message buffer are allocated prior to vm_page_startup() by subtracting them from the last entry in the phys_avail array, but the first thing that vm_page_startup() does after creating the vm_page_dump array is to set the bits corresponding to the message buffer pages in that array. However, these bits might not actually exist in the array, because the size of the array is determined by the current value in the last entry of the phys_avail array. In general, the only reason why this doesn't always result in an out-of-bounds array access is that the size of the vm_page_dump array is rounded up to the next page boundary. This change eliminates that dependence on rounding (and luck). MFC after: 6 weeks	2010-12-01 03:35:19 +00:00
Jayachandran C.	aa54636620	Fix issue noted by alc while reviewing r215938: The current implementation of vm_page_alloc_freelist() does not handle order > 0 correctly. Remove order parameter to the function and use it only for order 0 pages. Submitted by: alc	2010-11-28 05:51:31 +00:00
Konstantin Belousov	780636b72a	After the sleep caused by encountering a busy page, relookup the page. Submitted and reviewed by: alc Reported and tested by: pho MFC after: 5 days	2010-11-24 12:25:17 +00:00
Konstantin Belousov	3157c50313	Eliminate the mab, maf arrays and related variables. The change also fixes off-by-one error in the calculation of mreq. Suggested and reviewed by: alc Tested by: pho MFC after: 5 days	2010-11-21 10:18:28 +00:00
Alan Cox	17ea6f00d5	Optimize vm_object_terminate(). Reviewed by: kib MFC after: 1 week	2010-11-20 22:30:09 +00:00
Konstantin Belousov	4c7b9a2063	The runlen returned from vm_pageout_flush() might be zero legitimately, when mreq page has status VM_PAGER_AGAIN. MFC after: 5 days	2010-11-20 17:27:38 +00:00
Alan Cox	00f8bffc22	Reduce the amount of detail printed by vm_page_free_toq() when it panics. Reviewed by: kib	2010-11-19 17:49:08 +00:00
Max Laier	85f2a0c91e	Off by one page in vm_reserv_reclaim_contig(): Also reclaim reservations with only a single free page if that satisfies the requested size. MFC after: 3 days Reviewed by: alc	2010-11-19 04:30:33 +00:00
Konstantin Belousov	1e8a675c73	vm_pageout_flush() might cache the pages that finished write to the backing storage. Such pages might be then reused, racing with the assert in vm_object_page_collect_flush() that verified that dirty pages from the run (most likely, pages with VM_PAGER_AGAIN status) are write-protected still. In fact, the page indexes for the pages that were removed from the object page list should be ignored by vm_object_page_clean(). Return the length of successfully written run from vm_pageout_flush(), that is, the count of pages between requested page and first page after requested with status VM_PAGER_AGAIN. Supply the requested page index in the array to vm_pageout_flush(). Use the returned run length to forward the index of next page to clean in vm_object_page_clean(). Reported by: avg Reviewed by: alc MFC after: 1 week	2010-11-18 21:09:02 +00:00
Konstantin Belousov	4166faaee0	Only increment object generation count when inserting the page into object page list. The only use of object generation count now is a restart of the scan in vm_object_page_clean(), which makes sense to do on the page addition. Page removals do not affect the dirtiness of the object, as well as manipulations with the shadow chain. Suggested and reviewed by: alc MFC after: 1 week	2010-11-18 20:46:28 +00:00
Konstantin Belousov	7022f954c3	Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week	2010-11-14 21:59:11 +00:00
Konstantin Belousov	94bce4535d	Use symbolic names instead of hardcoding values for magic p_osrel constants. MFC after: 1 week	2010-11-14 18:24:12 +00:00
Konstantin Belousov	9a6d144ff8	Implement a (soft) stack guard page for auto-growing stack mappings. The unmapped page separates the tip of the stack and a possible adjacent segment, making some uses of stack overflow harder. The stack growing code refuses to expand the segment to the last page of the reserved region when sysctl security.bsd.stack_guard_page is set to 1. The default value for sysctl and accompanying tunable is 0. Please note that mmap(MAP_FIXED) still can place a mapping right up to the stack, making a contiguous region. Reviewed by: alc MFC after: 1 week	2010-11-14 17:53:52 +00:00
Alan Cox	2cf36c8f67	Enable reservation-based physical memory allocation. Even without the creation of large page mappings in the pmap, it can provide modest performance benefits. In particular, for a "buildworld" on a 2x 1GHz Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system time by 12.6%. Tested by: marius@	2010-11-10 17:57:34 +00:00
Alan Cox	e48262487a	In case the stack size reaches its limit and its growth must be restricted, ensure that grow_amount is a multiple of the page size. Otherwise, the kernel may crash in swap_reserve_by_uid() on HEAD and FreeBSD 8.x, and produce a core file with a missing stack on FreeBSD 7.x. Diagnosed and reported by: jilles Reviewed by: kib MFC after: 1 week	2010-11-07 21:40:34 +00:00
Oleksandr Tymoshenko	903ba3da86	- Add minidump support for FreeBSD/mips	2010-11-07 03:09:02 +00:00
John Baldwin	e9a069d8af	Update startup_alloc() to support multi-page allocations and allow internal zones whose objects are larger than a page to use startup_alloc(). This allows allocation of zone objects during early boot on machines with a large number of CPUs since the resulting zone objects are larger than a page. Submitted by: trema Reviewed by: attilio MFC after: 1 week	2010-11-04 15:33:50 +00:00
Alan Cox	d689bc0082	Correct some format strings used by sysctls. MFC after: 1 week	2010-10-30 18:00:53 +00:00
John Baldwin	1a587ef2a5	- Make 'vm_refcnt' volatile so that compilers won't be tempted to treat its value as a loop invariant. Currently this is a no-op because 'atomic_cmpset_int()' clobbers all memory on current architectures. - Use atomic_fetchadd_int() instead of an atomic_cmpset_int() loop to drop a reference in vmspace_free(). Reviewed by: alc MFC after: 1 month	2010-10-21 17:29:32 +00:00
Andriy Gapon	55144670c2	PG_BUSY -> VPO_BUSY, PG_WANTED -> VPO_WANTED in manual pages and comments Reviewed by: alc MFC after: 4 days	2010-10-20 05:17:23 +00:00
Matthew D Fleming	20ed0cb0c6	uma_zfree(zone, NULL) should do nothing, to match free(9). Noticed by: Ron Steinke <rsteinke at isilon dot com> MFC after: 3 days	2010-10-19 16:06:00 +00:00
Lawrence Stewart	1c6cae9711	Change uma_zone_set_max to return the effective value of "nitems" after rounding. The same value can also be obtained with uma_zone_get_max, but this change avoids a caller having to make two back-to-back calls. Sponsored by: FreeBSD Foundation Reviewed by: gnn, jhb	2010-10-16 04:41:45 +00:00
Lawrence Stewart	c4ae7908a7	- Simplify implementation of uma_zone_get_max. - Add uma_zone_get_cur which returns the current approximate occupancy of a zone. This is useful for providing stats via sysctl amongst other things. Sponsored by: FreeBSD Foundation Reviewed by: gnn, jhb MFC after: 2 weeks	2010-10-16 04:14:45 +00:00
Alan Cox	f8616ebfae	If vm_map_find() is asked to allocate a superpage-aligned region of virtual addresses that is greater than a superpage in size but not a multiple of the superpage size, then vm_map_find() is not always expanding the kernel pmap to support the last few small pages being allocated. These failures are not commonplace, so this was first noticed by someone porting FreeBSD to a new architecture. Previously, we grew the kernel page table in vm_map_findspace() when we found the first available virtual address. This works most of the time because we always grow the kernel pmap or page table by an amount that is a multiple of the superpage size. Now, instead, we defer the call to pmap_growkernel() until we are committed to a range of virtual addresses in vm_map_insert(). In general, there is another reason to prefer calling pmap_growkernel() in vm_map_insert(). It makes it possible for someone to do the equivalent of an mmap(MAP_FIXED) on the kernel map. Reported by: Svatopluk Kraus Reviewed by: kib@ MFC after: 3 weeks	2010-10-04 16:49:40 +00:00
Matthew D Fleming	d69b01efc2	Replace an XXX comment with the appropriate code. Submitted by: alc	2010-09-20 20:41:59 +00:00
Alan Cox	da0483096d	Allow a POSIX shared memory object that is opened for read but not for write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e., copy-on-write. (This is a regression in the new implementation of POSIX shared memory objects that is used by HEAD and RELENG_8. This bug does not exist in RELENG_7's user-level, file-based implementation.) PR: 150260 MFC after: 3 weeks	2010-09-19 19:42:04 +00:00
Alan Cox	8304adaac6	Make refinements to r212824. In particular, don't make vm_map_unlock_nodefer() part of the synchronization interface for maps. Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing how they should be used. In particular, describe the deferred deallocations issue with vm_map_unlock_and_wait(). Redo the implementation of vm_map_unlock_and_wait() so that it passes along the caller's file and line information, just like the other map locking primitives. Reviewed by: kib X-MFC after: r212824	2010-09-19 17:43:22 +00:00
Konstantin Belousov	0b367bd8c0	Adopt the deferring of object deallocation for the deleted map entries on map unlock to the lock downgrade and later read unlock operation. System map entries cannot be backed by OBJT_VNODE objects, no need to defer deallocation for them. Map entries from user maps do not require the owner map for deallocation, and can be accumulated in the thread-local list for freeing when a user map is unlocked. Move the collection of entries for deferred reclamation into vm_map_delete(). Create helper vm_map_process_deferred(), that is called from locations where processing is feasible. Do not process deferred entries in vm_map_unlock_and_wait() since map_sleep_mtx is held. Reviewed by: alc, rstone (previous versions) Tested by: pho MFC after: 2 weeks	2010-09-18 15:03:31 +00:00
Matthew D Fleming	4e6571599b	Re-add r212370 now that the LOR in powerpc64 has been resolved: Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk (original patch)	2010-09-16 16:13:12 +00:00
Matthew D Fleming	404a593e28	Revert r212370, as it causes a LOR on powerpc. powerpc does a few unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex. Requested by: nwhitehorn	2010-09-13 18:48:23 +00:00
Matthew D Fleming	dd67e2103c	Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk	2010-09-09 18:33:46 +00:00
Nathan Whitehorn	42768fec0f	On architectures with non-tree-based page tables like PowerPC, every page in a range must be checked when calling pmap_remove(). Calling pmap_remove() from vm_pageout_map_deactivate_pages() with the entire range of the map could result in attempting to demap an extraordinary number of pages (> 10^15), so iterate through each map entry and unmap each of them individually. MFC after: 6 weeks	2010-09-09 13:32:58 +00:00
Ryan Stone	d473d3a164	Fix a typo in r212281. uintptr -> uintptr_t Pointy hat to: rstone Approved by: emaste (mentor) MFC after: 2 weeks	2010-09-07 02:51:11 +00:00
Ryan Stone	0d41964095	In munmap() downgrade the vm_map_lock to a read lock before taking a read lock on the pmc-sx lock. This prevents a deadlock with pmc_log_process_mappings, which has an exclusive lock on pmc-sx and tries to get a read lock on a vm_map. Downgrading the vm_map_lock in munmap allows pmc_log_process_mappings to continue, preventing the deadlock. Without this change I could cause a deadlock on a multicore 8.1-RELEASE system by having one thread constantly mmap'ing and then munmap'ing a PROT_EXEC mapping in a loop while I repeatedly invoked and stopped pmcstat in system-wide sampling mode. Reviewed by: fabient Approved by: emaste (mentor) MFC after: 2 weeks	2010-09-07 00:23:45 +00:00
Andriy Gapon	a9b89cf1c1	vm_page.c: include opt_msgbuf.h for MSGBUF_SIZE use in vm_page_startup vm_page_startup uses MSGBUF_SIZE value for adding msgbuf pages to minidump. If opt_msgbuf.h is not included and MSGBUF_SIZE is overridden in kernel config, then not all msgbuf pages will be dumped. And most importantly, struct msgbuf itself will not be included. Thus the dump would look corrupted/incomplete to tools like kgdb, dmesg, etc that try to access struct msgbuf as one of the first things they do when working on a crash dump. MFC after: 5 days	2010-09-03 10:40:53 +00:00
Matthew D Fleming	a2a200a24d	Have memguard(9) crash with an easier-to-debug message on double-free. Reviewed by: zml MFC after: 3 weeks	2010-08-31 17:43:47 +00:00
Matthew D Fleming	6d3ed393d6	The realloc case for memguard(9) will copy too many bytes when reallocating to a smaller-sized allocation. Fix this issue. Noticed by: alc Reviewed by: alc Approved by: zml (mentor) MFC after: 3 weeks	2010-08-31 16:57:58 +00:00
Alan Cox	74ffb9af15	Add the MAP_PREFAULT_READ option to mmap(2). Reviewed by: jhb, kib	2010-08-28 16:57:07 +00:00
Andre Oppermann	e49471b04b	Add uma_zone_get_max() to obtain the effective limit after a call to uma_zone_set_max(). The UMA zone limit is not exactly set to the value supplied but rounded up to completely fill the backing store increment (a page normally). This can lead to surprising situations where the number of elements allocated from UMA is higher than the supplied limit value. The new get function reads back the effective value so that the supplied limit value can be adjusted to the real limit. Reviewed by: jeffr MFC after: 1 week	2010-08-16 14:24:00 +00:00
Matthew D Fleming	f02d86e269	Fix compile. It seemed better to have memguard.c include opt_vm.h in case future compile-time knobs were added that it wants to use. Also add include guards and forward declarations to vm/memguard.h. Approved by: zml (mentor) MFC after: 1 month	2010-08-12 16:54:43 +00:00
Matthew D Fleming	e3813573bd	Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time. Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow: - randomly guarding memory - adding un-backed KVA guard pages to detect underflow and overflow - a lower limit on the size of allocations that are guarded Reviewed by: alc Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page) Silence from: -arch Approved by: zml (mentor) MFC after: 1 month	2010-08-11 22:10:37 +00:00
Konstantin Belousov	3979450b4c	Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month	2010-08-06 09:42:15 +00:00
John Baldwin	a3870a1826	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfilling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
Edward Tomasz Napierala	fd6f4ffb27	Fix commented out resource limit check in mlockall(2). It's still racy, but at least less misleading.	2010-07-27 19:26:18 +00:00
Alan Cox	2af6e14d39	Introduce exec_alloc_args(). The objective being to encapsulate the details of the string buffer allocation in one place. Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string. Reviewed by: kib	2010-07-27 17:31:03 +00:00
Alan Cox	9e4e511499	Change the order in which the file name, arguments, environment, and shell command are stored in exec*()'s demand-paged string buffer. For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space. Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string. Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.) Reviewed by: kib	2010-07-25 17:43:38 +00:00
Jayachandran C.	49ca10d40c	Redo the page table page allocation on MIPS, as suggested by alc@. The UMA zone based allocation is replaced by a scheme that creates a new free page list for the KSEG0 region, and a new function in sys/vm that allocates pages from a specific free page list. This also fixes a race condition introduced by the UMA based page table page allocation code. Dropping the page queue and pmap locks before the call to uma_zfree, and re-acquiring them afterwards will introduce a race condition (noted by alc@). The changes are: - Revert the earlier changes in MIPS pmap.c that added UMA zone for page table pages. - Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that is not directly mapped (in 32bit kernel). Normal page allocations will first try the HIGHMEM freelist and then the default (direct mapped) freelist. - Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int order, int req)' to vm/vm_page.c to allocate a page from a specified freelist. The MIPS page table pages will be allocated using this function from the freelist containing direct mapped pages. - Move the page initialization code from vm_phys_alloc_contig() to a new function vm_page_alloc_init(), and use this function to initialize pages in vm_page_alloc_freelist() too. - Split the function vm_phys_alloc_pages(int pool, int order) to create vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages(). Reviewed by: alc	2010-07-21 09:27:00 +00:00
Alan Cox	b99348e5ea	Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently, the maintenance of vm_pageout_deficit can be localized to just two places: vm_page_alloc() and vm_pageout_scan(). This change also corrects an off-by-one error in the maintenance of vm_pageout_deficit. Historically, the buffer cache functions, allocbuf() and vm_hold_load_pages(), have not taken into account that vm_page_alloc() already increments vm_pageout_deficit by one. Reviewed by: kib	2010-07-09 19:38:30 +00:00
Konstantin Belousov	1d9e77f6bf	Make VM_ALLOC_RETRY flag mandatory for vm_page_grab(). Assert that the flag is always provided, and unconditionally retry after sleep for the busy page or failed allocation. The intent is to remove VM_ALLOC_RETRY eventually. Proposed and reviewed by: alc	2010-07-08 08:37:51 +00:00
Konstantin Belousov	5f195aa32e	Add the ability for the allocflag argument of the vm_page_grab() to specify the increment of vm_pageout_deficit when sleeping due to page shortage. Then, in allocbuf(), the code to allocate pages when extending vmio buffer can be replaced by a call to vm_page_grab(). Suggested and reviewed by: alc MFC after: 2 weeks	2010-07-05 21:13:32 +00:00
Konstantin Belousov	757216f3a5	Several cleanups for the r209686: - remove unused defines; - remove unused curgeneration argument for vm_object_page_collect_flush(); - always assert that vm_object_page_clean() is called for OBJT_VNODE; - move vm_page_find_least() into for() statement initial clause. Submitted by: alc	2010-07-04 19:02:32 +00:00
Konstantin Belousov	e239bb9730	Reimplement vm_object_page_clean(), using the fact that vm object memq is ordered by page index. This greatly simplifies the implementation, since we no longer need to mark the pages with VPO_CLEANCHK to denote the progress. It is enough to remember the current position by index before dropping the object lock. Remove VPO_CLEANCHK and VM_PAGER_IGNORE_CLEANCHK as unused. Garbage-collect vm.msync_flush_flags sysctl. Suggested and reviewed by: alc Tested by: pho	2010-07-04 11:26:56 +00:00
Konstantin Belousov	b382c10a57	Introduce a helper function vm_page_find_least(). Use it in several places, which inline the function. Reviewed by: alc Tested by: pho MFC after: 1 week	2010-07-04 11:13:33 +00:00
Alan Cox	b64400a03f	Improve the comment and man page for vm_page_alloc(). Specifically, document one of the optional flags; clarify which of the flags are optional (and which are not), and remove mention of a restriction on the reclamation of cached pages that no longer holds since version 7. MFC after: 1 week	2010-07-03 18:25:37 +00:00
Alan Cox	cdaba1f2be	Push down the acquisition of the page queues lock into vm_pageout_page_stats(). In particular, avoid acquiring the page queues lock unless iterating over the active queue.	2010-07-02 20:56:22 +00:00
Alan Cox	4b0640310a	Use vm_page_prev() instead of vm_page_lookup() in the implementation of vm_fault()'s automatic delete-behind heuristic. vm_page_prev() is typically faster.	2010-07-02 19:59:18 +00:00
Alan Cox	9cf5198832	With the demise of page coloring, the page queue macros no longer serve any useful purpose. Eliminate them. Reviewed by: kib	2010-07-02 15:02:51 +00:00
Alan Cox	95976f3f38	Simplify entry to vm_pageout_clean(). Expect the page to be locked. Previously, the caller unlocked the page, and vm_pageout_clean() immediately reacquired the page lock. Also, assert rather than test that the page is neither busy nor held. Since vm_pageout_clean() is called with the object and page locked, the page can't have changed state since the caller verified that the page is neither busy nor held.	2010-06-30 17:20:33 +00:00
Alan Cox	91b4f42767	Introduce vm_page_next() and vm_page_prev(), and use them in vm_pageout_clean(). When iterating over a range of pages, these functions can be cheaper than vm_page_lookup() because their implementation takes advantage of the vm_object's memq being ordered. Reviewed by: kib@ MFC after: 3 weeks	2010-06-21 23:27:24 +00:00
Sean Bruno	bf96595915	Add a new column to the output of vmstat -z to indicate the number of times the system was forced to sleep when requesting a new allocation. Expand the debugger hook, db_show_uma, to display these results as well. This has proven to be very useful in out of memory situations when it is not known why systems have become sluggish or fail in odd ways. Reviewed by: rwatson alc Approved by: scottl (mentor) peter Obtained from: Yahoo Inc.	2010-06-15 19:28:37 +00:00
Alan Cox	9ee2165f5d	Eliminate checks for a page having a NULL object in vm_pageout_scan() and vm_pageout_page_stats(). These checks were recently introduced by the first page locking commit, r207410, but they are not needed. At the same time, eliminate some redundant accesses to the page's object field. (These accesses should have been eliminated by r207410.) Make the assertion in vm_page_flag_set() stricter. Specifically, only managed pages should have PG_WRITEABLE set. Add a comment documenting an assertion to vm_page_flag_clear(). It has long been the case that fictitious pages have their wire count permanently set to one. Add comments to vm_page_wire() and vm_page_unwire() documenting this. Add assertions to these functions as well. Update the comment describing vm_page_unwire(). Much of the old comment had little to do with vm_page_unwire(), but a lot to do with _vm_page_deactivate(). Move relevant parts of the old comment to _vm_page_deactivate(). Only pages that belong to an object can be paged out. Therefore, it is pointless for vm_page_unwire() to acquire the page queues lock and enqueue such pages in one of the paging queues. Generally speaking, such pages are immediately freed after the call to vm_page_unwire(). Previously, it was the call to vm_page_free() that reacquired the page queues lock and removed these pages from the paging queues. Now, we will never acquire the page queues lock for this case. (It is also worth noting that since both vm_page_unwire() and vm_page_free() occurred with the page locked, the page daemon never saw the page with its object field set to NULL.) Change the panic with vm_page_unwire() to provide a more precise message. Reviewed by: kib@	2010-06-14 19:54:19 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Alan Cox	ce18658792	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exists_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@	2010-06-10 16:56:35 +00:00
Jayachandran C.	17dca144a2	Make vm_contig_grow_cache() extern, and use it when vm_phys_alloc_contig() fails to allocate MIPS page table pages. The current usage of VM_WAIT in case of vm_phys_alloc_contig() failure is not correct, because: "There is no guarantee that any of the available free (or cached) pages after the VM_WAIT will fall within the range of suitable physical addresses. Every time this function sleeps and a single page is freed (or cached) by someone else, this function will be reawakened. With a little bad luck, you could spin indefinitely." We also add low and high parameters to vm_contig_grow_cache() and vm_contig_launder() so that we restrict vm_contig_launder() to the range of pages we are interested in. Reported by: alc Reviewed by: alc Approved by: rrs (mentor)	2010-06-04 06:35:36 +00:00
Konstantin Belousov	4d65036b4f	Do not leak vm page lock in vm_contig_launder(), vm_pageout_page_lock() always returns with the page locked. Submitted by: alc Pointy hat to: kib	2010-06-03 18:34:34 +00:00
Konstantin Belousov	2bbfbc3fe2	Add assertion and comment in vm_page_flag_set() describing the expectations when the PG_WRITEABLE flag is set. Reviewed by: alc	2010-06-03 10:11:45 +00:00
Alan Cox	f4e10cdaa6	Maintain the pretense that we support 32KB pages for the sake of the ia64 LINT build.	2010-06-03 02:24:53 +00:00
Alan Cox	c8fa870982	Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.	2010-06-02 15:46:37 +00:00
Alan Cox	8f0d5d3b9f	When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call. When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called. Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator. There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this. Tidy up the variable declarations at the start of pmap_enter().	2010-05-29 17:10:45 +00:00
Alan Cox	c46b90e90a	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib	2010-05-26 18:00:44 +00:00
Alan Cox	e98d019d3c	Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed. Submitted by: kib	2010-05-25 02:26:25 +00:00
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Konstantin Belousov	a6e38685f3	When waiting for the busy page, do not unlock the object unless unlock cannot be avoided. Reviewed by: alc MFC after: 1 week	2010-05-20 08:51:01 +00:00
Alan Cox	aa12e8b71d	The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it. Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here. Reviewed by: kib	2010-05-18 16:40:29 +00:00
Alan Cox	9ab6032f73	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.	2010-05-16 23:45:10 +00:00
Alan Cox	a4bc2c8929	Correct an error of omission in r202897: Now that amd64 uses the direct map to access the message buffer, we must explicitly request that the underlying physical pages are included in a crash dump. Reported by: Benjamin Kaduk	2010-05-16 19:25:56 +00:00
Alan Cox	a1a95cd608	Add a comment about the proper use of vm_object_page_remove(). MFC after: 1 week	2010-05-16 16:54:05 +00:00
Alan Cox	b27753086c	Update synchronization annotations for struct vm_page. Add a comment explaining how the setting of PG_WRITEABLE is synchronized.	2010-05-11 01:29:18 +00:00
Konstantin Belousov	6e2175fc06	Continue cleaning the queue instead of moving to the next queue or bailing out if acquisition of page lock caused page position in the queue to change. Pointed out by: alc	2010-05-10 11:53:40 +00:00
Alan Cox	eee9d99231	Push down the acquisition of the page queues lock into vm_pageq_remove(). (This eliminates a surprising number of page queues lock acquisitions by vm_fault() because the page's queue is PQ_NONE and thus the page queues lock is not needed to remove the page from a queue.)	2010-05-09 16:55:42 +00:00
Alan Cox	db1f085eee	Call vm_page_deactivate() rather than vm_page_dontneed() in swp_pager_force_pagein(). By dirtying the page, swp_pager_force_pagein() forces vm_page_dontneed() to insert the page at the head of the inactive queue, just like vm_page_deactivate() does. Moreover, because the page was invalid, it can't have been mapped, and thus the other effect of vm_page_dontneed(), clearing the page's reference bits, has no effect. In summary, there is no reason to call vm_page_dontneed() since its effect will be identical to calling the simpler vm_page_deactivate().	2010-05-09 16:27:42 +00:00
Alan Cox	d061cdd513	Remove the page queues lock around a call to vm_page_activate(). Make the page dirty before adding it to the active queue.	2010-05-09 00:32:52 +00:00
Alan Cox	34e7251f10	Minimize the scope of the page queues lock in vm_fault().	2010-05-08 21:35:51 +00:00
Alan Cox	3c4a24406b	Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Jung-uk Kim	af394cfa36	Fix a typo in the previous commit.	2010-05-07 21:06:52 +00:00
Konstantin Belousov	af4b86b949	One more use for vm_pageout_init_marker(). Reviewed by: alc	2010-05-07 18:57:26 +00:00
Alan Cox	97c3834772	Eliminate unnecessary page queues locking.	2010-05-07 16:22:06 +00:00
Alan Cox	03679e2334	Push down the page queues lock into vm_page_activate().	2010-05-07 15:49:43 +00:00
Alan Cox	c1f98960a3	Update the synchronization requirements for the page usage count.	2010-05-07 06:58:53 +00:00
Alan Cox	7072188017	Eliminate acquisitions of the page queues lock that are no longer needed. Switch to a per-processor counter for the number of pages freed during process termination.	2010-05-07 05:23:15 +00:00
Alan Cox	9402dff3de	Push down the page queues lock into vm_page_deactivate(). Eliminate an incorrect comment.	2010-05-07 04:14:07 +00:00
Alan Cox	eb00b276ab	Eliminate page queues locking around most calls to vm_page_free().	2010-05-06 18:58:32 +00:00
Alan Cox	fd8c28bfdf	Update a comment to say that access to a page's wire count is now synchronized by the page lock.	2010-05-06 17:28:59 +00:00
Alan Cox	7024db1d40	Push down the page queues lock inside of vm_page_free_toq() and pmap_page_is_mapped() in preparation for removing page queues locking around calls to vm_page_free(). Setting aside the assertion that calls pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page queues lock just long enough to actually add or remove the page from the paging queues. Update vm_page_unhold() to reflect the above change.	2010-05-06 16:39:43 +00:00
Konstantin Belousov	8c6162468b	Add a helper function vm_pageout_page_lock(), similar to tegge's vm_pageout_fallback_object_lock(), to obtain the page lock while having page queue lock locked, and still maintain the page position in a queue. Use the helper to lock the page in the pageout daemon and contig launder iterators instead of skipping the page if its lock is contested. Skipping locked pages easily causes pagedaemon or launder to make no progress with page cleaning. Proposed and reviewed by: alc	2010-05-06 04:57:33 +00:00
Alan Cox	5ac59343be	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib	2010-05-05 18:16:06 +00:00
Alan Cox	e3ef0d2fcf	Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib	2010-05-05 03:45:46 +00:00
Alan Cox	a7283d3213	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
Alan Cox	0c41a69e71	Add lock assertions.	2010-05-04 05:55:19 +00:00
Konstantin Belousov	5637a59143	Handle busy status of the page in a way expected for pager_getpage(). Flush requested page, unbusy other pages, do not clear m->busy. Reviewed by: alc MFC after: 1 week	2010-05-03 19:19:58 +00:00
Alan Cox	2d5d7f7f61	Acquire the page lock around vm_page_wire() in vm_page_grab(). Assert that the page lock is held in vm_page_wire().	2010-05-03 17:55:32 +00:00
Alan Cox	451033a48a	It makes more sense for the object-based backend allocator to use OBJT_PHYS objects instead of OBJT_DEFAULT objects because we never reclaim or pageout the allocated pages. Moreover, they are mapped with pmap_qenter(), which creates unmanaged mappings. Reviewed by: kib	2010-05-03 17:35:31 +00:00
Alan Cox	746c2ddee8	The pages allocated by kmem_alloc_attr() and kmem_malloc() are unmanaged. Consequently, neither the page lock nor the page queues lock is needed to unwire and free them.	2010-05-03 07:08:16 +00:00
Alan Cox	9f2512bab5	Assert that the page queues lock is held in vm_page_remove() and vm_page_unwire() only if the page is managed, i.e., pageable.	2010-05-03 07:00:50 +00:00
Alan Cox	b8d36afcfe	Add page lock assertions where we access the page's hold_count.	2010-05-02 23:33:10 +00:00
Alan Cox	6c56db5c9e	Eliminate an assignment that was made redundant by r207410.	2010-05-02 21:04:59 +00:00
Alan Cox	447fe2a4c6	Defer the acquisition of the page and page queues locks in vm_pageout_object_deactivate_pages().	2010-05-02 20:46:17 +00:00
Alan Cox	f623e55269	Simplify vm_fault(). The introduction of the new page lock renders a bit of cleverness by vm_fault() to avoid repeatedly releasing and reacquiring the page queues lock pointless. Reviewed by: kib, kmacy	2010-05-02 20:24:25 +00:00
Alan Cox	ac800a8490	Correct an error in r207410: Remove an unlock of a lock that is no longer held.	2010-05-02 18:09:33 +00:00
Alan Cox	b88b6c9d80	It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.	2010-05-02 17:33:46 +00:00
Alan Cox	f6c8c187d4	This change addresses the race condition that was introduced by the previous revision, r207450, to this file. Specifically, between dropping the page queues lock in vm_contig_launder() and reacquiring it in vm_contig_launder_page(), the page may be removed from the active or inactive queue. It could be wired, freed, cached, etc., none of which vm_contig_launder_page() is prepared for. Reviewed by: kib, kmacy	2010-05-02 16:44:06 +00:00
Alan Cox	9b55fc0429	Correct an error of omission in r206819. If VMFS_TLB_ALIGNED_SPACE is specified to vm_map_find(), then retry the vm_map_findspace() if vm_map_insert() fails because the aligned space is already partly used. Reported by: Neel Natu	2010-05-02 01:25:03 +00:00
Kip Macy	0ce3ba8cd5	Update locking comment above vm_page: - re-assign page queue lock "Q" - assign page lock "P" - update several uncommented fields - observe that hold_count is now protected by the page lock "P"	2010-05-01 03:41:21 +00:00
Kip Macy	7bec141b12	push up dropping of the page queue lock to avoid holding it in vm_pageout_flush	2010-04-30 22:31:37 +00:00
Kip Macy	ad0c05daf9	don't call vm_pageout_flush with the page queue mutex held Reported by: Michael Butler	2010-04-30 21:21:21 +00:00
Kip Macy	6dd8b893d6	- acquire the page lock in vm_contig_launder_page before checking page fields - release page queue lock before calling vm_pageout_flush	2010-04-30 21:20:14 +00:00
Kip Macy	e8f263195d	- don't check hold_count without the page lock held - don't leak the page lock if m->object is NULL (assuming that that check will in fact even be valid when m->object is protected by the page lock)	2010-04-30 19:40:37 +00:00
Konstantin Belousov	e20e8c1558	Unlock page lock instead of recursively locking it.	2010-04-30 16:20:14 +00:00
Kip Macy	6d74d042e3	don't allow unsynchronized free in vm_page_unhold	2010-04-30 02:46:49 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Alan Cox	82bfb965d1	Simplify the inner loop of vm_pageout_object_deactivate_pages(). Rather than checking each page for PG_UNMANAGED, check the vm object's type. Only OBJT_PHYS can have unmanaged pages. Eliminate a pointless counter. The vm object is locked, that lock is never released by the inner loop, and the set of pages contained by the vm object is not changed by the inner loop. Therefore, the counter serves no purpose.	2010-04-29 16:18:45 +00:00
Konstantin Belousov	6fb8c0c117	When doing kstack swapin, read as many pages in one run as possible. Suggested and reviewed by: alc (previous version) Tested by: pho MFC after: 2 weeks	2010-04-29 09:59:16 +00:00
Konstantin Belousov	e86a87e97e	In swap pager, do not free the non-requested pages from the run if they are wired. Kstack pages are wired, this change prepares swap pager for handling of long runs of kstack pages. Noted and reviewed by: alc Tested by: pho MFC after: 2 weeks	2010-04-29 09:57:25 +00:00
Alan Cox	77d6d85393	Setting PG_REFERENCED on a page at the end of vm_fault() is redundant since the page table entry's accessed bit is either preset by the immediately preceding call to pmap_enter() or by hardware (or software) upon return from vm_fault() when the faulting access is restarted.	2010-04-28 06:34:47 +00:00
Alan Cox	6a2a3d7338	Change vm_object_madvise() so that it checks whether the page is invalid or unmanaged before acquiring the page queues lock. Neither of these tests requires that lock. Moreover, a better way of testing if the page is unmanaged is to test the type of vm object. This avoids a pointless vm_page_lookup(). MFC after: 3 weeks	2010-04-28 04:57:32 +00:00
Alan Cox	7b85f59183	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!	2010-04-24 17:32:52 +00:00
Alan Cox	5d4a7b7945	Eliminate an unnecessary call to pmap_remove_all(). If a page belongs to an object whose reference count is zero, then that page cannot possibly be mapped.	2010-04-20 04:16:39 +00:00
Alan Cox	7b7d5b6c58	vm_thread_swapout() can safely dirty the page before rather than after acquiring the page queues lock.	2010-04-19 00:18:14 +00:00
Juli Mallett	ca596a25f0	o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the address space for an address as aligned by the new pmap_align_tlb() function, which is for constraints imposed by the TLB. [1] o) Add a kmem_alloc_nofault_space() function, which acts like kmem_alloc_nofault() but allows the caller to specify which find-space option to use. [1] o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the kernel stack address on MIPS. [1] o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on an odd boundary within the TLB, so that they are suitable for insertion as wired entries and do not have to share a TLB entry with another mapping, assuming they are appropriately-sized. o) Eliminate md_realstack now that the kstack will be appropriately-aligned on MIPS. o) Increase the number of guard pages to 2 so that we retain the proper alignment of the kstack address. Reviewed by: [1] alc X-MFC-after: Making sure alc has not come up with a better interface.	2010-04-18 22:32:07 +00:00
Alan Cox	b28889a2fc	Remove a nonsensical test from vm_pageout_clean(). A page can't be in the inactive queue and have a non-zero wire count. Reviewed by: kib MFC after: 3 weeks	2010-04-18 21:29:28 +00:00
Alan Cox	4b9dd5d537	There is no justification for vm_object_split() setting PG_REFERENCED on a page that it is going to sleep on. Eliminate it. MFC after: 3 weeks	2010-04-18 17:50:09 +00:00
Alan Cox	b11b56b55b	In vm_object_madvise() setting PG_REFERENCED on a page before sleeping on that page only makes sense if the advice is MADV_WILLNEED. In that case, the intention is to activate the page, so discouraging the page daemon from reclaiming the page makes sense. In contrast, in the other cases, MADV_DONTNEED and MADV_FREE, it makes no sense whatsoever to discourage the page daemon from reclaiming the page by setting PG_REFERENCED. Wrap a nearby line. Discussed with: kib MFC after: 3 weeks	2010-04-17 21:14:37 +00:00
Alan Cox	aefea7f519	In vm_object_backing_scan(), setting PG_REFERENCED on a page before sleeping on that page is nonsensical. Doing so reduces the likelihood that the page daemon will reclaim the page before the thread waiting in vm_object_backing_scan() is reawakened. However, it does not guarantee that the page is not reclaimed, so vm_object_backing_scan() restarts after reawakening. More importantly, this muddles the meaning of PG_REFERENCED. There is no reason to believe that the caller of vm_object_backing_scan() is going to use (i.e., access) the contents of the page. There is especially no reason to believe that an access is more likely because vm_object_backing_scan() had to sleep on the page. Discussed with: kib MFC after: 3 weeks	2010-04-17 18:35:07 +00:00
Alan Cox	0b6ace4743	Setting PG_REFERENCED on the requested page in swap_pager_getpages() is either redundant or harmful, depending on the caller. For example, when called by vm_fault(), it is redundant. However, when called by vm_thread_swapin(), it is harmful. Specifically, if the thread is later swapped out, having PG_REFERENCED set on its stack pages leads the page daemon to reactivate these stack pages and delay their reclamation. Reviewed by: kib MFC after: 3 weeks	2010-04-17 17:02:17 +00:00
Alan Cox	6c68c971cb	Simplify vm_thread_swapin().	2010-04-13 06:48:37 +00:00
Alan Cox	ac45ee97c9	Initialize the virtual memory-related resource limits in a single place. Previously, one of these limits was initialized in two places to a different value in each place. Moreover, because an unsigned int was used to represent the amount of pageable physical memory, some of these limits were incorrectly initialized on 64-bit architectures. (Currently, this error is masked by login.conf's default settings.) Make vm_thread_swapin() and vm_thread_swapout() static. Submitted by: bde (an earlier version) Reviewed by: kib	2010-04-11 16:26:07 +00:00
Alan Cox	8ef9d880e6	Introduce the function kmem_alloc_attr(), which allocates kernel virtual memory with the specified physical attributes. In particular, like kmem_alloc_contig(), the caller can specify the physical address range from which the physical pages are allocated and the memory attributes (i.e., cache behavior) for these physical pages. However, in contrast to kmem_alloc_contig() or contigmalloc(), the physical pages that are allocated by kmem_alloc_attr() are not necessarily physically contiguous. This function is needed by DRM and VirtualBox. Correct an error in the prototype for kmem_malloc(). The third argument had the wrong type. Tested by: rnoland MFC after: 3 days	2010-04-09 02:39:20 +00:00
Joel Dahl	c0587701ad	Start copyright notice with /*-	2010-04-07 16:29:10 +00:00
Konstantin Belousov	3f1c4c4f31	When OOM searches for a process to kill, ignore the processes already killed by OOM. When killed process waits for a page allocation, try to satisfy the request as fast as possible. This removes the often encountered deadlock, where OOM continuously selects the same victim process, that sleeps uninterruptibly waiting for a page. The killed process may still sleep if page cannot be obtained immediately, but testing has shown that system has much higher chance to survive in OOM situation with the patch. In collaboration with: pho Reviewed by: alc MFC after: 4 weeks	2010-04-06 10:43:01 +00:00
Alan Cox	f6d00b38c7	vm_reserv_alloc_page() should never be called on an OBJT_SG object, just as it is never called on an OBJT_DEVICE object. (This change should have been included in r195840.) Reported by: dougb@, avg@ MFC after: 3 days	2010-04-05 06:23:31 +00:00
Alan Cox	92351f162e	Make _vm_map_init() the one place where the vm map's pmap field is initialized. Reviewed by: kib	2010-04-03 19:07:05 +00:00
Alan Cox	0ef12795b5	Re-enable the call to pmap_release() by vmspace_dofree(). The accounting problem that is described in the comment has been addressed. Submitted by: kib Tested by: pho (a few months ago) MFC after: 6 weeks	2010-04-03 16:20:22 +00:00
John Baldwin	5711bf30da	Reject attempts to create a MAP_ANON mapping with a non-zero offset. PR: kern/71258 Submitted by: Alexander Best MFC after: 2 weeks	2010-03-23 21:08:07 +00:00
Kip Macy	1a23373ceb	- enable alignment on amd64 only - only align pcpu caches and the volatile portion of uma_zone	2010-03-22 22:39:32 +00:00
Kip Macy	6b4391d786	turn 205266 into a no-op until the problem can be properly diagnosed	2010-03-18 20:30:25 +00:00
Kip Macy	5e4bb93cca	Cache line align various structures and move volatile counters to not share a cache line with (mostly) immutable state Reviewed by: jeff@ MFC after: 7 days	2010-03-17 21:18:28 +00:00
Konstantin Belousov	ddb16cfc32	Update comment for vm_page_alloc(9), listing all acceptable flags [1]. Note that although the function does not sleep, it can block. Submitted by: Giovanni Trematerra <giovanni.trematerra gmail com> [1] MFC after: 3 days	2010-02-27 17:09:28 +00:00
Konstantin Belousov	d7de6e2cb6	Remove write-only variable. MFC after: 3 days	2010-02-22 16:00:56 +00:00
Alan Cox	fd776d18a0	Align the start of the clean submap to a superpage boundary. Although no superpage mappings are created within the clean submap, aligning the start of the clean submap helps to prevent interference with kmem_alloc()'s use of superpages.	2010-02-21 22:23:13 +00:00
Konstantin Belousov	41c2274481	The MAP_ENTRY_NEEDS_COPY flag belongs to protoeflags, cow variable uses different namespace. Reported by: Jonathan Anderson <jonathan.anderson cl cam ac uk> MFC after: 3 days	2010-01-29 19:25:45 +00:00
Konstantin Belousov	b9f180d1de	When a vnode-backed vm object is referenced, it increments the vnode reference count, and decrements it on dereference. If referenced object is deallocated, object type is reset to OBJT_DEAD. Consequently, all vnode references that are owned by object references are never released. vunref() the vnode in vm object deallocation code for OBJT_VNODE appropriate number of times to prevent leak. Add an assertion to the vm_pageout() to make sure that we never get reference on the vnode but then do not execute code to release it. In collaboration with: pho Reviewed by: alc MFC after: 3 weeks	2010-01-17 21:26:14 +00:00
Robert Noland	cfd7bacef2	Update d_mmap() to accept vm_ooffset_t and vm_memattr_t. This replaces d_mmap() with the d_mmap2() implementation and also changes the type of offset to vm_ooffset_t. Purge d_mmap2(). All driver modules will need to be rebuilt since D_VERSION is also bumped. Reviewed by: jhb@ MFC after: Not in this lifetime...	2009-12-29 21:51:28 +00:00
Antoine Brodin	13e403fdea	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month	2009-12-28 22:56:30 +00:00
Konstantin Belousov	49e3050e6c	VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object flag. Besides providing redundant information, the need to update both vnode and object flags causes more acquisitions of the vnode interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects. Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects. Suggested and reviewed by: alc Tested by: pho MFC after: 3 weeks	2009-12-21 12:29:38 +00:00
Antoine Brodin	4e2d83fc4a	Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros. MFC after: 1 month	2009-12-05 17:45:56 +00:00
Alan Cox	79f6ebe233	Properly synchronize the previous change.	2009-11-28 00:50:09 +00:00
Alan Cox	d8778512cf	Support the new VM_PROT_COPY option on wired pages. The effect is that a debugger can now set a breakpoint in a program that uses mlock(2) on its text segment or mlockall(2) on its entire address space.	2009-11-27 22:08:29 +00:00
Alan Cox	e2997fea72	Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault(). Discussed with: kib	2009-11-27 20:24:11 +00:00

3041 Commits