freebsd-skq

Author	SHA1	Message	Date
kib	0125db5772	Assert that vnode is exclusively locked when its vm object is resized. Reviewed by: tegge	2009-02-08 19:44:50 +00:00
kib	7a52ad11f4	Do not leak the MAP_ENTRY_IN_TRANSITION flag when copying map entry on fork. Otherwise, copied entry cannot be removed in the child map. Reviewed by: tegge MFC after: 2 weeks	2009-02-08 19:41:08 +00:00
kib	b798264c6d	Style.	2009-02-08 19:37:01 +00:00
jeff	69d1bd8670	- Make the keg abstraction more complete. Permit a zone to have multiple backend kegs so it may source compatible memory from multiple backends. This is useful for cases such as NUMA or different layouts for the same memory type. - Provide a new api for adding new backend kegs to secondary zones. - Provide a new flag for adjusting the layout of zones to stagger allocations better across cache lines. Sponsored by: Nokia	2009-01-25 09:11:24 +00:00
jhb	a9601871c9	- Mark all standalone INT/LONG/QUAD sysctl's MPSAFE. This is done inside the SYSCTL() macros and thus does not need to be done for all of the nodes scattered across the source tree. - Mark the name-cache related sysctl's (including debug.hashstat.) MPSAFE. - Mark vm.loadavg MPSAFE. - Remove GIANT_REQUIRED from vmtotal() (everything in this routine already has sufficient locking) and mark vm.vmtotal MPSAFE. - Mark the vm.stats.(sys|vm). sysctls MPSAFE.	2009-01-23 22:49:23 +00:00
jhb	dc43531891	Now that vfs_markatime() no longer requires an exclusive lock due to the VOP_MARKATIME() changes, use a shared vnode lock for mmap(). Submitted by: ups	2009-01-21 14:43:35 +00:00
kib	ac1b596fda	Extend the struct vm_page wire_count to u_int to avoid the overflow of the counter, that may happen when too many sendfile(2) calls are being executed with this vnode [1]. To keep the size of the struct vm_page and offsets of the fields accessed by out-of-tree modules, swap the types and locations of the wire_count and cow fields. Add safety checks to detect cow overflow and force fallback to the normal copy code for zero-copy sockets. [2] Reported by: Anton Yuzhaninov <citrin citrin ru> [1] Suggested by: alc [2] Reviewed by: alc MFC after: 2 weeks	2009-01-03 13:24:08 +00:00
alc	a2ba509efc	Resurrect shared map locks allowing greater concurrency during some map operations, such as page faults. An earlier version of this change was ... Reviewed by: kib Tested by: pho MFC after: 6 weeks	2009-01-01 00:31:46 +00:00
alc	81e94e0a4c	Update or eliminate some stale comments.	2008-12-31 05:44:05 +00:00
alc	fa3b7c7db3	Avoid an unnecessary memory dereference in vm_map_entry_splay().	2008-12-30 21:52:18 +00:00
alc	c1be5ff444	Style change to vm_map_lookup(): Eliminate a macro of dubious value.	2008-12-30 20:51:07 +00:00
alc	96037be899	Move the implementation of the vm map's fast path on address lookup from vm_map_lookup{,_locked}() to vm_map_lookup_entry(). Having the fast path in vm_map_lookup{,_locked}() limits its benefits to page faults. Moving it to vm_map_lookup_entry() extends its benefits to other operations on the vm map.	2008-12-30 19:48:03 +00:00
rnoland	63e9bb0efa	Fix printing of KASSERT message missed in r163604. Approved by: kib	2008-12-21 16:56:13 +00:00
kib	4edc1f9451	Instead of forcing vn_start_write() to reset mp back to NULL for the failed calls with non-NULL vp, explicitly clear mp after failure. Tested by: stass Reviewed by: tegge PR: 123768 MFC after: 1 week	2008-11-16 21:57:54 +00:00
raj	032a270ba5	Support kernel crash mini dumps on ARM architecture. Obtained from: Juniper Networks, Semihalf	2008-11-06 16:20:27 +00:00
keramida	4b957256d0	Various comment nits, and typos.	2008-11-02 00:41:26 +00:00
rwatson	3667673537	Update mmap() comment: no more block devices, so no more block device cache coherency questions. MFC after: 3 days	2008-10-22 16:50:12 +00:00
attilio	b8bf37e585	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-10-10 21:23:50 +00:00
kib	de9e891748	Move the code for doing out-of-memory grass from vm_pageout_scan() into the separate function vm_pageout_oom(). Supply a parameter for vm_pageout_oom() describing a reason for the call. Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone is exhausted. Reviewed by: alc Tested by: pho, jhb MFC after: 2 weeks	2008-09-29 19:45:12 +00:00
emaste	a173e0f84d	Move CTASSERT from header file to source file, per implementation note now in the CTASSERT man page.	2008-09-26 18:44:40 +00:00
kib	496b70bc6a	Save previous content of the td_fpop before storing the current filedescriptor into it. Make sure that td_fpop is NULL when calling d_mmap from dev_pager_getpages(). Change guards against td_fpop field being non-NULL with private state for another device, and against sudden clearing the td_fpop. This could occur when either a driver method calls another driver through the filedescriptor operation, or a page fault happens while driver is writing to a memory backed by another driver. Noted by: rwatson Tested by: rnoland MFC after: 3 days	2008-09-26 14:50:49 +00:00
alc	409b7f1ce0	Prevent an integer overflow in vm_pageout_page_stats() on machines with a large number of physical pages. PR: 126158 Submitted by: Dmitry Tejblum MFC after: 3 days	2008-09-21 18:01:34 +00:00
kib	2a1cba02c6	Allow the d_mmap driver methods to use cdevpriv KPI during verification phase of establishing mapping. Discussed with: rwatson, jhb, rnoland Tested by: rnoland MFC after: 3 days	2008-09-20 19:56:02 +00:00
attilio	dbf35e279f	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
antoine	d4ceeb8daa	Remove unused variable nosleepwithlocks. PR: 126609 Submitted by: Mateusz Guzik MFC after: 1 month X-MFC: to stable/7 only, this variable is still used in stable/6	2008-08-23 12:40:07 +00:00
nwhitehorn	077618820d	Allow the MD UMA allocator to use VM routines like kmem_*(). Existing code requires MD allocator to be available early in the boot process, before the VM is fully available. This defines a new VM define (UMA_MD_SMALL_ALLOC_NEEDS_VM) that allows an MD UMA small allocator to become available at the same time as the default UMA allocator. Approved by: marcel (mentor)	2008-08-23 01:35:36 +00:00
julian	0592958505	A bunch of formatting fixes brought to light by, or created by the Vimage commit a few days ago.	2008-08-20 01:05:56 +00:00
kmacy	de0a83a794	Work around differences in page allocation for initial page tables on xen MFC after: 1 month	2008-08-17 23:40:29 +00:00
emaste	b30356b634	Fix REDZONE(9) on amd64 and perhaps other 64 bit targets -- ensure the space that redzone adds to the allocation for storing its metadata is at least as large as the metadata that it will store there. Submitted by: Nima Misaghian	2008-08-13 17:32:48 +00:00
jhb	8af56fb687	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks	2008-08-05 20:02:31 +00:00
trhodes	f37865f7f0	Fill in a few sysctl descriptions. Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc	2008-08-03 14:26:15 +00:00
jhb	dccc76958e	One more whitespace nit.	2008-07-30 21:23:32 +00:00
jhb	746c7fb6aa	A few more whitespace fixes.	2008-07-30 21:18:08 +00:00
jhb	785ebd6040	If the kernel has run out of metadata for swap, then explicitly panic() instead of emitting a warning before deadlocking. MFC after: 1 month	2008-07-30 21:12:15 +00:00
kib	85be7d9093	The behaviour of the lockmgr going back at least to the 4.4BSD-Lite2 was to downgrade the exclusive lock to shared one when exclusive lock owner requested shared lock. New lockmgr panics instead. The vnode_pager_lock function requests shared lock on the vnode backing the OBJT_VNODE, and can be called when the current thread already holds an exclusive lock on the vnode. For instance, it happens when handling page fault from the VOP_WRITE() uiomove that writes to the file, with the faulted in page fetched from the vm object backed by the same file. We then get the situation described above. Verify whether the vnode is already exclusively locked by the curthread and request recursed exclusive vnode lock instead of shared, if true. Reported by: gallatin Discussed with: attilio	2008-07-30 18:16:06 +00:00
alc	dba03ac26a	Eliminate stale comments from kmem_malloc().	2008-07-18 17:41:31 +00:00
kib	b341cd2da2	Use the VM_ALLOC_INTERRUPT for the page requests when allocating memory for the bio for swapout write. It allows the page allocator to drain free page list deeper. As result, a deadlock where pageout daemon sleeps waiting for bio to be allocated for swapout is no more reproducible in practice. Alan said that M_USE_RESERVE shall be resurrected and used there, but until this is implemented, M_NOWAIT does exactly what is needed. Tested by: pho, kris Reviewed by: alc No objections from: phk MFC after: 2 weeks (RELENG_7 only)	2008-07-11 11:27:42 +00:00
alc	c016906f4e	Enable the creation of a kmem map larger than 4GB. Submitted by: Tz-Huan Huang Make several variables related to kmem map auto-sizing static. Found by: CScout	2008-07-05 19:34:33 +00:00
alc	9fe1756df8	Make preparations for increasing the size of the kernel virtual address space on the amd64 architecture. The amd64 architecture requires kernel code and global variables to reside in the highest 2GB of the 64-bit virtual address space. Thus, the memory allocated during bootstrap, before the call to kmem_init(), starts at KERNBASE, which is not necessarily the same as VM_MIN_KERNEL_ADDRESS on amd64.	2008-06-22 04:54:27 +00:00
alc	bec9e612d2	KERNBASE is not necessarily an address within the kernel map, e.g., PowerPC/AIM. Consequently, it should not be used to determine the maximum number of kernel map entries. Instead, use VM_MIN_KERNEL_ADDRESS, which marks the start of the kernel map on all architectures. Tested by: marcel@ (PowerPC/AIM)	2008-06-21 21:02:13 +00:00
ups	16b9649ce4	Fix vm object creation locking to allow SHARED vnode locking for vnode_create_vobject. (Not currently used) Noticed by: kib@	2008-06-12 20:46:47 +00:00
alc	25f7299f0f	Essentially, neither madvise(..., MADV_DONTNEED) nor madvise(..., MADV_FREE) work. (Moreover, I don't believe that they have ever worked as intended.) The explanation is fairly simple. Both MADV_DONTNEED and MADV_FREE perform vm_page_dontneed() on each page within the range given to madvise(). This function moves the page to the inactive queue. Specifically, if the page is clean, it is moved to the head of the inactive queue where it is first in line for processing by the page daemon. On the other hand, if it is dirty, it is placed at the tail. Let's further examine the case in which the page is clean. Recall that the page is at the head of the line for processing by the page daemon. The expectation of vm_page_dontneed()'s author was that the page would be transferred from the inactive queue to the cache queue by the page daemon. (Once the page is in the cache queue, it is, in effect, free, that is, it can be reallocated to a new vm object by vm_page_alloc() if it isn't reactivated quickly enough by a user of the old vm object.) The trouble is that nowhere in the execution of either MADV_DONTNEED or MADV_FREE is either the machine-independent reference flag (PG_REFERENCED) or the reference bit in any page table entry (PTE) mapping the page cleared. Consequently, the immediate reaction of the page daemon is to reactivate the page because it is referenced. In effect, the madvise() was for naught. The case in which the page was dirty is not too different. Instead of being laundered, the page is reactivated. Note: The essential difference between MADV_DONTNEED and MADV_FREE is that MADV_FREE clears a page's dirty field. So, MADV_FREE is always executing the clean case above. This revision changes vm_page_dontneed() to clear both the machine- independent reference flag (PG_REFERENCED) and the reference bit in all PTEs mapping the page. MFC after: 6 weeks	2008-06-06 18:38:43 +00:00
alc	933221c70b	To date, our implementation of munmap(2) has required that the entirety of the specified range be mapped. Specifically, it has returned EINVAL if the entire range is not mapped. There is not, however, any basis for this in either SuSv2 or our own man page. Moreover, neither Linux nor Solaris impose this requirement. This revision removes this requirement. Submitted by: Tijl Coosemans PR: 118510 MFC after: 6 weeks	2008-05-24 21:57:16 +00:00
ups	fbd329664f	Allow VM object creation in ufs_lookup. (If vfs.vmiodirenable is set) Directory IO without a VM object will store data in 'malloced' buffers severely limiting caching of the data. Without this change VM objects for directories are only created on an open() of the directory. TODO: Inline test if VM object already exists to avoid locking/function call overhead. Tested by: kris@ Reviewed by: jeff@ Reported by: David Filo	2008-05-20 19:05:43 +00:00
alc	a8f81206ad	Retire pmap_addr_hint(). It is no longer used.	2008-05-18 04:16:57 +00:00
alc	0a1d595c3a	In order to map device memory using superpages, mmap(2) must find a superpage-aligned virtual address for the mapping. Revision 1.65 implemented an overly simplistic and generally ineffectual method for finding a superpage-aligned virtual address. Specifically, it rounds the virtual address corresponding to the end of the data segment up to the next superpage-aligned virtual address. If this virtual address is unallocated, then the device will be mapped using superpages. Unfortunately, in modern times, where applications like the X server dynamically load much of their code, this virtual address is already allocated. In such cases, mmap(2) simply uses the first available virtual address, which is not necessarily superpage aligned. This revision changes mmap(2) to use a more robust method, specifically, the VMFS_ALIGNED_SPACE option that is now implemented by vm_map_find().	2008-05-17 19:32:48 +00:00
alc	1e4c5769eb	Preset a device object's alignment ("pg_color") based upon the physical address of the device's memory. This enables pmap_align_superpage() to propose a virtual address for mapping the device memory that permits the use of superpage mappings.	2008-05-17 16:26:34 +00:00
alc	47efda9e75	Don't call vm_reserv_alloc_page() on device-backed objects. Otherwise, the system may panic because there is no reservation structure corresponding to the physical address of the device memory. Reported by: Giorgos Keramidas	2008-05-15 18:52:31 +00:00
alc	0f95f2184f	Provide the new argument to kmem_suballoc().	2008-05-10 23:39:27 +00:00
alc	c251140c26	Introduce a new parameter "superpage_align" to kmem_suballoc() that is used to request superpage alignment for the submap. Request superpage alignment for the kmem_map. Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently equivalent but VMFS_ANY_SPACE is the new preferred spelling.) Remove a stale comment from kmem_malloc().	2008-05-10 21:46:20 +00:00
alc	5a23437099	Generalize vm_map_find(9)'s parameter "find_space". Specifically, add support for VMFS_ALIGNED_SPACE, which requests the allocation of an address range best suited to superpages. The old options TRUE and FALSE are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no immediate need to update all of vm_map_find(9)'s callers. While I'm here, correct a misstatement about vm_map_find(9)'s return values in the man page.	2008-05-10 18:55:35 +00:00
alc	9e8bccea75	Introduce pmap_align_superpage(). It increases the starting virtual address of the given mapping if a different alignment might result in more superpage mappings.	2008-05-09 16:48:07 +00:00
kmacy	afbf6fcd73	add malloc flag to blist so that it can be used in ithread context Reviewed by: alc, bsdimp	2008-05-05 19:48:54 +00:00
alc	b48c49ab25	Eliminate pointless casts from kmem_suballoc().	2008-04-28 17:25:27 +00:00
alc	e919c282cd	vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can be passed by value.	2008-04-28 05:30:23 +00:00
jeff	9d30d1d7a4	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia	2008-04-17 04:20:10 +00:00
alc	2f4904816f	Introduce vm_reserv_reclaim_contig(). This function is used by contigmalloc(9) as a last resort to steal pages from an inactive, partially-used superpage reservation. Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and refactor it so that a separate subroutine is responsible for breaking the selected reservation. This subroutine is also used by vm_reserv_reclaim_contig().	2008-04-06 18:09:28 +00:00
alc	871b77b7f6	Eliminate an unnecessary test from vm_phys_unfree_page().	2008-04-05 05:02:53 +00:00
alc	927cf7f228	Update a comment to vm_map_pmap_enter().	2008-04-04 19:14:58 +00:00
alc	067dba5f97	Reintroduce UMA_SLAB_KMAP; however, change its spelling to UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM. (UMA_SLAB_KMAP met its original demise in revision 1.30 of vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame allocators. Without it, UMA cannot correctly return pages from the jumbo frame zones to the VM system because it resets the pages' object field to NULL instead of the kernel object. In more detail, the jumbo frame zones are created with the option UMA_ZONE_REFCNT. This causes UMA to overwrite the pages' object field with the address of the slab. However, when UMA wants to release these pages, it doesn't know how to restore the object field, so it sets it to NULL. This change teaches UMA how to reset the object field to the kernel object. Crashes reported by: kris Fix tested by: kris Fix discussed with: jeff MFC after: 6 weeks	2008-04-04 18:41:12 +00:00
alc	a4bcbb3771	Eliminate an unnecessary printf() from kmem_suballoc(). The subsequent panic() can be extended to convey the same information.	2008-03-30 20:08:59 +00:00
jeff	43963c5cfa	- Use vm_object_reference_locked() directly from vm_object_reference(). This is intended to get rid of vget() consumers who don't wish to acquire a lock. This is functionally the same as calling vref(). vm_object_reference_locked() already uses vref. Discussed with: alc	2008-03-29 07:06:13 +00:00
kib	de73f6b678	Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly obtain the reference. In particular, this fixes the panic reported in the PR. Remove the comments stating that this needs to be done. PR: kern/119422 MFC after: 1 week	2008-03-20 16:08:42 +00:00
alc	caedbf233d	Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent migration to vm/vm_page.c.	2008-03-19 20:24:35 +00:00
jeff	46f09d5bc3	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
alc	4e9b2a2931	Almost seven years ago, vm/vm_page.c was split into three parts: vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c. Today, vm/vm_pageq.c has withered to the point that it contains only four short functions, two of which are only used by vm/vm_page.c. Since I can't foresee any reason for vm/vm_pageq.c to grow, it is time to fold the remaining contents of vm/vm_pageq.c back into vm/vm_page.c. Add some comments. Rename one of the functions, vm_pageq_enqueue(), that is now static within vm/vm_page.c to vm_page_enqueue(). Eliminate PQ_MAXCOUNT as it no longer serves any purpose.	2008-03-18 06:52:15 +00:00
alc	0de51cf047	Simplify the inner loop of vm_fault()'s delete-behind heuristic. Instead of checking each page for PG_UNMANAGED, perform a one-time check whether the object is OBJT_PHYS. (PG_UNMANAGED pages only belong to OBJT_PHYS objects.)	2008-03-16 17:37:19 +00:00
rwatson	877d7c65ba	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
jeff	acb93d599c	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
jeff	3b1acbdce2	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter	2008-03-12 06:31:06 +00:00
alc	160b9af7de	Eliminate an unnecessary test from vm_fault's delete-behind heuristic. Specifically, since the delete-behind heuristic is never applied to a device-backed object, there is no point in checking whether each of the object's pages is fictitious. (Only device-backed objects have fictitious pages.)	2008-03-09 06:08:58 +00:00
marcel	7834123faf	Make the vm_pmap field of struct vmspace the last field in the structure. This allows per-CPU variations of struct pmap on a single architecture without affecting the machine-independent fields. As such, the PMAP variations don't affect the ABI. They become part of it.	2008-03-01 22:54:42 +00:00
alc	0f6d386ab0	Correct a long-standing error in vm_object_page_remove(). Specifically, pmap_remove_all() must not be called on fictitious pages. To date, fictitious pages have been allocated from zeroed memory, effectively hiding this problem because the fictitious pages appear to have an empty pv list. Submitted by: Kostik Belousov Rewrite the comments describing vm_object_page_remove() to better describe what it does. Add an assertion. Reviewed by: Kostik Belousov MFC after: 1 week	2008-02-26 17:16:48 +00:00
alc	c69581f28f	Correct a long-standing error in vm_object_deallocate(). Specifically, only anonymous default (OBJT_DEFAULT) and swap (OBJT_SWAP) objects should ever have OBJ_ONEMAPPING set. However, vm_object_deallocate() was setting it on device (OBJT_DEVICE) objects. As a result, vm_object_page_remove() could be called on a device object and if that occurred pmap_remove_all() would be called on the device object's pages. However, a device object's pages are fictitious, and fictitious pages do not have an initialized pv list (struct md_page). To date, fictitious pages have been allocated from zeroed memory, effectively hiding this problem. Now, however, the conversion of rotting diagnostics to invariants in the amd64 and i386 pmaps has revealed the problem. Specifically, assertion failures have occurred during the initialization phase of the X server on some hardware. MFC after: 1 week Discussed with: Kostik Belousov Reported by: Michiel Boland	2008-02-24 18:03:56 +00:00
attilio	71b7824213	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
pjd	340995c5fd	When one tries to allocate memory with the M_WAITOK flag and we are short in address space in kmem map call vm_lowmem event in a loop and wait a bit for subsystems to reclaim some memory which in turn will reclaim address space as well. Note, this is a work-around. Reviewed by: alc Approved by: alc MFC after: 3 days	2008-01-10 08:36:38 +00:00
attilio	18d0a0dd51	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
jhb	8cd9437636	Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())	2008-01-08 21:58:16 +00:00
csjp	856850f578	When MAC is enabled in the kernel, fix a panic triggered by a locking assertion hit in swapoff_one() when we un-mount a swap partition. We should be using curthread where we used thread0 before. This change also replaces the thread argument with a credential argument, as the MAC framework only requires the cred. It should be noted that this allows the machine to be rebooted without panicking with "cannot differ from curthread or NULL" when MAC is enabled. Submitted by: rwatson Reviewed by: attilio MFC after: 2 weeks	2008-01-08 14:58:41 +00:00
kib	8e700284f5	In the vm_map_stack(), check for the specified stack region wraparound. Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days	2008-01-04 04:33:13 +00:00
alc	545d26e30b	Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.	2008-01-03 07:34:34 +00:00
alc	3c35e9380c	Defer setting either PG_CACHED or PG_FREE until after the free page queues lock is acquired. Otherwise, the state of a reservation's pages' flags and its population count can be inconsistent. That could result in a page being freed twice. Reported by: kris	2008-01-02 04:43:47 +00:00
alc	5a0aa042a2	Correct a style error that was introduced in revision 1.77.	2008-01-01 20:36:04 +00:00
alc	4565fa1697	Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)	2007-12-29 19:53:04 +00:00
alc	33b51104e1	Add a list of reservations to the vm object structure. Recycle the vm object's "pg_color" field to represent the color of the first virtual page address at which the object is mapped instead of the color of the object's first physical page. Since an object may not be mapped, introduce a flag "OBJ_COLORED" that indicates whether "pg_color" is valid.	2007-12-27 17:56:35 +00:00
alc	44c0fb33db	Add the superpage reservation type.	2007-12-27 17:08:11 +00:00
alc	c6d4b1ee92	Update the comment describing vm_phys_unfree_page().	2007-12-21 02:44:31 +00:00
alc	4518d14d23	Modify vm_phys_unfree_page() so that it no longer requires the given page to be in the free lists. Instead, it now returns TRUE if it removed the page from the free lists and FALSE if the page was not in the free lists. This change is required to support superpage reservations. Specifically, once reservations are introduced, a cached page can either be in the free lists or a reservation.	2007-12-20 22:45:54 +00:00
alc	af99f17cda	Correct one half of a loop continuation condition in vm_phys_unfree_page(). At present, this error is inconsequential; the other half of the loop continuation condition is sufficient to achieve correct execution.	2007-12-19 23:09:45 +00:00
alc	3c2abd13fb	Eliminate redundant code from vm_page_startup().	2007-12-19 05:47:50 +00:00
alc	5929e7ecb6	Simplify vm_page_free_toq().	2007-12-11 21:20:34 +00:00
alc	cf47268b02	Correct a comment.	2007-12-02 07:43:42 +00:00
rwatson	090235e567	Modify stack(9) stack_print() and stack_sbuf_print() routines to use new linker interfaces for looking up function names and offsets from instruction pointers. Create two variants of each call: one that is "DDB-safe" and avoids locking in the linker, and one that is safe for use in live kernels, by virtue of observing locking, and in particular safe when kernel modules are being loaded and unloaded simultaneous to their use. This will allow them to be used outside of debugging contexts. Modify two of three current stack(9) consumers to use the DDB-safe interfaces, as they run in low-level debugging contexts, such as inside lockmgr(9) and the kernel memory allocator. Update man page.	2007-12-01 22:04:16 +00:00
alc	625e38eddc	Make contigmalloc(9)'s page laundering more robust. Specifically, use vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better handle a lock-ordering problem. Consequently, trylock's failure on the page's containing object no longer implies that the page cannot be laundered. MFC after: 6 weeks	2007-11-25 20:37:29 +00:00
alc	9ba5385124	Tidy up: Add comments. Eliminate the pointless malloc_type_allocated(..., 0) calls that occur when contigmalloc() has failed. Eliminate the acquisition and release of the page queues lock from vm_page_release_contig(). Rename contigmalloc2() to contigmapping(), reflecting what it does.	2007-11-25 07:42:34 +00:00
alc	35af042efc	Add a read/write sysctl for reconfiguring the maximum number of physical pages that can be wired. Submitted by: Eugene Grosbein PR: 114654 MFC after: 6 weeks	2007-11-23 00:30:19 +00:00
alc	dbffaeda47	Remove an unnecessary call to pmap_remove_all() and the associated "XXX" comments from vnode_pager_setsize(). This call was introduced in revision 1.140 to address a problem that no longer exists. Specifically, pmap_zero_page_area() has replaced a (possibly) problematic implementation of page zeroing that was based on vm_pager_map(), bzero(), and vm_pager_unmap().	2007-11-22 20:01:38 +00:00
alc	018efe29f9	When reactivating a cached page, reset the page's pool to the default pool. (Not doing this before was a performance pessimization but not a cause for panic.)	2007-11-21 23:22:10 +00:00
alc	d1ab859bdc	Prevent the leakage of wired pages in the following circumstances: First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated. Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the pages beyond the EOF are unmapped and freed. However, when the file is mlock(2)ed, the pages beyond the EOF are unmapped but not freed because they have a non-zero wire count. This can be a mistake. Specifically, it is a mistake if the sole reason why the pages are wired is because of wired, managed mappings. Previously, unmapping the pages destroyed these wired, managed mappings, but did not reduce the pages' wire count. Consequently, when the file is unmapped, the pages are not unwired because the wired mapping has been destroyed. Moreover, when the vm object is finally destroyed, the pages are leaked because they are still wired. The fix is to reduce the pages' wire count by the number of wired, managed mappings destroyed. To do this, I introduce a new pmap function pmap_page_wired_mappings() that returns the number of managed mappings to the given physical page that are wired, and I use this function in vm_object_page_remove(). Reviewed by: tegge MFC after: 6 weeks	2007-11-17 22:52:29 +00:00
pjd	f372a74b85	Change unused 'user_wait' argument to 'timo' argument, which will be used to specify timeout for msleep(9). Discussed with: alc Reviewed by: alc	2007-11-07 21:56:58 +00:00
kib	9ae733819b	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when kmem_alloc_nofault() failed to allocate address space. Both functions now return an error instead of panicking or dereferencing NULL. As a consequence, vmspace_exec() and vmspace_unshare() return an errno int. A struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in thread_alloc(), which itself may return NULL. Also, allocation of the first process thread is performed in fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup(), called from fork1(), and proc_linkup0(), which is used to set up the kernel process (formerly known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
kib	8cd6397d8a	The intent of freeing the (zeroed) page in vm_page_cache() for the default object, rather than caching it, was to have vm_pager_has_page(object, pindex, ...) == FALSE imply that there is no cached page in object at pindex. This avoids explicit checks for cached pages in vm_object_backing_scan(). For now, we need the same bandaid for the swap object; otherwise, both vm_page_lookup() and the pager can report that there is no page at offset while the page is stored in the cache. Also, this fixes another instance of the KASSERT("object type is incompatible") failure in vm_page_cache_transfer(). Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days	2007-11-05 10:25:12 +00:00
maxim	7382e0b507	o Fix panic message: it's swap_pager_putpages() not swap_pager_getpages(). Submitted by: Mark Tinguely	2007-11-02 20:48:10 +00:00
remko	5d7d5a6a8a	Correct a copy-and-paste error in phys_pager.c; we are talking about phys here and not about devices. PR: 93755 Approved by: imp (mentor, implicit when re-assigning the ticket to me).	2007-10-30 14:48:13 +00:00
alc	acb713befa	Change vm_page_cache_transfer() such that it does not transfer pages that would have an offset beyond the end of the target object. Such pages should remain in the source object. MFC after: 3 days Diagnosed and reviewed by: Kostik Belousov Reported and tested by: Peter Holm	2007-10-27 00:09:30 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
alc	7fd960900d	Correct an error of omission in the reimplementation of the page cache: vnode_pager_setsize() must handle the case where a file is truncated to a non-page-size-aligned boundary and there is a cached page underlying the new end of file. Reported by: kris, tegge Tested by: kris MFC after: 3 days	2007-10-22 06:23:46 +00:00
alc	bb82ce71e3	Correct an error in vm_map_sync(), nee vm_map_clean(), that has existed since revision 1.1. Specifically, neither traversal of the vm map checks whether the end of the vm map has been reached. Consequently, the first traversal can wrap around and bogusly return an error. This error has gone unnoticed for so long because no one had ever before tried msync(2)ing a region above the stack. Reported by: peter MFC after: 1 week	2007-10-22 05:21:05 +00:00
julian	51d643caa6	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
alc	79cc4a8646	The previous revision, updating vm_object_page_remove() for the new page cache, did not account for the case where the vm object has nothing but cached pages. Reported by: kris, tegge Reviewed by: tegge MFC after: 3 days	2007-10-18 23:02:18 +00:00
peter	9b1e0dd3a8	Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int.	2007-10-18 22:47:39 +00:00
ru	61b9ccb51d	Fix CTL_VM_NAMES.	2007-10-16 11:32:57 +00:00
jhb	a448c36f7e	Allow recursion on the 'zones' internal UMA zone. Submitted by: thompsa MFC after: 1 week Approved by: re (kensmith) Discussed with: jeff	2007-10-11 20:11:27 +00:00
kib	53bbfe99fb	Do not dereference NULL pointer. Reported by: Peter Holm Reviewed by: alc Approved by: re (kensmith)	2007-10-08 20:09:53 +00:00
alc	d53c0afe54	In the rare case that vm_page_cache() actually frees the given page, it must first ensure that the page is no longer mapped. This is trivially accomplished by calling pmap_remove_all() a little earlier in vm_page_cache(). While I'm in the neighborhood, make a related panic message a little more useful. Approved by: re (kensmith) Reported by: Peter Holm and Konstantin Belousov Reviewed by: Konstantin Belousov	2007-10-08 18:01:38 +00:00
alc	19c4fce2e3	Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs when uma_small_free() frees a page. The solution has two parts: (1) Mark pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested. This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable flags, i.e., they do not change state between the time that a page is allocated and freed. Approved by: re (kensmith) PR: 116794	2007-10-07 18:03:03 +00:00
alc	9d3ffe57ce	Correct an error of omission in the reimplementation of the page cache: vm_object_page_remove() should convert any cached pages that fall within the specified range to free pages. Otherwise, there could be a problem if a file is first truncated and then regrown. Specifically, some old data from prior to the truncation might reappear. Generalize vm_page_cache_free() to support the conversion of either a subset or the entirety of an object's cached pages. Reported by: tegge Reviewed by: tegge Approved by: re (kensmith)	2007-09-27 04:21:59 +00:00
alc	b72c80753d	Correct an error in the previous revision, specifically, vm_object_madvise() should request that the reactivated, cached page not be busied. Reported by: Rink Springer Approved by: re (kensmith)	2007-09-25 21:01:10 +00:00
alc	d1bce06c64	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
jeff	e132f56a45	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re	2007-09-21 05:07:07 +00:00
jeff	3fc0f8b973	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
alc	0188378655	Correct an assertion in vm_pageout_flush(). Specifically, if a page's status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have already been recycled, i.e., freed and reallocated to a new purpose; thus, asserting that such pages cannot be written is inappropriate. Reported by: kris Submitted by: tegge Approved by: re (kensmith) MFC after: 1 week	2007-09-15 18:30:28 +00:00
kib	77766ce03f	Do not drop the vm_map lock between doing vm_map_remove() and vm_map_insert(). For this, introduce vm_map_fixed() that does that for the MAP_FIXED case. Dropping the lock allowed a parallel thread to occupy the freed space. Reported by: Tijl Coosemans <tijl ulyssis org> Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2007-08-20 12:05:45 +00:00
kib	05d51a15e9	Remove comment that is no longer quite true. Noted by: alc Approved by: re (kensmith)	2007-08-18 16:41:31 +00:00
kib	ba6ef6ecca	Fix the phys_pager in a way similar to rev. 1.83 of sys/vm/device_pager.c: Protect the creation of the phys pager with non-NULL handle with the phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now synchronized with its removal from the list, and phys_pager_mtx is put before vm object lock in lock order. Dispose of the phys_pager_alloc_lock and tsleep calls, together with acquiring Giant, since phys_pager_mtx now covers the same block. Reviewed by: alc Approved by: re (kensmith)	2007-08-18 16:40:33 +00:00
kib	8423133063	Protect the creation of the device pager with the dev_pager_mtx. Lookup of device pager in the pagers list by handle is now synchronized with its removal from the list, and dev_pager_mtx is put before vm object lock in lock order. Dispose of the dev_pager_sx lock, since dev_pager_mtx now covers the same block. Noted by: kensmith Reviewed by: alc Approved by: re (kensmith)	2007-08-07 15:36:25 +00:00
alc	a1f1ba2d56	Consider a scenario in which one processor, call it Pt, is performing vm_object_terminate() on a device-backed object at the same time that another processor, call it Pa, is performing dev_pager_alloc() on the same device. The problem is that vm_pager_object_lookup() should not be allowed to return a doomed object, i.e., an object with OBJ_DEAD set, but it does. In detail, the unfortunate sequence of events is: Pt in vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls vm_pager_object_lookup(), which returns the doomed object. Next, Pa calls vm_object_reference(), which requires the doomed object's lock, so Pa waits for Pt to release the doomed object's lock. Pt proceeds to the point in vm_object_terminate() where it releases the doomed object's lock. Pa is now able to complete vm_object_reference() because it can now complete the acquisition of the doomed object's lock. So, now the doomed object has a reference count of one! Pa releases dev_pager_sx and returns the doomed object from dev_pager_alloc(). Pt now acquires dev_pager_mtx, removes the doomed object from dev_pager_object_list, releases dev_pager_mtx, and finally calls uma_zfree with the doomed object. However, the doomed object is still in use by Pa. Repeating my key point, vm_pager_object_lookup() must not return a doomed object. Moreover, the test for the object's state, i.e., doomed or not, and the increment of the object's reference count should be carried out atomically. Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks	2007-08-05 21:04:32 +00:00
kib	04fb5db609	Do not acquire Giant unconditionally around the calls to the cdevsw d_mmap methods. prep_cdevsw() already installs the shims that acquire/drop Giant for the methods of a driver that specified the D_NEEDGIANT flag. Reviewed by: alc Approved by: re (kensmith)	2007-08-05 05:40:52 +00:00
alc	215153274b	Add a counter for the total number of pages cached and support for reporting the value of this counter in the program "vmstat". Approved by: re (rwatson)	2007-07-27 20:01:22 +00:00
pjd	fe74e944d1	When we do open, we should lock the vnode exclusively. This fixes a few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more... Discussed with: kib, ups Approved by: re (rwatson)	2007-07-26 16:58:09 +00:00
alc	8765bda351	Two changes to vm_fault_additional_pages(): 1. Rewrite the backward scan. Specifically, reverse the order in which pages are allocated so that upon failure it is never necessary to free pages that were just allocated. Moreover, any allocated pages can be put to use. This makes the backward scan behave just like the forward scan. 2. Eliminate an explicit, unsynchronized check for low memory before calling vm_page_alloc(). It serves no useful purpose. It is, in effect, optimizing the uncommon case at the expense of the common case. Approved by: re (hrs) MFC after: 3 weeks	2007-07-20 06:55:11 +00:00
alc	dc39b85c98	Eliminate two unused functions: vm_phys_alloc_pages() and vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to vm_phys_alloc_pages() and vm_phys_free_pages_locked() to vm_phys_free_pages(). Add comments regarding the need for the free page queues lock to be held by callers to these functions. No functional changes. Approved by: re (hrs)	2007-07-14 21:21:17 +00:00
alc	c29e755cdb	Eliminate dead code, specifically, an unused sysctl: "vm.idlezero_maxrun". Approved by: re (hrs)	2007-07-14 19:00:44 +00:00
alc	3e2faffa45	Update a comment describing the page queues. Approved by: re (hrs)	2007-07-13 04:42:20 +00:00
alc	db9dd359b6	Eliminate dead code. Approved by: re (hrs)	2007-07-12 22:23:28 +00:00
alc	3b36c8e7eb	Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in vm_page_cowfault(). Initially, if vm_page_cowfault() sleeps, the given page is wired, preventing it from being recycled. However, when transmission of the page completes, the page is unwired and returned to the page queues. At that point, the page is not in any special state that prevents it from being recycled. Consequently, vm_page_cowfault() should verify that the page is still held by the same vm object before retrying the replacement of the page. Note: The containing object is, however, safe from being recycled by virtue of having a non-zero paging-in-progress count. While I'm here, add some assertions and comments. Approved by: re (rwatson) MFC After: 3 weeks	2007-07-10 18:41:34 +00:00
alc	f58d26b291	Eliminate the special case handling of OBJT_DEVICE objects in vm_fault_additional_pages() that was introduced in revision 1.47. Then as now, it is unnecessary because dev_pager_haspage() returns zero for both the number of pages to read ahead and read behind, producing the same exact behavior by vm_fault_additional_pages() as the special case handling. Approved by: re (rwatson)	2007-07-08 19:42:52 +00:00
alc	0985df88fc	When a cached page is reactivated in vm_fault(), update the counter that tracks the total number of reactivated pages. (We have not been counting reactivations by vm_fault() since revision 1.46.) Correct a comment in vm_fault_additional_pages(). Approved by: re (kensmith) MFC after: 1 week	2007-07-06 21:25:21 +00:00
peter	fd45cf2a6d	Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate Approved by: re (kensmith)	2007-07-04 22:57:21 +00:00
alc	ab67a07868	In the previous revision, when I replaced the unconditional acquisition of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate the acquisition of the vnode interlock before releasing the vm object's lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is performed. Unfortunately, this allows the vnode to be recycled between the release of the vm object's lock and the vget() on the vnode. In this revision, I prevent the vnode from being recycled by acquiring another reference to the vm object and underlying vnode before releasing the vm object's lock. This change also addresses another preexisting but trivial problem. By acquiring another reference to the vm object, I also prevent the vm object from being recycled. Previously, the "vnodes skipped" counter could be wrong because it might examine a recycled vm object. Reported by: kib Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks	2007-07-02 06:56:37 +00:00
alc	c6438ef575	Eliminate the use of Giant from vm_daemon(). Replace the unconditional use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(). Approved by: re (kensmith) MFC after: 3 weeks	2007-06-26 18:24:05 +00:00
alc	df5f1d1131	Eliminate GIANT_REQUIRED from swap_pager_putpages(). Approved by: re (mux) MFC after: 1 week	2007-06-24 18:40:30 +00:00
alc	c7ee2c66ef	Eliminate unnecessary checks from vm_pageout_clean(): The page that is passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because it came from the inactive queue and PG_UNMANAGED pages are not in any page queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS objects, and all pages within an OBJT_PHYS object are PG_UNMANAGED. So, if the page that is passed to vm_pageout_clean() is not PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its neighbors from the same object cannot themselves be PG_UNMANAGED. Reviewed by: tegge	2007-06-18 02:04:38 +00:00
mjacob	a7dcde4629	Don't declare inline a function which isn't.	2007-06-17 04:19:05 +00:00
mjacob	fadc531504	Make sure object is initialized to NULL; there is a possible case where you could fall through to it being used without being set. Put a break in the default case.	2007-06-17 04:17:48 +00:00
mjacob	cd7cf5829b	Initialize reqpage to zero.	2007-06-17 04:14:27 +00:00
alc	011d4e557f	If attempting to cache a "busy" page, panic instead of printing a diagnostic message and returning.	2007-06-16 21:07:51 +00:00
alc	8386bd54e6	Update a comment.	2007-06-16 05:25:53 +00:00
alc	a8415c5a0d	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-16 04:57:06 +00:00
alc	0630c4e3a4	Eliminate dead code: We have not performed pageouts on the kernel object in this millennium.	2007-06-13 06:10:10 +00:00
alc	a9a2aaf8ad	Conditionally acquire Giant in vm_contig_launder_page().	2007-06-11 03:20:16 +00:00
attilio	e9fc4edc44	Optimize vmmeter locking. In particular: - Add an explanatory table for locking of struct vmmeter members - Apply new rules for some of those members - Remove some unhelpful comments Heavily reviewed by: alc, bde, jeff Approved by: jeff (mentor)	2007-06-10 21:59:14 +00:00
alc	ee6e89585d	Add a new physical memory allocator. However, do not yet connect it to the build. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-10 00:49:16 +00:00
jeff	91d1501790	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
attilio	9bd4fdf7ce	Do proper "locking" for the remaining vmmeter fields. We no longer assume sched_lock protection for some of them, and instead use the distributed-loads method for vmmeter (distributed across CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
attilio	e333d0ff0e	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussion/work in the coming days. Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:38:48 +00:00
jeff	a7a8bac81f	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch(), to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
attilio	7dd8ed88a9	Revert VMCNT_* operations introduction. Probably, a general approach is not the best solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
kib	f13486a222	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from a file descriptor index to a pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
attilio	d5fdf88def	Add functions sx_xlock_sig() and sx_slock_sig(). These functions are intended to do the same as sx_xlock() and sx_slock() but perform an interruptible sleep, so that the sleep can be interrupted by external events. In order to support these new features, some code restructuring is needed, but the external API won't be affected at all. Note: use "void" cast for "int" returning functions in order to prevent tools like Coverity from whining. Requested by: rwatson Tested by: rwatson Reviewed by: jhb Approved by: jeff (mentor)	2007-05-31 09:14:48 +00:00
alc	08b2128056	Eliminate the reactivation of cached pages in vm_fault_prefault() and vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With the exception of calls to vm_map_pmap_enter() from madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter() are both used to create speculative mappings. Thus, always reactivating cached pages is a mistake. In principle, cached pages should only be reactivated by an actual access. Otherwise, the following misbehavior can occur. On a hard fault for a text page the clustering algorithm fetches not only the required page but also several of the adjacent pages. Now, suppose that one or more of the adjacent pages are never accessed. Ultimately, these unused pages become cached pages through the efforts of the page daemon. However, the next activation of the executable reactivates and maps these unused pages. Consequently, they are never replaced. In effect, they become pinned in memory.	2007-05-22 04:45:59 +00:00
jeff	953418f0d5	- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument. Suggested by: julian@ Contributed by: attilio@	2007-05-20 22:33:42 +00:00
jeff	e1996cb960	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
rwatson	e8ccc2cbbd	Update stale comment on protecting UMA per-CPU caches: we now use critical sections rather than mutexes.	2007-05-09 22:53:34 +00:00
alc	b34f6f7ab1	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
alc	c32653f226	Remove some code from vmspace_fork() that became redundant after revision 1.334 modified _vm_map_init() to initialize the new vm map's flags to zero.	2007-04-26 05:48:17 +00:00
rwatson	bd3cfaaef8	Audit pathnames looked up in swapon(2) and swapoff(2). MFC after: 2 weeks Obtained from: TrustedBSD Project	2007-04-23 14:41:34 +00:00
alc	a10280e050	Correct contigmalloc2()'s implementation of M_ZERO. Specifically, contigmalloc2() was always testing the first physical page for PG_ZERO, not the current page of interest. Submitted by: Michael Plass PR: 81301 MFC after: 1 week	2007-04-19 05:39:54 +00:00
alc	6e14b3e802	Correct two comments. Submitted by: Michael Plass	2007-04-19 04:52:47 +00:00
keramida	f572f4ce6a	Minor typo fix, noticed while I was going through *_pager.c files.	2007-04-10 12:34:51 +00:00
pjd	a4513e9da8	When KVA is exhausted, try the vm_lowmem event one last time before panicking. This greatly improves ZFS stability.	2007-04-05 20:52:51 +00:00
pjd	b6f1c3fccc	Fix a problem for file systems that don't implement the VOP_BMAP() operation. The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(), which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (e.g., EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing the 'before' and 'after' arguments, so they contain arbitrary values. This was basically causing this condition to be met: if ((rahead + rbehind) > ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) { pagedaemon_wakeup(); [...] } (because the rahead and rbehind variables contain random values). I'm not entirely sure this is the right fix; maybe we should just return FALSE in vnode_pager_haspage() when VOP_BMAP() fails? alc@ knows about this problem; maybe he will be able to come up with a better fix if this is not the right one.	2007-04-05 20:49:46 +00:00
alc	ecbefa2cc5	Prevent a race between vm_object_collapse() and vm_object_split() from causing a crash. Suppose that we have two objects, obj and backing_obj, where backing_obj is obj's backing object. Further, suppose that backing_obj has a reference count of two. One being the reference held by obj and the other by a map entry. Now, suppose that the map entry is deallocated and its reference removed by vm_object_deallocate(). vm_object_deallocate() recognizes that the only remaining reference is from a shadow object, obj, and calls vm_object_collapse() on obj. vm_object_collapse() executes if (backing_object->ref_count == 1) { /* * If there is exactly one reference to the backing * object, we can collapse it into the parent. */ vm_object_backing_scan(object, OBSC_COLLAPSE_WAIT); vm_object_backing_scan(OBSC_COLLAPSE_WAIT) executes if (op & OBSC_COLLAPSE_WAIT) { vm_object_set_flag(backing_object, OBJ_DEAD); } Finally, suppose that either vm_object_backing_scan() or vm_object_collapse() sleeps releasing its locks. At this instant, another thread executes vm_object_split(). It crashes in vm_object_reference_locked() on the assertion that the object is not dead. If, however, assertions are not enabled, it crashes much later, after the object has been recycled, in vm_object_deallocate() because the shadow count and shadow list are inconsistent. Reviewed by: tegge Reported by: jhb MFC after: 1 week	2007-03-27 08:55:17 +00:00
alc	989c09dfd5	Two small changes to vm_map_pmap_enter(): 1) Eliminate an unnecessary check for fictitious pages. Specifically, only device-backed objects contain fictitious pages and the object is not device-backed. 2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to prevent possible wrap around with extremely large maps and objects, respectively. Observed by: tegge (last summer)	2007-03-25 19:33:40 +00:00
alc	584a970755	vm_page_busy() no longer requires the page queues lock to be held. Reduce the scope of the page queues lock in vm_fault() accordingly.	2007-03-23 06:11:25 +00:00
alc	efc8daaecb	Change the order of lock reacquisition in vm_object_split() in order to simplify the code slightly. Add a comment concerning lock ordering.	2007-03-22 07:02:43 +00:00
alc	2210798637	Use PCPU_LAZY_INC() to update page fault statistics.	2007-03-05 18:55:14 +00:00
jhb	54e4ea54b6	Use pause() in vm_object_deallocate() to yield the CPU to the lock holder rather than a tsleep() on &proc0. The only wakeup on &proc0 is intended to awaken the swapper, not random threads blocked in vm_object_deallocate().	2007-02-27 19:40:26 +00:00
jhb	9081d44243	Use pause() rather than tsleep() on stack variables and function pointers.	2007-02-27 17:23:29 +00:00
alc	573a964db6	Change the way that unmanaged pages are created. Specifically, immediately flag any page that is allocated to an OBJT_PHYS object as unmanaged in vm_page_alloc() rather than waiting for a later call to vm_page_unmanage(). This allows for the elimination of some uses of the page queues lock. Change the type of the kernel and kmem objects from OBJT_DEFAULT to OBJT_PHYS. This allows us to take advantage of the above change to simplify the allocation of unmanaged pages in kmem_alloc() and kmem_malloc(). Remove vm_page_unmanage(). It is no longer used.	2007-02-25 06:14:58 +00:00
alc	e4e74de1c2	Change the page's CLEANCHK flag from being a page queue mutex synchronized flag to a vm object mutex synchronized flag.	2007-02-22 06:15:52 +00:00
alc	c0ed1b65cf	Enable vm_page_free() and vm_page_free_zero() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue.	2007-02-18 05:54:42 +00:00
alc	7913035c29	Remove a stale comment. Add punctuation to a nearby comment.	2007-02-17 19:37:00 +00:00
alc	3e7d1b7ebd	Relax the page queue lock assertions in vm_page_remove() and vm_page_free_toq() to account for recent changes that allow vm_page_free_toq() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue. (Examples of such pages include page table pages, pv entry pages, and uma small alloc pages.)	2007-02-15 05:43:38 +00:00
alc	0250171bf6	Avoid the unnecessary acquisition of the free page queues lock when a page is actually being added to the hold queue, not the free queue. At the same time, avoid unnecessary tests to wake up threads waiting for free memory and the idle thread that zeroes free pages. (These tests will be performed later when the page finally moves from the hold queue to the free queue.)	2007-02-14 07:05:55 +00:00
rwatson	b9f6b60a84	Add the uma_set_align() interface, which will be called at most once during boot by MD code to indicate the detected alignment preference. Rather than cache alignment being encoded in UMA consumers by defining a global alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now a special value (-1) that causes UMA to look at the registered alignment. If no preferred alignment has been selected by MD code, a default alignment of (16 - 1) will be used. Currently, no hardware platforms specify alignment; architecture maintainers will need to modify MD startup code to specify an alignment if desired. This must occur before initialization of UMA so that all UMA zones pick up the requested alignment. Reviewed by: jeff, alc Submitted by: attilio	2007-02-11 20:13:52 +00:00
alc	909177e227	Use the free page queue mutex instead of the page queue mutex to synchronize sleeping and waking of the zero idle thread.	2007-02-11 05:18:40 +00:00
jhb	9c764c7fc3	- Move 'struct swdevt' back into swap_pager.h and expose it to userland. - Restore support for fetching swap information from crash dumps via kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps. Reviewed by: arch@, phk MFC after: 1 week	2007-02-07 17:43:11 +00:00
alc	2eb15b506b	Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the vm page queue free mutex instead of the vm page queue mutex.	2007-02-07 06:37:30 +00:00
alc	4881bd38e2	Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.	2007-02-05 06:02:55 +00:00
mohans	83064ec323	Fix for problems that occur when all mbuf clusters migrate to the mbuf packet zone. Cluster allocations fail when this happens. Also processes that may have blocked on cluster allocations will never be woken up. Thanks to rwatson for an overview of the issue and pointers to the mbuma paper and his tool to dump out UMA zones. Reviewed by: andre@	2007-01-25 01:05:23 +00:00
mohans	9799fcf93c	Fix a bug where only one of multiple processes blocked on a zone's maxpages limit is woken up; the rest are never woken up because the ZFLAG_FULL flag has been cleared. Wake up all such blocked processes instead. This change introduces a thundering herd, but since this should be relatively infrequent, optimizing this (by introducing a count of blocked processes, for example) may be premature. Reviewed by: ups@	2007-01-24 22:49:11 +00:00
jeff	474b917526	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
delphij	9856d14ea1	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.	2007-01-17 15:05:52 +00:00
rwatson	d6f9a991c6	Remove uma_zalloc_arg() hack, which coerced M_WAITOK to M_NOWAIT when allocations were made using improper flags in interrupt context. Replace with a simple WITNESS warning call. This restores the invariant that M_WAITOK allocations will always succeed or die horribly trying, which is relied on by many UMA consumers. MFC after: 3 weeks Discussed with: jhb	2007-01-10 21:04:43 +00:00
alc	57699b6f9c	Declare the map entry created by kmem_init() for the range from VM_MIN_KERNEL_ADDRESS to the end of the kernel's bootstrap data as MAP_NOFAULT.	2007-01-07 07:32:04 +00:00
jhb	db17c1d997	- Add a new function uma_zone_exhausted() to see if a zone is full. - Add a printf in swp_pager_meta_build() to warn if the swapzone becomes exhausted, so that a box that runs out of swapzone space before running out of swap space at least prints a warning before it deadlocks. MFC after: 1 week Reviewed by: alc	2007-01-05 19:09:01 +00:00
alc	143ffef93b	Optimize vm_object_split(). Specifically, make the number of iterations equal to the number of physical pages that are renamed to the new object rather than the new object's virtual size.	2006-12-17 20:14:43 +00:00
alc	71d9f62bca	Simplify the computation of the new object's size in vm_object_split().	2006-12-16 08:17:07 +00:00
kmacy	d0a44b7b7c	Remove the requirement that phys_avail be sorted in ascending order by explicitly finding the lowest and highest addresses when calculating the size of the vm_pages array. Reviewed by: alc	2006-12-08 08:44:47 +00:00
julian	396ed947f6	Threading cleanup, part 2 of several. Make part of John Birrell's KSE patch permanent. Specifically, remove: Any reference to the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still requires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
ru	01fad2a3f6	The clean_map has been made local to vm_init.c long ago.	2006-11-20 16:23:34 +00:00
ru	825e064fde	Remove a redundant pointer-type variable.	2006-11-20 08:33:55 +00:00
ru	e58221ee6f	When counting vm totals, skip unreferenced objects, including vnodes representing mounted file systems. Reviewed by: alc MFC after: 3 days	2006-11-20 00:16:00 +00:00
alc	ce5848ab99	There is no point in setting PG_REFERENCED on kmem_object pages because they are "unmanaged", i.e., non-pageable, pages. Remove a stale comment.	2006-11-13 00:27:02 +00:00
alc	6093953d36	Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller. (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)	2006-11-12 21:48:34 +00:00
alc	375373276d	I misplaced the assertion that was added to vm_page_startup() in the previous change. Correct its placement.	2006-11-08 19:11:54 +00:00
alc	763826feff	Simplify the construction of the free queues in vm_page_startup(). Add an assertion to test a hypothesis concerning other redundant computation in vm_page_startup().	2006-11-08 18:43:47 +00:00
alc	1f317d8152	Ensure that the page's oflags field is initialized by contigmalloc().	2006-11-08 06:23:29 +00:00
rwatson	10d0d9cf47	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
jb	f82c799735	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
rwatson	d4be3ff623	Better align output of "show uma" by moving from displaying the basic counters of allocs/frees/use for each zone to the same statistics shown by userspace "vmstat -z". MFC after: 3 days	2006-10-26 12:55:32 +00:00
alc	f395a2d02c	The page queues lock is no longer required by vm_page_wakeup().	2006-10-23 05:27:31 +00:00
alc	5d9c66a3f8	The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.	2006-10-22 21:18:48 +00:00
rwatson	7beaaf5cd2	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
alc	cbcb760109	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
alc	7d7a43f1b4	Eliminate unnecessary PG_BUSY tests. They originally served a purpose that is now handled by vm object locking.	2006-10-21 21:02:04 +00:00
alc	31fbaf5d5c	Long ago, revision 1.22 of vm/vm_pager.h introduced a bug. Specifically, it introduced a check after the call to file system's get pages method that assumes that the get pages method does not change the array of pages that is passed to it. In the case of vnode_pager_generic_getpages(), this assumption has been incorrect. The contents of the array of pages may be shifted by vnode_pager_generic_getpages(). Likely, the problem has been hidden by vnode_pager_haspage() limiting the set of pages that are passed to vnode_pager_generic_getpages() such that a shift never occurs. The fix implemented herein is to adjust the pointer to the array of pages rather than shifting the pages within the array. MFC after: 3 weeks Fix suggested by: tegge	2006-10-14 23:21:48 +00:00
alc	07a6c3ab4e	Change vnode_pager_addr() such that on returning it distinguishes between an error returned by VOP_BMAP() and a hole in the file. Change the callers to vnode_pager_addr() such that they return VM_PAGER_ERROR when VOP_BMAP fails instead of a zero-filled page. Reviewed by: tegge MFC after: 3 weeks	2006-10-14 22:09:03 +00:00
kmacy	e728d44745	sun4v requires TSBs (translation storage buffers) to be contiguous and size-aligned, which requires heavy use of vm_page_alloc_contig(). This change makes vm_page_alloc_contig() SMP safe. Approved by: scottl (acting as backup for mentor rwatson)	2006-10-12 04:41:39 +00:00
alc	26e34ffad0	Distinguish between two distinct kinds of errors from VOP_BMAP() in vnode_pager_generic_getpages(): (1) that VOP_BMAP() is unsupported by the underlying file system and (2) an error in performing the VOP_BMAP(). Previously, vnode_pager_generic_getpages() assumed that all errors were of the first type. If, in fact, the error was of the second type, the likely outcome was for the process to become permanently blocked on a busy page. MFC after: 3 weeks Reviewed by: tegge	2006-10-10 18:26:18 +00:00
alc	bd713f224f	Change vnode_pager_generic_getpages() so that it does not panic if the given file is sparse. Instead, it zeroes the requested page. Reviewed by: tegge PR: kern/98116 MFC after: 3 days	2006-10-08 20:26:16 +00:00
kensmith	0c209e1877	Fix two minor style(9) nits in v1.313 which were noticed during an MFC review. alc@ will be MFCing V1.313 plus style fix to RELENG_6.	2006-09-29 00:20:56 +00:00
alc	9fce925349	Make vm_page_release_contig() static.	2006-09-03 22:24:08 +00:00
alc	f2ccfe9525	Refactor vm_page_sleep_if_busy() so that the test for a busy page is inlined and a procedure call is made in the rare case, i.e., when it is necessary to sleep. In this case, inlining the test actually makes the kernel smaller.	2006-08-27 19:50:13 +00:00
alc	8f32cfe8b1	Prevent a call to contigmalloc() that asks for more physical memory than the machine has from causing a panic. Submitted by: Michael Plass PR: 101668 MFC after: 3 days	2006-08-26 02:43:23 +00:00
alc	d108c1d6d1	The return value from vm_pageq_add_new_page() is not used. Eliminate it.	2006-08-25 04:36:19 +00:00
alc	fe4d04c555	Add _vm_stats and _vm_stats_misc to the sysctl declarations in sysctl.h and eliminate their declarations from various source files.	2006-08-21 06:27:28 +00:00
alc	24209e2264	vm_page_zero_idle()'s return value serves no purpose. Eliminate it.	2006-08-21 00:55:05 +00:00
alc	ce3ad47700	Page flags are reset on (re)allocation. There is no need to clear any flags except for PG_ZERO in vm_page_free_toq().	2006-08-21 00:34:31 +00:00
alc	cc1f2c465b	Reimplement the page's NOSYNC flag as an object-synchronized instead of a page queues-synchronized flag. Reduce the scope of the page queues lock in vm_fault() accordingly. Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the scope of the page queues lock. Reviewed by: tegge Additionally, eliminate an unnecessary dereference in computing the argument that is passed to vm_object_set_writeable_dirty().	2006-08-13 00:11:09 +00:00
alc	b787cad1e0	Ensure that the page's new field for object-synchronized flags is always initialized to zero. Call vm_page_sleep_if_busy() instead of duplicating its implementation in vm_page_grab().	2006-08-11 17:18:58 +00:00
alc	bc546843d7	Change vm_page_cowfault() so that it doesn't allocate a pre-busied page.	2006-08-10 04:48:29 +00:00
alc	b98eae58a6	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
alc	bc6eabeb88	Eliminate the acquisition and release of the page queues lock around a call to vm_page_sleep_if_busy().	2006-08-06 00:17:17 +00:00
alc	84c8fb9bd2	Change vm_page_sleep_if_busy() so that it no longer requires the caller to hold the page queues lock.	2006-08-06 00:15:40 +00:00
alc	3f8b2509e4	Remove a stale comment.	2006-08-05 19:07:07 +00:00
alc	cbc0dafbb2	When sleeping on a busy page, use the lock from the containing object rather than the global page queues lock.	2006-08-03 23:56:11 +00:00
alc	a152234cf9	Complete the transition from pmap_page_protect() to pmap_remove_write(). Originally, I had adopted sparc64's name, pmap_clear_write(), for the function that is now pmap_remove_write(). However, this function is more like pmap_remove_all() than like pmap_clear_modify() or pmap_clear_reference(), hence, the name change. The higher-level rationale behind this change is described in src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm trying to clean up and fix our support for execute access. Reviewed by: marcel@ (ia64)	2006-08-01 19:06:06 +00:00
alc	89dea53ec2	Export the number of object bypasses and collapses through sysctl.	2006-07-22 22:31:57 +00:00
alc	b5b274360a	Retire debug.mpsafevm. None of the architectures supported in CVS require it any longer.	2006-07-21 23:22:49 +00:00
alc	d0e4b9565d	Eliminate OBJ_WRITEABLE. It hasn't been used in a long time.	2006-07-21 06:40:29 +00:00
alc	004ef88e09	Add pmap_clear_write() to the interface between the virtual memory system's machine-dependent and machine-independent layers. Once pmap_clear_write() is implemented on all of our supported architectures, I intend to replace all calls to pmap_page_protect() by calls to pmap_clear_write(). Why? Both the use and implementation of pmap_page_protect() in our virtual memory system has subtle errors, specifically, the management of execute permission is broken on some architectures. The "prot" argument to pmap_page_protect() should behave differently from the "prot" argument to other pmap functions. Instead of meaning, "give the specified access rights to all of the physical page's mappings," it means "don't take away the specified access rights from all of the physical page's mappings, but do take away the ones that aren't specified." However, owing to our i386 legacy, i.e., no support for no-execute rights, all but one invocation of pmap_page_protect() specifies VM_PROT_READ only, when the intent is, in fact, to remove only write permission. Consequently, a faithful implementation of pmap_page_protect(), e.g., ia64, would remove execute permission as well as write permission. On the other hand, some architectures that support execute permission have basically ignored whether or not VM_PROT_EXECUTE is passed to pmap_page_protect(), e.g., amd64 and sparc64. This change represents the first step in replacing pmap_page_protect() by the less subtle pmap_clear_write() that is already implemented on amd64, i386, and sparc64. Discussed with: grehan@ and marcel@	2006-07-20 17:48:41 +00:00
rwatson	3f582797ac	Fix build of uma_core.c when DDB is not compiled into the kernel by making uma_zone_sumstat() ifdef DDB, as it's only used with DDB now. Submitted by: Wolfram Fenske <Wolfram.Fenske at Student.Uni-Magdeburg.DE>	2006-07-18 01:13:18 +00:00
alc	50001bf119	Ensure that vm_object_deallocate() doesn't dereference a stale object pointer: When vm_object_deallocate() sleeps because of a non-zero paging in progress count on either object or object's shadow, vm_object_deallocate() must ensure that object is still the shadow's backing object when it reawakens. In fact, object may have been deallocated while vm_object_deallocate() slept. If so, reacquiring the lock on object can lead to a deadlock. Submitted by: ups@ MFC after: 3 weeks	2006-07-17 06:45:03 +00:00
rwatson	7fe36465aa	Remove sysctl_vm_zone() and vm.zone sysctl from 7.x. As of 6.x, libmemstat(3) is used by vmstat (and friends) to produce more accurate and more detailed statistics information in a machine-readable way, and vmstat continues to provide the same text-based front-end. This change should not be MFC'd.	2006-07-16 22:53:26 +00:00
alc	e79c83b068	Set debug.mpsafevm to true on PowerPC. (Now, by default, all architectures in CVS have debug.mpsafevm set to true.) Tested by: grehan@	2006-07-10 07:08:05 +00:00
jhb	d4be78a6fa	Move the code to handle the vm.blacklist tunable up a layer into vm_page_startup(). As a result, we now look up the tunable only once instead of once for every physical page of memory in the system. This eliminates a boot delay of about 1 second on x86 systems. The delay is apparently much larger and more noticeable on sun4v. Reported by: kmacy MFC after: 1 week	2006-06-23 16:44:24 +00:00
kib	b90260e703	Make mincore(2) return ENOMEM when the requested range is not fully mapped. Requested by: Bruno Haible <bruno at clisp org> Reviewed by: alc Approved by: pjd (mentor) MFC after: 1 month	2006-06-21 12:59:05 +00:00
alc	3323d8b7b4	Use ptoa(psize) instead of size to compute the end of the mapping in vm_map_pmap_enter().	2006-06-17 08:45:01 +00:00

2687 Commits