freebsd-skq

Author	SHA1	Message	Date
Tom Rhodes	6bd9cb1c81	Fill in a few sysctl descriptions. Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc	2008-08-03 14:26:15 +00:00
John Baldwin	2c3b410b3a	One more whitespace nit.	2008-07-30 21:23:32 +00:00
John Baldwin	3cca4b6fe8	A few more whitespace fixes.	2008-07-30 21:18:08 +00:00
John Baldwin	3677ad363b	If the kernel has run out of metadata for swap, then explicitly panic() instead of emitting a warning before deadlocking. MFC after: 1 month	2008-07-30 21:12:15 +00:00
Konstantin Belousov	24bbc85bf6	The behaviour of the lockmgr going back at least to the 4.4BSD-Lite2 was to downgrade the exclusive lock to shared one when exclusive lock owner requested shared lock. New lockmgr panics instead. The vnode_pager_lock function requests shared lock on the vnode backing the OBJT_VNODE, and can be called when the current thread already holds an exlcusive lock on the vnode. For instance, it happens when handling page fault from the VOP_WRITE() uiomove that writes to the file, with the faulted in page fetched from the vm object backed by the same file. We then get the situation described above. Verify whether the vnode is already exclusively locked by the curthread and request recursed exclusive vnode lock instead of shared, if true. Reported by: gallatin Discussed with: attilio	2008-07-30 18:16:06 +00:00
Alan Cox	fb272dc841	Eliminate stale comments from kmem_malloc().	2008-07-18 17:41:31 +00:00
Konstantin Belousov	11041003c6	Use the VM_ALLOC_INTERRUPT for the page requests when allocating memory for the bio for swapout write. It allows the page allocator to drain free page list deeper. As result, a deadlock where pageout deamon sleeps waiting for bio to be allocated for swapout is no more reproducable in practice. Alan said that M_USE_RESERVE shall be ressurrected and used there, but until this is implemented, M_NOWAIT does exactly what is needed. Tested by: pho, kris Reviewed by: alc No objections from: phk MFC after: 2 weeks (RELENG_7 only)	2008-07-11 11:27:42 +00:00
Alan Cox	b89eaf4e9f	Enable the creation of a kmem map larger than 4GB. Submitted by: Tz-Huan Huang Make several variables related to kmem map auto-sizing static. Found by: CScout	2008-07-05 19:34:33 +00:00
Alan Cox	5cfa90e902	Make preparations for increasing the size of the kernel virtual address space on the amd64 architecture. The amd64 architecture requires kernel code and global variables to reside in the highest 2GB of the 64-bit virtual address space. Thus, the memory allocated during bootstrap, before the call to kmem_init(), starts at KERNBASE, which is not necessarily the same as VM_MIN_KERNEL_ADDRESS on amd64.	2008-06-22 04:54:27 +00:00
Alan Cox	c1f02198d1	KERNBASE is not necessarily an address within the kernel map, e.g., PowerPC/AIM. Consequently, it should not be used to determine the maximum number of kernel map entries. Intead, use VM_MIN_KERNEL_ADDRESS, which marks the start of the kernel map on all architectures. Tested by: marcel@ (PowerPC/AIM)	2008-06-21 21:02:13 +00:00
Stephan Uphoff	11be8415c9	Fix vm object creation locking to allow SHARED vnode locking for vnode_create_vobject. (Not currently used) Noticed by: kib@	2008-06-12 20:46:47 +00:00
Alan Cox	8bcd3b1998	Essentially, neither madvise(..., MADV_DONTNEED) nor madvise(..., MADV_FREE) work. (Moreover, I don't believe that they have ever worked as intended.) The explanation is fairly simple. Both MADV_DONTNEED and MADV_FREE perform vm_page_dontneed() on each page within the range given to madvise(). This function moves the page to the inactive queue. Specifically, if the page is clean, it is moved to the head of the inactive queue where it is first in line for processing by the page daemon. On the other hand, if it is dirty, it is placed at the tail. Let's further examine the case in which the page is clean. Recall that the page is at the head of the line for processing by the page daemon. The expectation of vm_page_dontneed()'s author was that the page would be transferred from the inactive queue to the cache queue by the page daemon. (Once the page is in the cache queue, it is, in effect, free, that is, it can be reallocated to a new vm object by vm_page_alloc() if it isn't reactivated quickly enough by a user of the old vm object.) The trouble is that nowhere in the execution of either MADV_DONTNEED or MADV_FREE is either the machine-independent reference flag (PG_REFERENCED) or the reference bit in any page table entry (PTE) mapping the page cleared. Consequently, the immediate reaction of the page daemon is to reactivate the page because it is referenced. In effect, the madvise() was for naught. The case in which the page was dirty is not too different. Instead of being laundered, the page is reactivated. Note: The essential difference between MADV_DONTNEED and MADV_FREE is that MADV_FREE clears a page's dirty field. So, MADV_FREE is always executing the clean case above. This revision changes vm_page_dontneed() to clear both the machine- independent reference flag (PG_REFERENCED) and the reference bit in all PTEs mapping the page. MFC after: 6 weeks	2008-06-06 18:38:43 +00:00
Alan Cox	ba3042115f	To date, our implementation of munmap(2) has required that the entirety of the specified range be mapped. Specifically, it has returned EINVAL if the entire range is not mapped. There is not, however, any basis for this in either SuSv2 or our own man page. Moreover, neither Linux nor Solaris impose this requirement. This revision removes this requirement. Submitted by: Tijl Coosemans PR: 118510 MFC after: 6 weeks	2008-05-24 21:57:16 +00:00
Stephan Uphoff	2ac78f0e1a	Allow VM object creation in ufs_lookup. (If vfs.vmiodirenable is set) Directory IO without a VM object will store data in 'malloced' buffers severely limiting caching of the data. Without this change VM objects for directories are only created on an open() of the directory. TODO: Inline test if VM object already exists to avoid locking/function call overhead. Tested by: kris@ Reviewed by: jeff@ Reported by: David Filo	2008-05-20 19:05:43 +00:00
Alan Cox	1ec1304bdb	Retire pmap_addr_hint(). It is no longer used.	2008-05-18 04:16:57 +00:00
Alan Cox	d0a83a83bf	In order to map device memory using superpages, mmap(2) must find a superpage-aligned virtual address for the mapping. Revision 1.65 implemented an overly simplistic and generally ineffectual method for finding a superpage-aligned virtual address. Specifically, it rounds the virtual address corresponding to the end of the data segment up to the next superpage-aligned virtual address. If this virtual address is unallocated, then the device will be mapped using superpages. Unfortunately, in modern times, where applications like the X server dynamically load much of their code, this virtual address is already allocated. In such cases, mmap(2) simply uses the first available virtual address, which is not necessarily superpage aligned. This revision changes mmap(2) to use a more robust method, specifically, the VMFS_ALIGNED_SPACE option that is now implemented by vm_map_find().	2008-05-17 19:32:48 +00:00
Alan Cox	e46cd4132c	Preset a device object's alignment ("pg_color") based upon the physical address of the device's memory. This enables pmap_align_superpage() to propose a virtual address for mapping the device memory that permits the use of superpage mappings.	2008-05-17 16:26:34 +00:00
Alan Cox	f578838754	Don't call vm_reserv_alloc_page() on device-backed objects. Otherwise, the system may panic because there is no reservation structure corresponding to the physical address of the device memory. Reported by: Giorgos Keramidas	2008-05-15 18:52:31 +00:00
Alan Cox	6ac3ab7f98	Provide the new argument to kmem_suballoc().	2008-05-10 23:39:27 +00:00
Alan Cox	3202ed7523	Introduce a new parameter "superpage_align" to kmem_suballoc() that is used to request superpage alignment for the submap. Request superpage alignment for the kmem_map. Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently equivalent but VMFS_ANY_SPACE is the new preferred spelling.) Remove a stale comment from kmem_malloc().	2008-05-10 21:46:20 +00:00
Alan Cox	26c538ffcd	Generalize vm_map_find(9)'s parameter "find_space". Specifically, add support for VMFS_ALIGNED_SPACE, which requests the allocation of an address range best suited to superpages. The old options TRUE and FALSE are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no immediate need to update all of vm_map_find(9)'s callers. While I'm here, correct a misstatement about vm_map_find(9)'s return values in the man page.	2008-05-10 18:55:35 +00:00
Alan Cox	d3249b142b	Introduce pmap_align_superpage(). It increases the starting virtual address of the given mapping if a different alignment might result in more superpage mappings.	2008-05-09 16:48:07 +00:00
Kip Macy	c8c7ad9260	add malloc flag to blist so that it can be used in ithread context Reviewed by: alc, bsdimp	2008-05-05 19:48:54 +00:00
Alan Cox	2bc24aa956	Eliminate pointless casts from kmem_suballoc().	2008-04-28 17:25:27 +00:00
Alan Cox	b8ca4ef2e3	vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can be passed by value.	2008-04-28 05:30:23 +00:00
Jeff Roberson	8df78c41d6	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia	2008-04-17 04:20:10 +00:00
Alan Cox	44aab2c3de	Introduce vm_reserv_reclaim_contig(). This function is used by contigmalloc(9) as a last resort to steal pages from an inactive, partially-used superpage reservation. Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and refactor it so that a separate subroutine is responsible for breaking the selected reservation. This subroutine is also used by vm_reserv_reclaim_contig().	2008-04-06 18:09:28 +00:00
Alan Cox	2fbced6574	Eliminate an unnecessary test from vm_phys_unfree_page().	2008-04-05 05:02:53 +00:00
Alan Cox	c416972587	Update a comment to vm_map_pmap_enter().	2008-04-04 19:14:58 +00:00
Alan Cox	7630c26507	Reintroduce UMA_SLAB_KMAP; however, change its spelling to UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM. (UMA_SLAB_KMAP met its original demise in revision 1.30 of vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame allocators. Without it, UMA cannot correctly return pages from the jumbo frame zones to the VM system because it resets the pages' object field to NULL instead of the kernel object. In more detail, the jumbo frame zones are created with the option UMA_ZONE_REFCNT. This causes UMA to overwrite the pages' object field with the address of the slab. However, when UMA wants to release these pages, it doesn't know how to restore the object field, so it sets it to NULL. This change teaches UMA how to reset the object field to the kernel object. Crashes reported by: kris Fix tested by: kris Fix discussed with: jeff MFC after: 6 weeks	2008-04-04 18:41:12 +00:00
Alan Cox	24dedba9f5	Eliminate an unnecessary printf() from kmem_suballoc(). The subsequent panic() can be extended to convey the same information.	2008-03-30 20:08:59 +00:00
Jeff Roberson	52481a9a9d	- Use vm_object_reference_locked() directly from vm_object_reference(). This is intended to get rid of vget() consumers who don't wish to acquire a lock. This is functionally the same as calling vref(). vm_object_reference_locked() already uses vref. Discussed with: alc	2008-03-29 07:06:13 +00:00
Konstantin Belousov	91a35e7870	Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly obtain the reference. In particular, this fixes the panic reported in the PR. Remove the comments stating that this needs to be done. PR: kern/119422 MFC after: 1 week	2008-03-20 16:08:42 +00:00
Alan Cox	e5b006ffca	Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent migration to vm/vm_page.c.	2008-03-19 20:24:35 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Alan Cox	1fa94a36b1	Almost seven years ago, vm/vm_page.c was split into three parts: vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c. Today, vm/vm_pageq.c has withered to the point that it contains only four short functions, two of which are only used by vm/vm_page.c. Since I can't foresee any reason for vm/vm_pageq.c to grow, it is time to fold the remaining contents of vm/vm_pageq.c back into vm/vm_page.c. Add some comments. Rename one of the functions, vm_pageq_enqueue(), that is now static within vm/vm_page.c to vm_page_enqueue(). Eliminate PQ_MAXCOUNT as it no longer serves any purpose.	2008-03-18 06:52:15 +00:00
Alan Cox	ec96dca788	Simplify the inner loop of vm_fault()'s delete-behind heuristic. Instead of checking each page for PG_UNMANAGED, perform a one-time check whether the object is OBJT_PHYS. (PG_UNMANAGED pages only belong to OBJT_PHYS objects.)	2008-03-16 17:37:19 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
Jeff Roberson	c5aa6b581d	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter	2008-03-12 06:31:06 +00:00
Alan Cox	593e717ec9	Eliminate an unnecessary test from vm_fault's delete-behind heuristic. Specifically, since the delete-behind heuristic is never applied to a device-backed object, there is no point in checking whether each of the object's pages is fictitious. (Only device-backed objects have fictitious pages.)	2008-03-09 06:08:58 +00:00
Marcel Moolenaar	8775db6f50	Make the vm_pmap field of struct vmspace the last field in the structure. This allows per-CPU variations of struct pmap on a single architecture without affecting the machine-independent fields. As such, the PMAP variations don't affect the ABI. They become part of it.	2008-03-01 22:54:42 +00:00
Alan Cox	688559667f	Correct a long-standing error in vm_object_page_remove(). Specifically, pmap_remove_all() must not be called on fictitious pages. To date, fictitious pages have been allocated from zeroed memory, effectively hiding this problem because the fictitious pages appear to have an empty pv list. Submitted by: Kostik Belousov Rewrite the comments describing vm_object_page_remove() to better describe what it does. Add an assertion. Reviewed by: Kostik Belousov MFC after: 1 week	2008-02-26 17:16:48 +00:00
Alan Cox	4c8e0452e0	Correct a long-standing error in vm_object_deallocate(). Specifically, only anonymous default (OBJT_DEFAULT) and swap (OBJT_SWAP) objects should ever have OBJ_ONEMAPPING set. However, vm_object_deallocate() was setting it on device (OBJT_DEVICE) objects. As a result, vm_object_page_remove() could be called on a device object and if that occurred pmap_remove_all() would be called on the device object's pages. However, a device object's pages are fictitious, and fictitious pages do not have an initialized pv list (struct md_page). To date, fictitious pages have been allocated from zeroed memory, effectively hiding this problem. Now, however, the conversion of rotting diagnostics to invariants in the amd64 and i386 pmaps has revealed the problem. Specifically, assertion failures have occurred during the initialization phase of the X server on some hardware. MFC after: 1 week Discussed with: Kostik Belousov Reported by: Michiel Boland	2008-02-24 18:03:56 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Pawel Jakub Dawidek	79c2840d1d	When one tries to allocate memory with the M_WAITOK flag and we are short in address space in kmem map call vm_lowmem event in a loop and wait a bit for subsystems to reclaim some memory which in turn will reclaim address space as well. Note, this is a work-around. Reviewed by: alc Approved by: alc MFC after: 3 days	2008-01-10 08:36:38 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
John Baldwin	8e38aeff17	Add a new file descriptor type for IPC shared memory objects and use it to implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())	2008-01-08 21:58:16 +00:00
Christian S.J. Peron	35918c55e5	When MAC is enabled in the kernel, fix a panic triggered by a locking assertion hit in swapoff_one() when we un-mount a swap partition. We should be using curthread where we used thread0 before. This change also replaces the thread argument with a credential argument, as the MAC framework only requires the cred. It should be noted that this allows the machine to be rebooted without panicing with "cannot differ from curthread or NULL" when MAC is enabled. Submitted by: rwatson Reviewed by: attilio MFC after: 2 weeks	2008-01-08 14:58:41 +00:00
Konstantin Belousov	77bc7900bc	In the vm_map_stack(), check for the specified stack region wraparound. Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days	2008-01-04 04:33:13 +00:00
Alan Cox	eb2a051720	Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.	2008-01-03 07:34:34 +00:00
Alan Cox	273bf93c8d	Defer setting either PG_CACHED or PG_FREE until after the free page queues lock is acquired. Otherwise, the state of a reservation's pages' flags and its population count can be inconsistent. That could result in a page being freed twice. Reported by: kris	2008-01-02 04:43:47 +00:00
Alan Cox	af6ce1660a	Correct a style error that was introduced in revision 1.77.	2008-01-01 20:36:04 +00:00
Alan Cox	f8a47341fe	Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)	2007-12-29 19:53:04 +00:00
Alan Cox	3df92083af	Add a list of reservations to the vm object structure. Recycle the vm object's "pg_color" field to represent the color of the first virtual page address at which the object is mapped instead of the color of the object's first physical page. Since an object may not be mapped, introduce a flag "OBJ_COLORED" that indicates whether "pg_color" is valid.	2007-12-27 17:56:35 +00:00
Alan Cox	ae0fee95e1	Add the superpage reservation type.	2007-12-27 17:08:11 +00:00
Alan Cox	9742373a92	Update the comment describing vm_phys_unfree_page().	2007-12-21 02:44:31 +00:00
Alan Cox	e35395ce21	Modify vm_phys_unfree_page() so that it no longer requires the given page to be in the free lists. Instead, it now returns TRUE if it removed the page from the free lists and FALSE if the page was not in the free lists. This change is required to support superpage reservations. Specifically, once reservations are introduced, a cached page can either be in the free lists or a reservation.	2007-12-20 22:45:54 +00:00
Alan Cox	bc8794a12a	Correct one half of a loop continuation condition in vm_phys_unfree_page(). At present, this error is inconsequential; the other half of the loop continuation condition is sufficient to achieve correct execution.	2007-12-19 23:09:45 +00:00
Alan Cox	0349775790	Eliminate redundant code from vm_page_startup().	2007-12-19 05:47:50 +00:00
Alan Cox	21e10ad46a	Simplify vm_page_free_toq().	2007-12-11 21:20:34 +00:00
Alan Cox	b640825647	Correct a comment.	2007-12-02 07:43:42 +00:00
Robert Watson	9ccca7d1b1	Modify stack(9) stack_print() and stack_sbuf_print() routines to use new linker interfaces for looking up function names and offsets from instruction pointers. Create two variants of each call: one that is "DDB-safe" and avoids locking in the linker, and one that is safe for use in live kernels, by virtue of observing locking, and in particular safe when kernel modules are being loaded and unloaded simultaneous to their use. This will allow them to be used outside of debugging contexts. Modify two of three current stack(9) consumers to use the DDB-safe interfaces, as they run in low-level debugging contexts, such as inside lockmgr(9) and the kernel memory allocator. Update man page.	2007-12-01 22:04:16 +00:00
Alan Cox	da31e3aa04	Make contigmalloc(9)'s page laundering more robust. Specifically, use vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better handle a lock-ordering problem. Consequently, trylock's failure on the page's containing object no longer implies that the page cannot be laundered. MFC after: 6 weeks	2007-11-25 20:37:29 +00:00
Alan Cox	9c5ce94257	Tidy up: Add comments. Eliminate the pointless malloc_type_allocated(..., 0) calls that occur when contigmalloc() has failed. Eliminate the acquisition and release of the page queues lock from vm_page_release_contig(). Rename contigmalloc2() to contigmapping(), reflecting what it does.	2007-11-25 07:42:34 +00:00
Alan Cox	5dfc28704d	Add a read/write sysctl for reconfiguring the maximum number of physical pages that can be wired. Submitted by: Eugene Grosbein PR: 114654 MFC after: 6 weeks	2007-11-23 00:30:19 +00:00
Alan Cox	82cfdd5adc	Remove an unnecessary call to pmap_remove_all() and the associated "XXX" comments from vnode_pager_setsize(). This call was introduced in revision 1.140 to address a problem that no longer exists. Specifically, pmap_zero_page_area() has replaced a (possibly) problematic implementation of page zeroing that was based on vm_pager_map(), bzero(), and vm_pager_unmap().	2007-11-22 20:01:38 +00:00
Alan Cox	ddd6e7d2ab	When reactivating a cached page, reset the page's pool to the default pool. (Not doing this before was a performance pessimization but not a cause for panic.)	2007-11-21 23:22:10 +00:00
Alan Cox	59677d3c0e	Prevent the leakage of wired pages in the following circumstances: First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated. Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the pages beyond the EOF are unmapped and freed. However, when the file is mlock(2)ed, the pages beyond the EOF are unmapped but not freed because they have a non-zero wire count. This can be a mistake. Specifically, it is a mistake if the sole reason why the pages are wired is because of wired, managed mappings. Previously, unmapping the pages destroys these wired, managed mappings, but does not reduce the pages' wire count. Consequently, when the file is unmapped, the pages are not unwired because the wired mapping has been destroyed. Moreover, when the vm object is finally destroyed, the pages are leaked because they are still wired. The fix is to reduce the pages' wired count by the number of wired, managed mappings destroyed. To do this, I introduce a new pmap function pmap_page_wired_mappings() that returns the number of managed mappings to the given physical page that are wired, and I use this function in vm_object_page_remove(). Reviewed by: tegge MFC after: 6 weeks	2007-11-17 22:52:29 +00:00
Pawel Jakub Dawidek	8ce2d00a04	Change unused 'user_wait' argument to 'timo' argument, which will be used to specify timeout for msleep(9). Discussed with: alc Reviewed by: alc	2007-11-07 21:56:58 +00:00
Konstantin Belousov	89b57fcf01	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
Konstantin Belousov	aefac17759	The intent of the freeing the (zeroed) page in vm_page_cache() for default object rather than cache it was to have vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is no cached page in object at pindex. This allows to avoid explicit checks for cached pages in vm_object_backing_scan(). For now, we need the same bandaid for the swap object, otherwise both the vm_page_lookup() and the pager can report that there is no page at offset, while page is stored in the cache. Also, this fixes another instance of the KASSERT("object type is incompatible") failure in the vm_page_cache_transfer(). Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days	2007-11-05 10:25:12 +00:00
Maxim Konovalov	7036145b25	o Fix panic message: it's swap_pager_putpages() not swap_pager_getpages(). Submitted by: Mark Tinguely	2007-11-02 20:48:10 +00:00
Remko Lodder	248a0568e7	Correct a copy and paste'o in phys_pager.c, we are talking about phys here and not about devices. PR: 93755 Approved by: imp (mentor, implicit when re-assigning the ticket to me).	2007-10-30 14:48:13 +00:00
Alan Cox	21f7958604	Change vm_page_cache_transfer() such that it does not transfer pages that would have an offset beyond the end of the target object. Such pages should remain in the source object. MFC after: 3 days Diagnosed and reviewed by: Kostik Belousov Reported and tested by: Peter Holm	2007-10-27 00:09:30 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Alan Cox	0ab3c7a594	Correct an error of omission in the reimplementation of the page cache: vnode_pager_setsize() must handle the case where a file is truncated to a non-page-size-aligned boundary and there is a cached page underlying the new end of file. Reported by: kris, tegge Tested by: kris MFC after: 3 days	2007-10-22 06:23:46 +00:00
Alan Cox	7b0e72d184	Correct an error in vm_map_sync(), nee vm_map_clean(), that has existed since revision 1.1. Specifically, neither traversal of the vm map checks whether the end of the vm map has been reached. Consequently, the first traversal can wrap around and bogusly return an error. This error has gone unnoticed for so long because no one had ever before tried msync(2)ing a region above the stack. Reported by: peter MFC after: 1 week	2007-10-22 05:21:05 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Alan Cox	2573269111	The previous revision, updating vm_object_page_remove() for the new page cache, did not account for the case where the vm object has nothing but cached pages. Reported by: kris, tegge Reviewed by: tegge MFC after: 3 days	2007-10-18 23:02:18 +00:00
Peter Wemm	c899450b21	Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int.	2007-10-18 22:47:39 +00:00
Ruslan Ermilov	8229241a90	Fix CTL_VM_NAMES.	2007-10-16 11:32:57 +00:00
John Baldwin	71eb44c7b1	Allow recursion on the 'zones' internal UMA zone. Submitted by: thompsa MFC after: 1 week Approved by: re (kensmith) Discussed with: jeff	2007-10-11 20:11:27 +00:00
Konstantin Belousov	4ab8ab9285	Do not dereference NULL pointer. Reported by: Peter Holm Reviewed by: alc Approved by: re (kensmith)	2007-10-08 20:09:53 +00:00
Alan Cox	b8c5048025	In the rare case that vm_page_cache() actually frees the given page, it must first ensure that the page is no longer mapped. This is trivially accomplished by calling pmap_remove_all() a little earlier in vm_page_cache(). While I'm in the neighborbood, make a related panic message a little more useful. Approved by: re (kensmith) Reported by: Peter Holm and Konstantin Belousov Reviewed by: Konstantin Belousov	2007-10-08 18:01:38 +00:00
Alan Cox	dc9250f55c	Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs when uma_small_free() frees a page. The solution has two parts: (1) Mark pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested. This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable flags, i.e., they do not change state between the time that a page is allocated and freed. Approved by: re (kensmith) PR: 116794	2007-10-07 18:03:03 +00:00
Alan Cox	c944491426	Correct an error of omission in the reimplementation of the page cache: vm_object_page_remove() should convert any cached pages that fall with the specified range to free pages. Otherwise, there could be a problem if a file is first truncated and then regrown. Specifically, some old data from prior to the truncation might reappear. Generalize vm_page_cache_free() to support the conversion of either a subset or the entirety of an object's cached pages. Reported by: tegge Reviewed by: tegge Approved by: re (kensmith)	2007-09-27 04:21:59 +00:00
Alan Cox	f3a2ed4bd9	Correct an error in the previous revision, specifically, vm_object_madvise() should request that the reactivated, cached page not be busied. Reported by: Rink Springer Approved by: re (kensmith)	2007-09-25 21:01:10 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Jeff Roberson	258853ab1c	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re	2007-09-21 05:07:07 +00:00
Jeff Roberson	b61ce5b0e6	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
Alan Cox	4cd457233b	Correct an assertion in vm_pageout_flush(). Specifically, if a page's status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have already been recycled, i.e., freed and reallocated to a new purpose; thus, asserting that such pages cannot be written is inappropriate. Reported by: kris Submitted by: tegge Approved by: re (kensmith) MFC after: 1 week	2007-09-15 18:30:28 +00:00
Konstantin Belousov	d239bd3ccc	Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert(). For this, introduce vm_map_fixed() that does that for MAP_FIXED case. Dropping the lock allowed for parallel thread to occupy the freed space. Reported by: Tijl Coosemans <tijl ulyssis org> Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2007-08-20 12:05:45 +00:00
Konstantin Belousov	daab56673e	Remove comment that is no longer quite true. Noted by: alc Approved by: re (kensmith)	2007-08-18 16:41:31 +00:00
Konstantin Belousov	efe7553ed7	Fix the phys_pager in the way similar to the rev. 1.83 of the sys/vm/device_pager.c: Protect the creation of the phys pager with non-NULL handle with the phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now synchronized with its removal from the list, and phys_pager_mtx is put before vm object lock in lock order. Dispose the phys_pager_alloc_lock and tsleep calls, together with acquiring Giant, since phys_pager_mtx now covers the same block. Reviewed by: alc Approved by: re (kensmith)	2007-08-18 16:40:33 +00:00
Konstantin Belousov	deea654ebf	Protect the creation of the device pager with the dev_pager_mtx. Lookup of device pager in the pagers list by handle is now synchronized with its removal from the list, and dev_pager_mtx is put before vm object lock in lock order. Dispose the dev_pager_sx lock, since dev_pager_mtx now covers the same block. Noted by: kensmith Reviewed by: alc Approved by: re (kensmith)	2007-08-07 15:36:25 +00:00
Alan Cox	b5e8f167b9	Consider a scenario in which one processor, call it Pt, is performing vm_object_terminate() on a device-backed object at the same time that another processor, call it Pa, is performing dev_pager_alloc() on the same device. The problem is that vm_pager_object_lookup() should not be allowed to return a doomed object, i.e., an object with OBJ_DEAD set, but it does. In detail, the unfortunate sequence of events is: Pt in vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls vm_pager_object_lookup(), which returns the doomed object. Next, Pa calls vm_object_reference(), which requires the doomed object's lock, so Pa waits for Pt to release the doomed object's lock. Pt proceeds to the point in vm_object_terminate() where it releases the doomed object's lock. Pa is now able to complete vm_object_reference() because it can now complete the acquisition of the doomed object's lock. So, now the doomed object has a reference count of one! Pa releases dev_pager_sx and returns the doomed object from dev_pager_alloc(). Pt now acquires dev_pager_mtx, removes the doomed object from dev_pager_object_list, releases dev_pager_mtx, and finally calls uma_zfree with the doomed object. However, the doomed object is still in use by Pa. Repeating my key point, vm_pager_object_lookup() must not return a doomed object. Moreover, the test for the object's state, i.e., doomed or not, and the increment of the object's reference count should be carried out atomically. Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks	2007-08-05 21:04:32 +00:00
Konstantin Belousov	c6199d59e3	Do not acquire Giant unconditionally around the calls to the cdevsw d_mmap methods. prep_cdevsw() already installs the shims that acquire/drop Giant for the methods of a driver that specified the D_NEEDGIANT flag. Reviewed by: alc Approved by: re (kensmith)	2007-08-05 05:40:52 +00:00
Alan Cox	eaa29f1ce4	Add a counter for the total number of pages cached and support for reporting the value of this counter in the program "vmstat". Approved by: re (rwatson)	2007-07-27 20:01:22 +00:00
Pawel Jakub Dawidek	57fd3d5572	When we do open, we should lock the vnode exclusively. This fixes few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more... Discussed with: kib, ups Approved by: re (rwatson)	2007-07-26 16:58:09 +00:00

1 2 3 4 5 ...

2507 Commits