freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	f25fa6abb2	Reverse r196640 and r196644 for now.	2009-08-29 21:53:08 +00:00
Konstantin Belousov	c3cf0b476f	Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho MFC after: 1 week	2009-08-29 13:28:02 +00:00
John Baldwin	58e279db02	Mark the fake pages constructed by the OBJT_SG pager valid. This was accidentally lost at one point during the PAT development. Without this fix vm_pager_get_pages() was zeroing each of the pages. Submitted by: czander @ NVidia MFC after: 3 days	2009-08-29 02:17:40 +00:00
John Baldwin	2fa8c8d21e	Extend the device pager to support different memory attributes on different pages in an object. - Add a new variant of d_mmap() currently called d_mmap2() which accepts an additional in/out parameter that is the memory attribute to use for the requested page. - A driver either uses d_mmap() or d_mmap2() for all requests but not both. The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate that the driver provides a d_mmap2() handler instead of d_mmap(). This is done to make the change ABI compatible with existing drivers and MFC'able to 7 and 8. Submitted by: alc MFC after: 1 month	2009-08-28 14:06:55 +00:00
John Baldwin	63cbfee7b0	Remove debugging that crept in with previous commit. Reported by: nwhitehorn Approved by: re (kib)	2009-07-24 15:06:49 +00:00
John Baldwin	013818111a	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2009-07-24 13:50:29 +00:00
Alan Cox	9861cbc6ca	Change the handling of fictitious pages by pmap_page_set_memattr() on amd64 and i386. Essentially, fictitious pages provide a mechanism for creating aliases for either normal or device-backed pages. Therefore, pmap_page_set_memattr() on a fictitious page needn't update the direct map or flush the cache. Such actions are the responsibility of the "primary" instance of the page or the device driver that "owns" the physical address. For example, these actions are already performed by pmap_mapdev(). The device pager needn't restore the memory attributes on a fictitious page before releasing it. It's now pointless. Add pmap_page_set_memattr() to the Xen pmap. Approved by: re (kib)	2009-07-19 21:40:19 +00:00
Alan Cox	13de722155	An addendum to r195649, "Add support to the virtual memory system for configuring machine-dependent memory attributes...": Don't set the memory attribute for a "real" page that is allocated to a device object in vm_page_alloc(). It is a pointless act, because the device pager replaces this "real" page with a "fake" page and sets the memory attribute on that "fake" page. Eliminate pointless code from pmap_cache_bits() on amd64. Employ the "Self Snoop" feature supported by some x86 processors to avoid cache flushes in the pmap. Approved by: re (kib)	2009-07-18 01:50:05 +00:00
John Baldwin	0fe0ed8bf8	- Change mmap() to fail requests with EINVAL that pass a length of 0. This behavior is mandated by POSIX. - Do not fail requests that pass a length greater than SSIZE_MAX (such as > 2GB on 32-bit platforms). The 'len' parameter is actually an unsigned 'size_t' so negative values don't really make sense. Submitted by: Alexander Best alexbestms at math.uni-muenster.de Reviewed by: alc Approved by: re (kib) MFC after: 1 week	2009-07-14 19:45:36 +00:00
Alan Cox	3153e878dd	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
Konstantin Belousov	529ab57b9a	When VM_MAP_WIRE_HOLESOK is not specified and vm_map_wire(9) encounters non-readable and non-executable map entry, the entry is skipped from wiring and loop is aborted. But, since MAP_ENTRY_WIRE_SKIPPED was not set for the map entry, its wired_count is later erronously decremented. vm_map_delete(9) for such map entry stuck in "vmmaps". Properly set MAP_ENTRY_WIRE_SKIPPED when aborting the loop. Reported by: John Marshall <john.marshall riverwillow com au> Approved by: re (kensmith)	2009-07-12 12:37:38 +00:00
Konstantin Belousov	121fd46175	When forking a vm space that has wired map entries, do not forget to charge the objects created by vm_fault_copy_entry. The object charge was set, but reserve not incremented. Reported by: Greg Rivers <gcr+freebsd-current tharned org> Reviewed by: alc (previous version) Approved by: re (kensmith)	2009-07-03 22:17:37 +00:00
Konstantin Belousov	9b4d473a6e	Eliminiate code duplication by calling vm_object_destroy() from vm_object_collapse(). Requested and reviewed by: alc Approved by: re (kensmith)	2009-06-28 08:42:17 +00:00
Alan Cox	e999111ae7	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
Konstantin Belousov	9f80ce043d	Change the type of uio_resid member of struct uio from int to ssize_t. Note that this does not actually enable full-range i/o requests for 64 architectures, and is done now to update KBI only. Tested by: pho Reviewed by: jhb, bde (as part of the review of the bigger patch)	2009-06-25 18:46:30 +00:00
Konstantin Belousov	7a8af8eee2	Initialize the uip to silence gcc warning that seems to sneak in in some build environments. Reported by: alc, bf1783 at googlemail com	2009-06-24 09:26:33 +00:00
Alan Cox	26f4eea53f	The bits set in a page's dirty mask are a subset of the bits set in its valid mask. Consequently, there is no need to perform a bit-wise and of the page's dirty and valid masks in order to determine which parts of a page are dirty and valid. Eliminate an unnecessary #include.	2009-06-24 04:45:03 +00:00
Konstantin Belousov	3364c323e6	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
Alan Cox	ce1b857f33	Validate the page in one place, dev_pager_getpages(), rather than doing it in two places, dev_pager_getfake() and dev_pager_updatefake(). Compare a pointer to "NULL" rather than "0".	2009-06-22 19:09:48 +00:00
Alan Cox	ef327c3ee7	Implement a mechanism within vm_phys_alloc_contig() to defer all necessary calls to vdrop() until after the free page queues lock is released. This eliminates repeatedly releasing and reacquiring the free page queues lock each time the last cached page is reclaimed from a vnode-backed object.	2009-06-21 20:29:14 +00:00
Alan Cox	6f0489c670	Strive for greater consistency among the places that implement real, fictious, and contiguous page allocation. Eliminate unnecessary reinitialization of a page's fields.	2009-06-21 00:21:33 +00:00
Andrew Thompson	f06a3a36ac	Track the kernel mapping of a physical page by a new entry in vm_page structure. When the page is shared, the kernel mapping becomes a special type of managed page to force the cache off the page mappings. This is needed to avoid stale entries on all ARM VIVT caches, and VIPT caches with cache color issue. Submitted by: Mark Tinguely Reviewed by: alc Tested by: Grzegorz Bernacki, thompsa	2009-06-18 20:42:37 +00:00
Alan Cox	aea6e893ed	Add support for UMA_SLAB_KERNEL to page_free(). (While I'm here remove an unnecessary newline character from the end of two panic messages.)	2009-06-18 07:27:11 +00:00
Alan Cox	f0553fdbc4	Eliminate unnecessary forward declarations.	2009-06-17 20:12:23 +00:00
Alan Cox	d78200e4e8	Refactor contigmalloc() into two functions: a simple front-end that deals with the malloc tag and calls a new back-end, kmem_alloc_contig(), that allocates the pages and maps them. The motivations for this change are two-fold: (1) A cache mode parameter will be added to kmem_alloc_contig(). In other words, kmem_alloc_contig() will be extended to support the allocation of memory with caller-specified caching. (2) The UMA allocation function that is used by the two jumbo frames zones can use kmem_alloc_contig() in place of contigmalloc() and thereby avoid having free jumbo frames held by the zone counted as live malloc()ed memory.	2009-06-17 17:19:48 +00:00
Alan Cox	2d59a004af	Pass the size of the mapping to contigmapping() as a "vm_size_t" rather than a "vm_pindex_t". A "vm_size_t" is more convenient for it to use.	2009-06-17 07:11:38 +00:00
Alan Cox	ead1d027bd	Make the maintenance of a page's valid bits by contigmalloc() more like kmem_alloc() and kmem_malloc(). Specifically, defer the setting of the page's valid bits until contigmapping() when the mapping is known to be successful.	2009-06-17 04:57:32 +00:00
Alan Cox	387aabc513	Long, long ago in r27464 special case code for mapping device-backed memory with 4MB pages was added to pmap_object_init_pt(). This code assumes that the pages of a OBJT_DEVICE object are always physically contiguous. Unfortunately, this is not always the case. For example, jhb@ informs me that the recently introduced /dev/ksyms driver creates a OBJT_DEVICE object that violates this assumption. Thus, this revision modifies pmap_object_init_pt() to abort the mapping if the OBJT_DEVICE object's pages are not physically contiguous. This revision also changes some inconsistent if not buggy behavior. For example, the i386 version aborts if the first 4MB virtual page that would be mapped is already valid. However, it incorrectly replaces any subsequent 4MB virtual page mappings that it encounters, potentially leaking a page table page. The amd64 version has a bug of my own creation. It potentially busies the wrong page and always an insufficent number of pages if it blocks allocating a page table page. To my knowledge, there have been no reports of these bugs, hence, their persistance. I suspect that the existing restrictions that pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would choose to map, for example, that the first page must be aligned on a 2 or 4MB physical boundary and that the size of the mapping must be a multiple of the large page size, were enough to avoid triggering the bug for drivers like ksyms. However, one side effect of testing the OBJT_DEVICE object's pages for physical contiguity is that a dubious difference between pmap_object_init_pt() and the standard path for mapping devices pages, i.e., vm_fault(), has been eliminated. Previously, pmap_object_init_pt() would only instantiate the first PG_FICTITOUS page being mapped because it never examined the rest. Now, however, pmap_object_init_pt() uses the new function vm_object_populate() to instantiate them all (in order to support testing their physical contiguity). These pages need to be instantiated for the mechanism that I have prototyped for automatically maintaining the consistency of the PAT settings across multiple mappings, particularly, amd64's direct mapping, to work. (Translation: This change is also being made to support jhb@'s work on the Nvidia feature requests.) Discussed with: jhb@	2009-06-14 19:51:43 +00:00
Alan Cox	53f55a430f	Eliminate an unnecessary clearing of a page's dirty bits in phys_pager_getpages().	2009-06-13 20:58:12 +00:00
Alan Cox	1136ed06b9	Eliminate an unnecessary restriction on the vm object type from vm_map_pmap_enter(). The immediate effect of this change is that automatic prefaulting by mmap() for small mappings is performed on POSIX shared memory objects just the same as it is on ordinary files.	2009-06-09 17:04:39 +00:00
Alan Cox	0a2e596a93	Eliminate unnecessary obfuscation when testing a page's valid bits.	2009-06-07 19:38:26 +00:00
Alan Cox	fe6ad778fe	Eliminate an unneeded forward declaration. (This should have been removed in revision 1.42.)	2009-06-06 21:23:29 +00:00
Alan Cox	d1a6e42ddd	If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check the page's valid bits. The page is guaranteed to be fully valid. (For the record, this is documented in vm/vm_pager.h's comments.)	2009-06-06 20:13:14 +00:00
Alan Cox	7a122777c9	vm_thread_swapin() needn't validate any pages. The pages are already validated by vm_pager_get_pages().	2009-06-05 17:06:20 +00:00
Alan Cox	42d9e2c4a6	Simplify contigfree().	2009-06-05 16:55:10 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Alan Cox	3c33df624c	Correct a boundary case error in the management of a page's dirty bits by shm_dotruncate() and vnode_pager_setsize(). Specifically, if the length of a shared memory object or a file is truncated such that the length modulo the page size is between 1 and 511, then all of the page's dirty bits were cleared. Now, a dirty bit is cleared only if the corresponding block is truncated in its entirety.	2009-06-02 08:02:27 +00:00
John Baldwin	64345f0b57	Add an extension to the character device interface that allows character device drivers to use arbitrary VM objects to satisfy individual mmap() requests. - A new d_mmap_single(cdev, &foff, objsize, &object, prot) callback is added to cdevsw. This function is called for each mmap() request. If it returns ENODEV, then the mmap() request will fall back to using the device's device pager object and d_mmap(). Otherwise, the method can return a VM object to satisfy this entire mmap() request via object. It can also modify the starting offset into this object via foff. This allows device drivers to use the file offset as a cookie to identify specific VM objects. - vm_mmap_vnode() has been changed to call vm_mmap_cdev() directly when mapping V_CHR vnodes. This avoids duplicating all the cdev mmap handling code and simplifies some of vm_mmap_vnode(). - D_VERSION has been bumped to D_VERSION_02. Older device drivers using D_VERSION_01 are still supported. MFC after: 1 month	2009-06-01 21:32:52 +00:00
Alan Cox	461c78604e	Eliminate a stale comment and the two remaining uses of the "register" keyword in this file.	2009-05-30 22:15:55 +00:00
Alan Cox	edd16ab140	Add assertions in two places where a page's valid or dirty bits are changed.	2009-05-30 22:06:58 +00:00
Alan Cox	a28042d1e3	Change vm_object_page_remove() such that it clears the page's dirty bits when it invalidates the page. Suggested by: tegge	2009-05-28 07:26:36 +00:00
Alan Cox	b78ddb0b8a	Revise vm_pageout_scan()'s handling of partially dirty pages. Specifically, rather than unconditionally making partially dirty pages fully dirty, only make partially dirty pages fully dirty if the pmap says that the page has been modified. (This change is also a small optimization. It eliminate an unnecessary call to pmap_is_modified() on pages that are mapped read only.) Suggested by: tegge	2009-05-28 06:52:14 +00:00
Kip Macy	e95d34711b	- back out direct map hack - it is no longer needed	2009-05-19 01:14:37 +00:00
Alan Cox	47916d0c37	Eliminate a pointless call to pmap_clear_reference() from vm_pageout_scan(). If the page belongs to an object with a reference count of zero, then it can't have any managed mappings on which to clear a reference bit.	2009-05-17 20:40:41 +00:00
Kip Macy	32237d8492	apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map	2009-05-16 19:17:15 +00:00
Alan Cox	42eb41087c	Eliminate unnecessary clearing of the page's dirty mask from various getpages functions. Eliminate a stale comment.	2009-05-15 04:33:35 +00:00
Alan Cox	1c1b26f276	Eliminate page queues locking from bufdone_finish() through the following changes: Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect what this function actually does. Suggested by: tegge Introduce a new version of vfs_page_set_valid() that does no more than what the function's name implies. Specifically, it does not update the page's dirty mask, and thus it does not require the page queues lock to be held. Update two of the three callers to the old vfs_page_set_valid() to call vfs_page_set_validclean() instead because they actually require the page's dirty mask to be cleared. Introduce vm_page_set_valid(). Reviewed by: tegge	2009-05-13 05:39:39 +00:00
Alan Cox	12aa4fdca9	Eliminate gratuitous clearing of the page's dirty mask.	2009-05-12 05:49:02 +00:00
Alan Cox	0d53a17bde	Fix a race involving vnode_pager_input_smlfs(). Specifically, in the case that vnode_pager_input_smlfs() zeroes the page, it should not mark the page as valid until after the page is zeroed. Otherwise, the page could be mapped for read access (e.g., by vm_map_pmap_enter()) before the page is zeroed. Reviewed by: tegge Eliminate gratuitous clearing of the page's dirty mask by vnode_pager_input_smlfs(). Instead, assert that the page is clean. Reviewed by: tegge Eliminate some blank lines. Eliminate pointless calls to pmap_clear_modify() and vm_page_undirty() from vnode_pager_input_old(). The page is not mapped. Therefore, it cannot have any page table entries that are modified. Eliminate an incorrect comment from vnode_pager_generic_getpages().	2009-05-09 08:30:44 +00:00
Alan Cox	d7d9cfed36	Eliminate an incorrect comment.	2009-05-07 05:44:13 +00:00
Alan Cox	3a2cdcb0e3	Eliminate vnode_pager_input_smlfs()'s pointless call to pmap_clear_modify(). The page can't possibly have any modified page table entries because it isn't even mapped.	2009-05-04 06:30:00 +00:00
Konstantin Belousov	7981aa2431	Use the acquired reference to the vmspace instead of direct dereferencing of p->p_vmspace in a place where it was missed in r191277. Noted by: pluknet gmail com	2009-04-28 11:45:36 +00:00
Konstantin Belousov	8eb5a1cdee	Fix typo.	2009-04-28 11:43:35 +00:00
Alan Cox	a80982c113	Eliminate an errant comment. Discussed with: tegge	2009-04-26 21:24:50 +00:00
Alan Cox	78cfe1f7bd	Eliminate an archaic band-aid. The immediately preceding comment already explains why the band-aid is unnecessary. Suggested by: tegge	2009-04-26 20:54:57 +00:00
Alan Cox	016a3c93b2	Eliminate unnecessary calls to pmap_clear_modify(). Specifically, calling pmap_clear_modify() on a page is pointless if that page is not mapped or it is only mapped for read access. Instead, assert that the page is not mapped or not mapped for write access as appropriate. Eliminate unnecessary clearing of a page's dirty mask. Instead, assert that the page's dirty mask is clear.	2009-04-25 02:59:06 +00:00
Konstantin Belousov	bb2ac86f7d	Do not call vm_page_lookup() from the ddb routine, namely from "show vmopag" implementation. The vm_page_lookup() code modifies splay tree of the object pages, and asserts that object lock is taken. First issue could cause kernel data corruption, and second one instantly panics the INVARIANTS-enabled kernel. Take the advantage of the fact that object->memq is ordered by page index, and iterate over memq to calculate the runs. While there, make the code slightly more style-compliant by moving variables declarations to the right place. Discussed with: jhb, alc Reviewed by: alc MFC after: 2 weeks	2009-04-23 21:09:47 +00:00
Konstantin Belousov	6bed074cd2	In both pageout oom handler and vm_daemon, acquire the reference to the vmspace of the examined process instead of directly accessing its vmspace, that may change. Also, as an optimization, check for P_INEXEC flag before examining the process. Reported and tested by: pho (previous version) Reviewed by: alc MFC after: 3 week	2009-04-19 20:53:47 +00:00
Alan Cox	f4b0c119c0	Calling pmap_clear_modify() after calling pmap_remove_write() is pointless. The latter function already clears the modified status from each of the page's mappings.	2009-04-19 07:18:08 +00:00
Alan Cox	f9855e177d	Allow valid pages to be mapped for read access when they have a non-zero busy count. Only mappings that allow write access should be prevented by a non-zero busy count. (The prohibition on mapping pages for read access when they have a non- zero busy count originated in revision 1.202 of i386/i386/pmap.c when this code was a part of the pmap.) Reviewed by: tegge	2009-04-19 00:34:34 +00:00
Alan Cox	b9519926e6	Remove execute permission from the memory allocated by sbrk(). Pre-announced on: -arch (3/31/09) Discussed with: rwatson Tested by: marius (sparc64)	2009-04-11 22:34:08 +00:00
Alan Cox	ab5378cf11	Previously, when vm_page_free_toq() was performed on a page belonging to a reservation, unless all of the reservation's pages were free, the reservation was moved to the head of the partially-populated reservations queue, where it would be the next reservation to be broken in case the free page queues were emptied. Now, instead, I am moving it to the tail. Very likely this reservation is in the process of being freed in its entirety, so placing it at the tail of the queue makes it more likely that the underlying physical memory will be returned to the free page queues as one contiguous chunk. If a reservation must be broken, it will, instead, be the longest unchanged reservation, which is arguably the reservation that is least likely to ever achieve promotion or be freed in its entirety. MFC after: 6 weeks	2009-04-11 09:09:00 +00:00
Konstantin Belousov	6d7e809123	When vm_map_wire(9) is allowed to skip holes in the wired region, skip the mappings without any of read and execution rights, in particular, the PROT_NONE entries. This makes mlockall(2) work for the process address space that has such mappings. Since protection mode of the entry may change between setting MAP_ENTRY_IN_TRANSITION and final pass over the region that records the wire status of the entries, allocate new map entry flag MAP_ENTRY_WIRE_SKIPPED to mark the skipped PROT_NONE entries. Reported and tested by: Hans Ottevanger <fbsdhackers beasties demon nl> Reviewed by: alc MFC after: 3 weeks	2009-04-10 10:16:03 +00:00
Alan Cox	beb3c3a9c5	Retire VM_PROT_READ_IS_EXEC. It was intended to be a micro-optimization, but I see no benefit from it today. VM_PROT_READ_IS_EXEC was only intended for use on processors that do not distinguish between read and execute permission. On an mmap(2) or mprotect(2), it automatically added execute permission if the caller specified permissions included read permission. The hope was that this would reduce the number of vm map entries needed to implement an address space because there would be fewer neighboring vm map entries that differed only in the presence or absence of VM_PROT_EXECUTE. (See vm/vm_mmap.c revision 1.56.) Today, I don't see any real applications that benefit from VM_PROT_READ_IS_EXEC. In any case, vm map entries are now organized as a self-adjusting binary search tree instead of an ordered list. So, the need for coalescing vm map entries is not as great as it once was.	2009-04-04 23:12:14 +00:00
Alan Cox	a7f9bae19e	Eliminate dead code. Reviewed by: jhb	2009-04-01 04:36:37 +00:00
John Baldwin	5bd65606f4	Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the follow values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require a some gross shims to allow for that. MFC after: 1 month	2009-03-09 19:35:20 +00:00
Alan Cox	5758fe7185	Prior to r188331 a map entry's last read offset was only updated by a hard fault. In r188331 this update was relocated because of synchronization changes to a place where it would occur on both hard and soft faults. This change again restricts the update to hard faults.	2009-02-25 07:52:53 +00:00
Konstantin Belousov	655c349022	Revert the addition of the freelist argument for the vm_map_delete() function, done in r188334. Instead, collect the entries that shall be freed, in the deferred_freelist member of the map. Automatically purge the deferred freelist when map is unlocked. Tested by: pho Reviewed by: alc	2009-02-24 20:57:43 +00:00
Konstantin Belousov	3a0916b8ea	Add the assertion macros for the map locks. Use them in several map manipulation functions. Tested by: pho Reviewed by: alc	2009-02-24 20:43:29 +00:00
Konstantin Belousov	e608cc3c8d	Update the comment after the r188334. Reviewed by: alc	2009-02-24 20:23:16 +00:00
Roman Divacky	af83f5d77c	Change the functions to ANSI in those cases where it breaks promotion to int rule. See ISO C Standard: SS6.7.5.3:15. Approved by: kib (mentor) Reviewed by: warner Tested by: silence on -current	2009-02-24 18:09:31 +00:00
Robert Watson	9309e63c1f	Put debug.vm_lowmem sysctl under DIAGNOSTIC. Submitted by: sam MFC after: 3 days	2009-02-23 23:30:17 +00:00
Robert Watson	86f087370b	Add a debugging sysctl, debug.vm_lowmem, that when assigned a value of 1 will trigger a pass through the VM's low-memory handlers, such as protocol and UMA drain routines. This makes it easier to exercise these otherwise rarely-invoked code paths. MFC after: 3 days	2009-02-23 23:00:12 +00:00
Alan Cox	bfd9b137a0	Reduce the scope of the page queues lock in vm_object_page_remove(). MFC after: 1 week	2009-02-21 20:57:25 +00:00
Alan Cox	9d13a605d4	Eliminate stale comments.	2009-02-20 16:19:34 +00:00
Konstantin Belousov	cb61d6987e	Comment out the assertion from r188321. It is not valid for nfs. Reported by: alc	2009-02-09 11:32:23 +00:00
Alan Cox	c722e407dc	Avoid some cases of unnecessary page queues locking by vm_fault's delete- behind heuristic.	2009-02-09 06:23:21 +00:00
Alan Cox	7b54b1a9f5	Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by a redundant assertion in vm_fault(). Reviewed by: kib	2009-02-08 22:17:24 +00:00
Konstantin Belousov	2fada4c2b3	Remove no longer valid comment. Submitted by: alc	2009-02-08 21:20:13 +00:00
Konstantin Belousov	b0994946c7	Improve comments, correct English. Submitted by: alc	2009-02-08 20:52:09 +00:00
Konstantin Belousov	897d81a020	Do not call vm_object_deallocate() from vm_map_delete(), because we hold the map lock there, and might need the vnode lock for OBJT_VNODE objects. Postpone object deallocation until caller of vm_map_delete() drops the map lock. Link the map entries to be freed into the freelist, that is released by the new helper function vm_map_entry_free_freelist(). Reviewed by: tegge, alc Tested by: pho	2009-02-08 20:39:17 +00:00
Konstantin Belousov	e53fa61bf2	In vm_map_sync(), do not call vm_object_sync() while holding map lock. Reference object, drop the map lock, and then call vm_object_sync(). The object sync might require vnode lock for OBJT_VNODE type objects. Reviewed by: tegge Tested by: pho	2009-02-08 20:30:51 +00:00
Konstantin Belousov	d2bf64c309	Do not sleep for vnode lock while holding map lock in vm_fault. Try to acquire vnode lock for OBJT_VNODE object after map lock is dropped. Because we have the busy page(s) in the object, sleeping there would result in deadlock with vnode resize. Try to get lock without sleeping, and, if the attempt failed, drop the state, lock the vnode, and restart the fault handler from the start with already locked vnode. Because the vnode_pager_lock() function is inlined in vm_fault(), axe it. Based on suggestion by: alc Reviewed by: tegge, alc Tested by: pho	2009-02-08 20:23:46 +00:00
Konstantin Belousov	7fd10fb3c7	Add the comments to vm_map_simplify_entry() and vmspace_fork(), describing why several calls to vm_deallocate_object() with locked map do not result in the acquisition of the vnode lock after map lock. Suggested and reviewed by: tegge	2009-02-08 20:00:33 +00:00
Konstantin Belousov	1fac7d7f35	Lock the new map in vmspace_fork(). The newly allocated map should not be accessible outside vmspace_fork() yet, but locking it would satisfy the protocol of the vm_map_entry_link() and other functions called from vmspace_fork(). Use trylock that is supposedly cannot fail, to silence WITNESS warning of the nested acquisition of the sx lock with the same name. Suggested and reviewed by: tegge	2009-02-08 19:55:03 +00:00
Konstantin Belousov	705f0a82c2	Assert that vnode is exclusively locked when its vm object is resized. Reviewed by: tegge	2009-02-08 19:44:50 +00:00
Konstantin Belousov	9f6acfd1a8	Do not leak the MAP_ENTRY_IN_TRANSITION flag when copying map entry on fork. Otherwise, copied entry cannot be removed in the child map. Reviewed by: tegge MFC after: 2 weeks	2009-02-08 19:41:08 +00:00
Konstantin Belousov	0d0be82a5d	Style.	2009-02-08 19:37:01 +00:00
Jeff Roberson	e20a199fd5	- Make the keg abstraction more complete. Permit a zone to have multiple backend kegs so it may source compatible memory from multiple backends. This is useful for cases such as NUMA or different layouts for the same memory type. - Provide a new api for adding new backend kegs to secondary zones. - Provide a new flag for adjusting the layout of zones to stagger allocations better across cache lines. Sponsored by: Nokia	2009-01-25 09:11:24 +00:00
John Baldwin	8a7ef10b71	- Mark all standalone INT/LONG/QUAD sysctl's MPSAFE. This is done inside the SYSCTL() macros and thus does not need to be done for all of the nodes scattered across the source tree. - Mark the name-cache related sysctl's (including debug.hashstat.) MPSAFE. - Mark vm.loadavg MPSAFE. - Remove GIANT_REQUIRED from vmtotal() (everything in this routine already has sufficient locking) and mark vm.vmtotal MPSAFE. - Mark the vm.stats.(sys\|vm). sysctls MPSAFE.	2009-01-23 22:49:23 +00:00
John Baldwin	fa3de7700c	Now that vfs_markatime() no longer requires an exclusive lock due to the VOP_MARKATIME() changes, use a shared vnode lock for mmap(). Submitted by: ups	2009-01-21 14:43:35 +00:00
Konstantin Belousov	641e2829b6	Extend the struct vm_page wire_count to u_int to avoid the overflow of the counter, that may happen when too many sendfile(2) calls are being executed with this vnode [1]. To keep the size of the struct vm_page and offsets of the fields accessed by out-of-tree modules, swap the types and locations of the wire_count and cow fields. Add safety checks to detect cow overflow and force fallback to the normal copy code for zero-copy sockets. [2] Reported by: Anton Yuzhaninov <citrin citrin ru> [1] Suggested by: alc [2] Reviewed by: alc MFC after: 2 weeks	2009-01-03 13:24:08 +00:00
Alan Cox	05a8c41419	Resurrect shared map locks allowing greater concurrency during some map operations, such as page faults. An earlier version of this change was ... Reviewed by: kib Tested by: pho MFC after: 6 weeks	2009-01-01 00:31:46 +00:00
Alan Cox	e2abaaaa2b	Update or eliminate some stale comments.	2008-12-31 05:44:05 +00:00
Alan Cox	7438d60b4b	Avoid an unnecessary memory dereference in vm_map_entry_splay().	2008-12-30 21:52:18 +00:00
Alan Cox	095104ac36	Style change to vm_map_lookup(): Eliminate a macro of dubious value.	2008-12-30 20:51:07 +00:00
Alan Cox	4c3ef59e3d	Move the implementation of the vm map's fast path on address lookup from vm_map_lookup{,_locked}() to vm_map_lookup_entry(). Having the fast path in vm_map_lookup{,_locked}() limits its benefits to page faults. Moving it to vm_map_lookup_entry() extends its benefits to other operations on the vm map.	2008-12-30 19:48:03 +00:00
Robert Noland	e9f541267d	Fix printing of KASSERT message missed in r163604. Approved by: kib	2008-12-21 16:56:13 +00:00
Konstantin Belousov	6129343d5d	Instead of forcing vn_start_write() to reset mp back to NULL for the failed calls with non-NULL vp, explicitely clear mp after failure. Tested by: stass Reviewed by: tegge PR: 123768 MFC after: 1 week	2008-11-16 21:57:54 +00:00
Rafal Jaworowski	8e321b7943	Support kernel crash mini dumps on ARM architecture. Obtained from: Juniper Networks, Semihalf	2008-11-06 16:20:27 +00:00
Giorgos Keramidas	2db63c5e38	Various comment nits, and typos.	2008-11-02 00:41:26 +00:00
Robert Watson	556c3162b9	Update mmap() comment: no more block devices, so no more block device cache coherency questions. MFC after: 3 days	2008-10-22 16:50:12 +00:00
Attilio Rao	0d7935fd01	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-10-10 21:23:50 +00:00
Konstantin Belousov	2025d69ba7	Move the code for doing out-of-memory grass from vm_pageout_scan() into the separate function vm_pageout_oom(). Supply a parameter for vm_pageout_oom() describing a reason for the call. Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone is exhausted. Reviewed by: alc Tested by: pho, jhb MFC after: 2 weeks	2008-09-29 19:45:12 +00:00
Ed Maste	a8a478fce6	Move CTASSERT from header file to source file, per implementation note now in the CTASSERT man page.	2008-09-26 18:44:40 +00:00
Konstantin Belousov	7818e0a545	Save previous content of the td_fpop before storing the current filedescriptor into it. Make sure that td_fpop is NULL when calling d_mmap from dev_pager_getpages(). Change guards against td_fpop field being non-NULL with private state for another device, and against sudden clearing the td_fpop. This could occur when either a driver method calls another driver through the filedescriptor operation, or a page fault happen while driver is writing to a memory backed by another driver. Noted by: rwatson Tested by: rnoland MFC after: 3 days	2008-09-26 14:50:49 +00:00
Alan Cox	8d28bf04e2	Prevent an integer overflow in vm_pageout_page_stats() on machines with a large number of physical pages. PR: 126158 Submitted by: Dmitry Tejblum MFC after: 3 days	2008-09-21 18:01:34 +00:00
Konstantin Belousov	36b907893d	Allow the d_mmap driver methods to use cdevpriv KPI during verification phase of establishing mapping. Discussed with: rwatson, jhb, rnoland Tested by: rnoland MFC after: 3 days	2008-09-20 19:56:02 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Antoine Brodin	2f2ea10a07	Remove unused variable nosleepwithlocks. PR: 126609 Submitted by: Mateusz Guzik MFC after: 1 month X-MFC: to stable/7 only, this variable is still used in stable/6	2008-08-23 12:40:07 +00:00
Nathan Whitehorn	f620b5bf45	Allow the MD UMA allocator to use VM routines like kmem_*(). Existing code requires MD allocator to be available early in the boot process, before the VM is fully available. This defines a new VM define (UMA_MD_SMALL_ALLOC_NEEDS_VM) that allows an MD UMA small allocator to become available at the same time as the default UMA allocator. Approved by: marcel (mentor)	2008-08-23 01:35:36 +00:00
Julian Elischer	ac957cd271	A bunch of formatting fixes brough to light by, or created by the Vimage commit a few days ago.	2008-08-20 01:05:56 +00:00
Kip Macy	4b34502e99	Work around differences in page allocation for initial page tables on xen MFC after: 1 month	2008-08-17 23:40:29 +00:00
Ed Maste	4222358722	Fix REDZONE(9) on amd64 and perhaps other 64 bit targets -- ensure the space that redzone adds to the allocation for storing its metadata is at least as large as the metadata that it will store there. Submitted by: Nima Misaghian	2008-08-13 17:32:48 +00:00
John Baldwin	da7bbd2c08	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks	2008-08-05 20:02:31 +00:00
Tom Rhodes	6bd9cb1c81	Fill in a few sysctl descriptions. Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc	2008-08-03 14:26:15 +00:00
John Baldwin	2c3b410b3a	One more whitespace nit.	2008-07-30 21:23:32 +00:00
John Baldwin	3cca4b6fe8	A few more whitespace fixes.	2008-07-30 21:18:08 +00:00
John Baldwin	3677ad363b	If the kernel has run out of metadata for swap, then explicitly panic() instead of emitting a warning before deadlocking. MFC after: 1 month	2008-07-30 21:12:15 +00:00
Konstantin Belousov	24bbc85bf6	The behaviour of the lockmgr going back at least to the 4.4BSD-Lite2 was to downgrade the exclusive lock to shared one when exclusive lock owner requested shared lock. New lockmgr panics instead. The vnode_pager_lock function requests shared lock on the vnode backing the OBJT_VNODE, and can be called when the current thread already holds an exlcusive lock on the vnode. For instance, it happens when handling page fault from the VOP_WRITE() uiomove that writes to the file, with the faulted in page fetched from the vm object backed by the same file. We then get the situation described above. Verify whether the vnode is already exclusively locked by the curthread and request recursed exclusive vnode lock instead of shared, if true. Reported by: gallatin Discussed with: attilio	2008-07-30 18:16:06 +00:00
Alan Cox	fb272dc841	Eliminate stale comments from kmem_malloc().	2008-07-18 17:41:31 +00:00
Konstantin Belousov	11041003c6	Use the VM_ALLOC_INTERRUPT for the page requests when allocating memory for the bio for swapout write. It allows the page allocator to drain free page list deeper. As result, a deadlock where pageout deamon sleeps waiting for bio to be allocated for swapout is no more reproducable in practice. Alan said that M_USE_RESERVE shall be ressurrected and used there, but until this is implemented, M_NOWAIT does exactly what is needed. Tested by: pho, kris Reviewed by: alc No objections from: phk MFC after: 2 weeks (RELENG_7 only)	2008-07-11 11:27:42 +00:00
Alan Cox	b89eaf4e9f	Enable the creation of a kmem map larger than 4GB. Submitted by: Tz-Huan Huang Make several variables related to kmem map auto-sizing static. Found by: CScout	2008-07-05 19:34:33 +00:00
Alan Cox	5cfa90e902	Make preparations for increasing the size of the kernel virtual address space on the amd64 architecture. The amd64 architecture requires kernel code and global variables to reside in the highest 2GB of the 64-bit virtual address space. Thus, the memory allocated during bootstrap, before the call to kmem_init(), starts at KERNBASE, which is not necessarily the same as VM_MIN_KERNEL_ADDRESS on amd64.	2008-06-22 04:54:27 +00:00
Alan Cox	c1f02198d1	KERNBASE is not necessarily an address within the kernel map, e.g., PowerPC/AIM. Consequently, it should not be used to determine the maximum number of kernel map entries. Intead, use VM_MIN_KERNEL_ADDRESS, which marks the start of the kernel map on all architectures. Tested by: marcel@ (PowerPC/AIM)	2008-06-21 21:02:13 +00:00
Stephan Uphoff	11be8415c9	Fix vm object creation locking to allow SHARED vnode locking for vnode_create_vobject. (Not currently used) Noticed by: kib@	2008-06-12 20:46:47 +00:00
Alan Cox	8bcd3b1998	Essentially, neither madvise(..., MADV_DONTNEED) nor madvise(..., MADV_FREE) work. (Moreover, I don't believe that they have ever worked as intended.) The explanation is fairly simple. Both MADV_DONTNEED and MADV_FREE perform vm_page_dontneed() on each page within the range given to madvise(). This function moves the page to the inactive queue. Specifically, if the page is clean, it is moved to the head of the inactive queue where it is first in line for processing by the page daemon. On the other hand, if it is dirty, it is placed at the tail. Let's further examine the case in which the page is clean. Recall that the page is at the head of the line for processing by the page daemon. The expectation of vm_page_dontneed()'s author was that the page would be transferred from the inactive queue to the cache queue by the page daemon. (Once the page is in the cache queue, it is, in effect, free, that is, it can be reallocated to a new vm object by vm_page_alloc() if it isn't reactivated quickly enough by a user of the old vm object.) The trouble is that nowhere in the execution of either MADV_DONTNEED or MADV_FREE is either the machine-independent reference flag (PG_REFERENCED) or the reference bit in any page table entry (PTE) mapping the page cleared. Consequently, the immediate reaction of the page daemon is to reactivate the page because it is referenced. In effect, the madvise() was for naught. The case in which the page was dirty is not too different. Instead of being laundered, the page is reactivated. Note: The essential difference between MADV_DONTNEED and MADV_FREE is that MADV_FREE clears a page's dirty field. So, MADV_FREE is always executing the clean case above. This revision changes vm_page_dontneed() to clear both the machine- independent reference flag (PG_REFERENCED) and the reference bit in all PTEs mapping the page. MFC after: 6 weeks	2008-06-06 18:38:43 +00:00
Alan Cox	ba3042115f	To date, our implementation of munmap(2) has required that the entirety of the specified range be mapped. Specifically, it has returned EINVAL if the entire range is not mapped. There is not, however, any basis for this in either SuSv2 or our own man page. Moreover, neither Linux nor Solaris impose this requirement. This revision removes this requirement. Submitted by: Tijl Coosemans PR: 118510 MFC after: 6 weeks	2008-05-24 21:57:16 +00:00
Stephan Uphoff	2ac78f0e1a	Allow VM object creation in ufs_lookup. (If vfs.vmiodirenable is set) Directory IO without a VM object will store data in 'malloced' buffers severely limiting caching of the data. Without this change VM objects for directories are only created on an open() of the directory. TODO: Inline test if VM object already exists to avoid locking/function call overhead. Tested by: kris@ Reviewed by: jeff@ Reported by: David Filo	2008-05-20 19:05:43 +00:00
Alan Cox	1ec1304bdb	Retire pmap_addr_hint(). It is no longer used.	2008-05-18 04:16:57 +00:00
Alan Cox	d0a83a83bf	In order to map device memory using superpages, mmap(2) must find a superpage-aligned virtual address for the mapping. Revision 1.65 implemented an overly simplistic and generally ineffectual method for finding a superpage-aligned virtual address. Specifically, it rounds the virtual address corresponding to the end of the data segment up to the next superpage-aligned virtual address. If this virtual address is unallocated, then the device will be mapped using superpages. Unfortunately, in modern times, where applications like the X server dynamically load much of their code, this virtual address is already allocated. In such cases, mmap(2) simply uses the first available virtual address, which is not necessarily superpage aligned. This revision changes mmap(2) to use a more robust method, specifically, the VMFS_ALIGNED_SPACE option that is now implemented by vm_map_find().	2008-05-17 19:32:48 +00:00
Alan Cox	e46cd4132c	Preset a device object's alignment ("pg_color") based upon the physical address of the device's memory. This enables pmap_align_superpage() to propose a virtual address for mapping the device memory that permits the use of superpage mappings.	2008-05-17 16:26:34 +00:00
Alan Cox	f578838754	Don't call vm_reserv_alloc_page() on device-backed objects. Otherwise, the system may panic because there is no reservation structure corresponding to the physical address of the device memory. Reported by: Giorgos Keramidas	2008-05-15 18:52:31 +00:00
Alan Cox	6ac3ab7f98	Provide the new argument to kmem_suballoc().	2008-05-10 23:39:27 +00:00
Alan Cox	3202ed7523	Introduce a new parameter "superpage_align" to kmem_suballoc() that is used to request superpage alignment for the submap. Request superpage alignment for the kmem_map. Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently equivalent but VMFS_ANY_SPACE is the new preferred spelling.) Remove a stale comment from kmem_malloc().	2008-05-10 21:46:20 +00:00
Alan Cox	26c538ffcd	Generalize vm_map_find(9)'s parameter "find_space". Specifically, add support for VMFS_ALIGNED_SPACE, which requests the allocation of an address range best suited to superpages. The old options TRUE and FALSE are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no immediate need to update all of vm_map_find(9)'s callers. While I'm here, correct a misstatement about vm_map_find(9)'s return values in the man page.	2008-05-10 18:55:35 +00:00
Alan Cox	d3249b142b	Introduce pmap_align_superpage(). It increases the starting virtual address of the given mapping if a different alignment might result in more superpage mappings.	2008-05-09 16:48:07 +00:00
Kip Macy	c8c7ad9260	add malloc flag to blist so that it can be used in ithread context Reviewed by: alc, bsdimp	2008-05-05 19:48:54 +00:00
Alan Cox	2bc24aa956	Eliminate pointless casts from kmem_suballoc().	2008-04-28 17:25:27 +00:00
Alan Cox	b8ca4ef2e3	vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can be passed by value.	2008-04-28 05:30:23 +00:00
Jeff Roberson	8df78c41d6	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia	2008-04-17 04:20:10 +00:00
Alan Cox	44aab2c3de	Introduce vm_reserv_reclaim_contig(). This function is used by contigmalloc(9) as a last resort to steal pages from an inactive, partially-used superpage reservation. Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and refactor it so that a separate subroutine is responsible for breaking the selected reservation. This subroutine is also used by vm_reserv_reclaim_contig().	2008-04-06 18:09:28 +00:00
Alan Cox	2fbced6574	Eliminate an unnecessary test from vm_phys_unfree_page().	2008-04-05 05:02:53 +00:00
Alan Cox	c416972587	Update a comment to vm_map_pmap_enter().	2008-04-04 19:14:58 +00:00
Alan Cox	7630c26507	Reintroduce UMA_SLAB_KMAP; however, change its spelling to UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM. (UMA_SLAB_KMAP met its original demise in revision 1.30 of vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame allocators. Without it, UMA cannot correctly return pages from the jumbo frame zones to the VM system because it resets the pages' object field to NULL instead of the kernel object. In more detail, the jumbo frame zones are created with the option UMA_ZONE_REFCNT. This causes UMA to overwrite the pages' object field with the address of the slab. However, when UMA wants to release these pages, it doesn't know how to restore the object field, so it sets it to NULL. This change teaches UMA how to reset the object field to the kernel object. Crashes reported by: kris Fix tested by: kris Fix discussed with: jeff MFC after: 6 weeks	2008-04-04 18:41:12 +00:00
Alan Cox	24dedba9f5	Eliminate an unnecessary printf() from kmem_suballoc(). The subsequent panic() can be extended to convey the same information.	2008-03-30 20:08:59 +00:00
Jeff Roberson	52481a9a9d	- Use vm_object_reference_locked() directly from vm_object_reference(). This is intended to get rid of vget() consumers who don't wish to acquire a lock. This is functionally the same as calling vref(). vm_object_reference_locked() already uses vref. Discussed with: alc	2008-03-29 07:06:13 +00:00
Konstantin Belousov	91a35e7870	Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly obtain the reference. In particular, this fixes the panic reported in the PR. Remove the comments stating that this needs to be done. PR: kern/119422 MFC after: 1 week	2008-03-20 16:08:42 +00:00
Alan Cox	e5b006ffca	Rename vm_pageq_requeue() to vm_page_requeue() on account of its recent migration to vm/vm_page.c.	2008-03-19 20:24:35 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00

1 2 3 4 5 ...

2672 Commits